SQL Server, Analytics, .Net, Machine Learning, R, Python
Mitch Wheat has been working as a professional programmer since 1984, graduating with an honours degree in Mathematics from Warwick University, UK, in 1986. He moved to Perth in 1995, having worked in software houses in London and Rotterdam. He has worked in the areas of mining, electronics, research, defence, finance, GIS, telecommunications, engineering, and information management. Mitch has worked mainly with Microsoft technologies (since Windows version 3.0) but has also used UNIX. He holds the following Microsoft certifications: MCPD (Web and Windows) using C#, and SQL Server MCITP (Admin and Developer). His preferred development environment is C#, the .Net Framework and SQL Server. Mitch has worked as an independent consultant for the last 10 years, and is currently involved with helping teams improve their Software Development Life Cycle. His areas of special interest lie in performance tuning.
Monday, August 07, 2006
I’ve been familiar with the concept of MapReduce for some time (a less generic form of MapReduce formed the basis of a paper I wrote circa 1988, ‘Sorting with near-linear speedup on tightly-coupled multi-processors’), although I’ve never used a functional programming language in anger. I’ve just finished reading an excellent research paper by Jeffrey Dean and Sanjay Ghemawat (both at Google) titled “MapReduce: Simplified Data Processing on Large Clusters”. MapReduce is a programming paradigm, together with an associated implementation, for processing large datasets. The key point is that “Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.”
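To make the paradigm concrete, here is a minimal single-process sketch of word counting, the running example in the Dean and Ghemawat paper. The user writes only a map function (emit a (word, 1) pair for every word) and a reduce function (sum the counts for one word). A framework groups the intermediate pairs by key in between. The names `map_fn`, `reduce_fn` and `map_reduce` are hypothetical and are not Google’s API. Everything here runs sequentially in one process. The point is only that each map call and each reduce call is independent of the others, which is what allows a real implementation to spread them across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated word in a document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> int:
    """Reduce: sum every count emitted for a single word."""
    return sum(counts)


def map_reduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical sequential driver standing in for a distributed framework."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each input record is processed independently of the rest.
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            # Shuffle: group intermediate values by their key.
            intermediate[out_key].append(out_value)
    # Reduce phase: each key's group is also independent; keys are sorted here
    # only so that the output order is deterministic.
    return {k: reducer(k, vs) for k, vs in sorted(intermediate.items())}


docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
print(map_reduce(docs, map_fn, reduce_fn))
# prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In the paper’s setting, the map calls run on many machines over splits of the input. The intermediate pairs are partitioned by key so that several reduce tasks can also run in parallel. The sequential loops above stand in for that machinery.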
This is a very accessible paper regardless of your computing background. Well worth reading, if only to get a glimpse of how Google’s distributed indexing engine performs its work.
MSN, Email: mitch døt wheat at gmail.com