MapReduce



In Hadoop, MapReduce is a processing model that decomposes large data-manipulation jobs into individual tasks that can be executed in parallel across a cluster of servers. The results of these tasks can then be combined to compute the final result.


MapReduce consists of two steps:

Map Function – Takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).

Reduce Function – Takes the output of the Map function as input and combines those data tuples into a smaller set of tuples, as the sketch below illustrates.
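To make these two phases concrete before touching Hadoop itself, here is a small word-count sketch in plain Java (no Hadoop APIs involved; the class name and sample lines are purely illustrative). The stream pipeline plays the roles of map, shuffle, and reduce in turn:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceIdea {
  public static void main(String[] args) {
    List<String> lines = Arrays.asList("deer bear river", "car car river", "deer car bear");

    // Map phase:    turn each line into (word, 1) tuples.
    // Shuffle:      group tuples that share the same key.
    // Reduce phase: collapse each group into a single (word, count) tuple.
    Map<String, Integer> counts = lines.stream()
        .flatMap(line -> Arrays.stream(line.split(" "))
            .map(word -> Map.entry(word, 1)))                // map: emit (word, 1)
        .collect(Collectors.groupingBy(Map.Entry::getKey,    // shuffle: group by key
            Collectors.summingInt(Map.Entry::getValue)));    // reduce: sum values per key

    System.out.println(counts); // e.g. {river=2, car=3, bear=2, deer=2}
  }
}

Hadoop does essentially this, except that the tuples flow between machines and the groups for different keys are reduced in parallel.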



The major advantage of MapReduce is that it makes it easy to distribute data processing over multiple computing nodes. Under the MapReduce model, the data-processing primitives are called mappers and reducers. Decomposing a data-processing application into mappers and reducers is sometimes non-trivial. But once an application is written in the MapReduce form, scaling it to run over hundreds, thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This simple scalability is what has attracted many programmers to the MapReduce model.
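As one concrete illustration of that configuration change, the degree of reduce-side parallelism in Hadoop's Job API is a single setting. The fragment below assumes a Job object like the one built in the WordCount driver shown further down:

// Same mapper and reducer code; only the parallelism changes.
job.setNumReduceTasks(50);

// Or at submission time, without recompiling (jar and class names here are
// hypothetical, and the driver must parse generic options via ToolRunner):
//   hadoop jar myjob.jar MyJob -D mapreduce.job.reduces=50 <input> <output>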

Let's walk through an example:
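The classic example is WordCount: count how often each word occurs across a set of text files. Below is a minimal sketch against Hadoop's org.apache.hadoop.mapreduce API, closely following the well-known Apache example; the input and output paths are taken from the command line:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for every word in an input line, emit the tuple (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: sum all the counts emitted for a given word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combiner: a local reduce on each mapper's output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar (the jar name and paths below are illustrative), the job is submitted with:

hadoop jar wordcount.jar WordCount /user/me/input /user/me/output

Both paths live in HDFS, and the output directory must not already exist, or Hadoop will refuse to start the job.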