Saturday, 18 November 2017

Program to Count the Words in a File with Hadoop MapReduce

The main method shows how to specify that you want to run this class, with its mapper and reducer, inside the Hadoop framework, via a Hadoop class called ToolRunner.
The run method demonstrates how you set up parameters for the job. This is a typical minimum set of job parameters; you may find that you need others, such as requesting how many reduce tasks to use (the number of map tasks is normally determined by the input splits; a sketch follows these notes). See the Hadoop documentation for the Job class for more information.
The inner class called Map extends the Mapper class defined in the Hadoop API. Within the < and > brackets are listed the data types of the input key and value and of the produced key and value. You override a method called map that defines the work of the mapper.
The inner class called Reduce extends the Reducer class defined in the Hadoop API. Within the < and > brackets are listed the data types of the input key and value and of the emitted key and value. You override a method called reduce that defines the work of the reducer.
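
As a minimal sketch of one such extra parameter: the number of reduce tasks can be requested directly on the Job object inside run. The value 2 here is an arbitrary example and is not part of the listing below. Because the job is launched through ToolRunner, the same setting can also be passed on the command line as -D mapreduce.job.reduces=2.

    // Sketch only: request two reduce tasks (arbitrary value for illustration).
    // Equivalent to passing -D mapreduce.job.reduces=2 via ToolRunner's generic options.
    job.setNumReduceTasks(2);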

Wc.java


import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import org.apache.hadoop.util.*;

public class Wc extends Configured implements Tool {

  public static void main(String args[]) throws Exception {
    int res = ToolRunner.run(new Wc(), args);
    System.exit(res);
  }

  public int run(String[] args) throws Exception {
    Path inputPath = new Path(args[0]);
    Path outputPath = new Path(args[1]);

    Configuration conf = getConf();
    Job job = Job.getInstance(conf, this.getClass().toString());

    FileInputFormat.setInputPaths(job, inputPath);
    FileOutputFormat.setOutputPath(job, outputPath);

    job.setJobName("wc");
    job.setJarByClass(Wc.class); // locates the containing jar; no hard-coded jar name needed
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    // The reducer can also serve as the combiner, because summing counts is
    // associative and commutative.
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    
    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      // Emit (word, 1) for every whitespace-separated token in the line.
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }

      context.write(key, new IntWritable(sum));
    }
  }

}
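
One way to build and run the program (a sketch; these commands were not part of the original session): compile against the Hadoop classpath, package the classes into wc.jar, and submit the job with hadoop jar. The /Input1 directory is an assumption; it must already exist in HDFS and hold the input file, while /Output1 must not exist before the job runs.

javac -classpath "$(hadoop classpath)" Wc.java
jar cf wc.jar Wc*.class
hadoop jar wc.jar Wc /Input1 /Output1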



Output:

hadoop fs -ls /Output1
17/08/19 16:21:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 administrator supergroup          0 2017-08-19 16:19 /Output1/_SUCCESS
-rw-r--r--   1 administrator supergroup         70 2017-08-19 16:19 /Output1/part-r-00000
administrator@ravi:/usr/local/hadoop/bin$ hadoop fs -cat /Output1/part-r-00000
17/08/19 16:22:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
:	3
comp	1
it	1
NAME	2
NO	1
ROLL	1
SEM-3	1
sharma	1
STD	1
Shweta	1
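
The _SUCCESS marker file shows that the job completed without errors, and part-r-00000 holds the output of the single reduce task; one part-r-NNNNN file is produced per reducer.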
