Once you have created the jar with ant, you should be able to run
the word count demo in standalone mode. Run the class without arguments
to find out its command-line usage:
$ hadoop jar cloud9.jar edu.umd.cloud9.example.simple.DemoWordCount
usage: [input-path] [output-path] [num-reducers]
Now run the code on the sample text collection:
$ hadoop jar cloud9.jar edu.umd.cloud9.example.simple.DemoWordCount data/bible+shakes.nopunc wc 1
10/07/11 22:25:42 INFO simple.DemoWordCount: Tool: DemoWordCount
10/07/11 22:25:42 INFO simple.DemoWordCount: - input path: data/bible+shakes.nopunc
10/07/11 22:25:42 INFO simple.DemoWordCount: - output path: wc
10/07/11 22:25:42 INFO simple.DemoWordCount: - number of reducers: 1
[...]
10/07/11 22:25:48 INFO mapred.JobClient: Counters: 12
10/07/11 22:25:48 INFO mapred.JobClient: FileSystemCounters
10/07/11 22:25:48 INFO mapred.JobClient: FILE_BYTES_READ=22907000
10/07/11 22:25:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=5867160
10/07/11 22:25:48 INFO mapred.JobClient: Map-Reduce Framework
10/07/11 22:25:48 INFO mapred.JobClient: Reduce input groups=41788
10/07/11 22:25:48 INFO mapred.JobClient: Combine output records=128253
10/07/11 22:25:48 INFO mapred.JobClient: Map input records=156215
10/07/11 22:25:48 INFO mapred.JobClient: Reduce shuffle bytes=0
10/07/11 22:25:48 INFO mapred.JobClient: Reduce output records=41788
10/07/11 22:25:48 INFO mapred.JobClient: Spilled Records=170041
10/07/11 22:25:48 INFO mapred.JobClient: Map output bytes=15919397
10/07/11 22:25:48 INFO mapred.JobClient: Combine input records=1820763
10/07/11 22:25:48 INFO mapred.JobClient: Map output records=1734298
10/07/11 22:25:48 INFO mapred.JobClient: Reduce input records=41788
10/07/11 22:25:48 INFO simple.DemoWordCount: Job Finished in 5.345 seconds
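The counters above show the combiner at work: 1,734,298 map output records are pre-aggregated down to 128,253 combine output records before the reduce phase, which in turn produces 41,788 distinct words. As a rough illustration (this is not the Cloud9 source, just a plain-Java sketch of the same map, combine, reduce logic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> combine -> reduce flow behind the
// counters above. Illustrative only, not the actual DemoWordCount code.
public class WordCountSketch {

    // Map phase: emit a (token, 1) pair for every token in every line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    pairs.add(Map.entry(token, 1));
                }
            }
        }
        return pairs;
    }

    // Combine/reduce phase: sum the counts for each distinct token.
    // In Hadoop the same summing logic typically serves as both the
    // combiner (run per map task) and the reducer (after the shuffle),
    // which is why combine output records is so much smaller than
    // map output records.
    static Map<String, Integer> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "to err is human");
        Map<String, Integer> counts = sum(map(lines));
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```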
There should now be a new sub-directory called wc/ in your current
directory, containing the output of the word count demo:
$ head wc/part-r-00000
&c 70
&c' 1
''all 1
''among 1
''and 1
''but 1
''how 1
''lo 2
''look 1
''my 1
$ tail wc/part-r-00000
zorites 1
zorobabel 3
zounds 20
zuar 5
zuph 3
zur 5
zuriel 1
zurishaddai 5
zuzims 1
zwaggered 1
$ wc wc/part-r-00000
41788 83576 447180 wc/part-r-00000
And that's it! Now you're ready to run MapReduce jobs on a real
cluster.