Counters are lightweight objects in Hadoop that allow you to keep track of system progress in both the map and reduce stages of processing. By default, Hadoop defines a number of standard counters in "groups"; these show up in the jobtracker webapp, giving you information such as "Map input records", "Map output records", etc. This short guide shows you how to programmatically manipulate counters, and is up to date as of Hadoop 0.20.1.
Working with counters
Built-In Counters
Here's some sample code that iterates through all available counter groups and prints out names of counters in each group:
RunningJob job = JobClient.runJob(conf);
Counters counters = job.getCounters();
for (Counters.Group group : counters) {
    System.out.println("- Counter Group: " + group.getDisplayName() + " (" + group.getName() + ")");
    System.out.println("  number of counters in this group: " + group.size());
    for (Counters.Counter counter : group) {
        System.out.println("  - " + counter.getDisplayName() + ": " + counter.getName());
    }
}
In Hadoop 0.20.1, it is a bit awkward to access counter values directly. The easiest way to look up a counter is by an enum, but unfortunately the enums that correspond to the built-in counters are not publicly accessible. See JIRA HADOOP-4043 and JIRA HADOOP-5717 for more information. This issue has been fixed and is slated for inclusion in Hadoop 0.21.0.
In the meantime, the working solution is to look up counters by their string names, which is more verbose. Here are the group names and the counter names within each group:
| Group Name | Counter Name |
| --- | --- |
| org.apache.hadoop.mapred.Task$Counter | MAP_INPUT_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | MAP_OUTPUT_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | MAP_SKIPPED_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | MAP_INPUT_BYTES |
| org.apache.hadoop.mapred.Task$Counter | MAP_OUTPUT_BYTES |
| org.apache.hadoop.mapred.Task$Counter | COMBINE_INPUT_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | COMBINE_OUTPUT_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | REDUCE_INPUT_GROUPS |
| org.apache.hadoop.mapred.Task$Counter | REDUCE_SHUFFLE_BYTES |
| org.apache.hadoop.mapred.Task$Counter | REDUCE_INPUT_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | REDUCE_OUTPUT_RECORDS |
| org.apache.hadoop.mapred.Task$Counter | REDUCE_SKIPPED_GROUPS |
| org.apache.hadoop.mapred.Task$Counter | REDUCE_SKIPPED_RECORDS |
| org.apache.hadoop.mapred.JobInProgress$Counter | TOTAL_LAUNCHED_MAPS |
| org.apache.hadoop.mapred.JobInProgress$Counter | RACK_LOCAL_MAPS |
| org.apache.hadoop.mapred.JobInProgress$Counter | DATA_LOCAL_MAPS |
| org.apache.hadoop.mapred.JobInProgress$Counter | TOTAL_LAUNCHED_REDUCES |
| FileSystemCounters | FILE_BYTES_READ |
| FileSystemCounters | HDFS_BYTES_READ |
| FileSystemCounters | FILE_BYTES_WRITTEN |
| FileSystemCounters | HDFS_BYTES_WRITTEN |
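A side note on those group names: the dollar sign in org.apache.hadoop.mapred.Task$Counter is not a typo. The built-in counters are nested enums, and the group name is simply the enum's fully qualified JVM class name, which uses $ between the outer class and the nested type. A standalone sketch (the class name GroupNameDemo is hypothetical, purely for illustration):

```java
public class GroupNameDemo {

    // A nested counter enum, mirroring how Hadoop defines its built-ins
    // (e.g. Task$Counter is an enum nested inside the Task class).
    protected static enum MyCounter { INPUT_WORDS };

    public static void main(String[] args) {
        // The group name Hadoop derives for an enum-based counter is the
        // enum's class name, dollar sign included.
        System.out.println(MyCounter.class.getName());
        // prints: GroupNameDemo$MyCounter
    }
}
```

This is why the string-name lookups below pass "org.apache.hadoop.mapred.Task$Counter" rather than a dotted name.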
And here is some boilerplate code to access individual counters:
Counters.Counter counter = null;

String[] taskCounterNames = {
    "MAP_INPUT_RECORDS", "MAP_OUTPUT_RECORDS", "MAP_SKIPPED_RECORDS",
    "MAP_INPUT_BYTES", "MAP_OUTPUT_BYTES", "COMBINE_INPUT_RECORDS",
    "COMBINE_OUTPUT_RECORDS", "REDUCE_INPUT_GROUPS", "REDUCE_SHUFFLE_BYTES",
    "REDUCE_INPUT_RECORDS", "REDUCE_OUTPUT_RECORDS", "REDUCE_SKIPPED_GROUPS",
    "REDUCE_SKIPPED_RECORDS", "SPILLED_RECORDS" };
for (String c : taskCounterNames) {
    counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

String[] jobCounterNames = {
    "TOTAL_LAUNCHED_MAPS", "RACK_LOCAL_MAPS", "DATA_LOCAL_MAPS",
    "TOTAL_LAUNCHED_REDUCES" };
for (String c : jobCounterNames) {
    counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

String[] fsCounterNames = {
    "FILE_BYTES_READ", "HDFS_BYTES_READ", "FILE_BYTES_WRITTEN",
    "HDFS_BYTES_WRITTEN" };
for (String c : fsCounterNames) {
    counter = counters.findCounter("FileSystemCounters", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}
Custom Counters
In addition to the built-in counters associated with every Hadoop job, you can create your own counters. First, define an enum that contains a list of the counters you'd like to have:
protected static enum MyCounter { INPUT_WORDS };
Then, in the mapper or reducer, all you have to do is update the counter via the reporter object:
reporter.incrCounter(MyCounter.INPUT_WORDS, 1);
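To see the pieces together, here is a minimal mapper sketch that bumps the custom counter once per word, modeled on the classic word count. The class name and tokenizing logic are illustrative, not part of the original article; only the enum declaration and the reporter.incrCounter call are from the text above. (This won't compile without the Hadoop 0.20 jars on the classpath.)

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    // the custom counter enum from above
    protected static enum MyCounter { INPUT_WORDS };

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
            // increment the custom counter once per input word
            reporter.incrCounter(MyCounter.INPUT_WORDS, 1);
        }
    }
}
```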
You can programmatically access the counters in the following manner:
RunningJob job = JobClient.runJob(conf); // blocks until the job completes
Counters c = job.getCounters();
long cnt = c.getCounter(MyCounter.INPUT_WORDS);
Custom counters are very useful for aggregating small bits of information across an entire job. Pretty simple, huh?