Working with counters

Counters are lightweight objects in Hadoop that let you track the progress of a job in both the map and reduce stages of processing. By default, Hadoop defines a number of standard counters, organized into "groups"; these show up in the JobTracker web UI, reporting values such as "Map input records" and "Map output records". This short guide shows you how to manipulate counters programmatically, and is up to date as of Hadoop 0.20.1.

Built-In Counters

Here's some sample code that iterates through all available counter groups and prints out names of counters in each group:

RunningJob job = JobClient.runJob(conf);
Counters counters = job.getCounters();
for (Counters.Group group : counters) {
    System.out.println("- Counter Group: " + group.getDisplayName() + " (" + group.getName() + ")");
    System.out.println("  number of counters in this group: " + group.size());
    for (Counters.Counter counter : group) {
        System.out.println("  - " + counter.getDisplayName() + ": " + counter.getName());
    }
}

In Hadoop 0.20.1, it is a bit awkward to access counter values directly. The easiest way to look up a counter is by an enum, but unfortunately the enums that correspond to the built-in counters are not publicly accessible. See JIRA issues HADOOP-4043 and HADOOP-5717 for more information. This has been fixed, and the fix is slated for inclusion in Hadoop 0.21.0.

In the meantime, the current working solution is to look up counters by their string names, which is more verbose. Here are the group names and names of counters within those individual groups:

Group Name                                        Counter Name
org.apache.hadoop.mapred.Task$Counter             MAP_INPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             MAP_OUTPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             MAP_SKIPPED_RECORDS
org.apache.hadoop.mapred.Task$Counter             MAP_INPUT_BYTES
org.apache.hadoop.mapred.Task$Counter             MAP_OUTPUT_BYTES
org.apache.hadoop.mapred.Task$Counter             COMBINE_INPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             COMBINE_OUTPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             REDUCE_INPUT_GROUPS
org.apache.hadoop.mapred.Task$Counter             REDUCE_SHUFFLE_BYTES
org.apache.hadoop.mapred.Task$Counter             REDUCE_INPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             REDUCE_OUTPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             REDUCE_SKIPPED_GROUPS
org.apache.hadoop.mapred.Task$Counter             REDUCE_SKIPPED_RECORDS
org.apache.hadoop.mapred.JobInProgress$Counter    TOTAL_LAUNCHED_MAPS
org.apache.hadoop.mapred.JobInProgress$Counter    RACK_LOCAL_MAPS
org.apache.hadoop.mapred.JobInProgress$Counter    DATA_LOCAL_MAPS
org.apache.hadoop.mapred.JobInProgress$Counter    TOTAL_LAUNCHED_REDUCES
FileSystemCounters                                FILE_BYTES_READ
FileSystemCounters                                HDFS_BYTES_READ
FileSystemCounters                                FILE_BYTES_WRITTEN
FileSystemCounters                                HDFS_BYTES_WRITTEN
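Incidentally, the "$" in these group names is nothing Hadoop-specific: the group name for enum-based counters is just the fully-qualified Java class name of the enum, and Java names nested types with a "$" separator (hence Task$Counter, an enum nested inside the Task class). A small stand-alone snippet, using a hypothetical nested enum of our own, demonstrates where that form comes from:

```java
public class GroupNameDemo {
    // Hypothetical nested enum, analogous in shape to Hadoop's Task$Counter.
    protected static enum Counter {
        MAP_INPUT_RECORDS
    }

    public static void main(String[] args) {
        // getName() on a nested type yields Outer$Inner -- the same string
        // form Hadoop uses as a counter group name.
        System.out.println(Counter.class.getName());  // GroupNameDemo$Counter
    }
}
```

This is why the string you pass to findCounter() below looks like a class name: it is one.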

And here is some boilerplate code to access individual counters:

Counters.Counter counter = null;

String[] taskCounterNames = { "MAP_INPUT_RECORDS", "MAP_OUTPUT_RECORDS",
    "MAP_SKIPPED_RECORDS", "MAP_INPUT_BYTES", "MAP_OUTPUT_BYTES",
    "COMBINE_INPUT_RECORDS", "COMBINE_OUTPUT_RECORDS", "REDUCE_INPUT_GROUPS",
    "REDUCE_SHUFFLE_BYTES", "REDUCE_INPUT_RECORDS", "REDUCE_OUTPUT_RECORDS",
    "REDUCE_SKIPPED_GROUPS", "REDUCE_SKIPPED_RECORDS", "SPILLED_RECORDS" };

for (String c : taskCounterNames) {
    counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

String[] jobCounterNames = { "TOTAL_LAUNCHED_MAPS", "RACK_LOCAL_MAPS", "DATA_LOCAL_MAPS",
    "TOTAL_LAUNCHED_REDUCES" };

for (String c : jobCounterNames) {
    counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

String[] fsCounterNames = { "FILE_BYTES_READ", "HDFS_BYTES_READ", "FILE_BYTES_WRITTEN",
    "HDFS_BYTES_WRITTEN" };

for (String c : fsCounterNames) {
    counter = counters.findCounter("FileSystemCounters", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

Custom Counters

In addition to the built-in counters associated with every Hadoop job, you can create your own counters. First, define an enum that contains a list of the counters you'd like to have:

protected static enum MyCounter {
    INPUT_WORDS
};

Then, in the mapper or reducer, update the counter via the Reporter object that Hadoop passes to map() and reduce():

reporter.incrCounter(MyCounter.INPUT_WORDS, 1);

You can programmatically access the counters in the following manner:

RunningJob job = JobClient.runJob(conf);  // blocks until job completes
Counters c = job.getCounters();
long cnt = c.getCounter(MyCounter.INPUT_WORDS);
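Hadoop sums the increments from every map and reduce task into a single per-job total. If it helps to picture that aggregation, here is a plain-Java stand-in built on an EnumMap; note that CounterTally is a hypothetical class for illustration only, not part of the Hadoop API:

```java
import java.util.EnumMap;

public class CounterTally {
    protected static enum MyCounter {
        INPUT_WORDS
    }

    // One running total per enum constant; absent keys count as zero.
    private final EnumMap<MyCounter, Long> tallies =
        new EnumMap<MyCounter, Long>(MyCounter.class);

    public void incrCounter(MyCounter key, long amount) {
        tallies.merge(key, amount, Long::sum);
    }

    public long getCounter(MyCounter key) {
        return tallies.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        CounterTally tally = new CounterTally();
        // Each "task" contributes its own increments...
        for (String word : "counting words with counters".split(" ")) {
            tally.incrCounter(MyCounter.INPUT_WORDS, 1);
        }
        // ...and the job-level view is the sum of all of them.
        System.out.println("INPUT_WORDS = " + tally.getCounter(MyCounter.INPUT_WORDS));  // 4
    }
}
```

In the real framework you never write this aggregation yourself; Reporter.incrCounter() on the task side and Counters.getCounter() on the client side are the two ends of it.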

Custom counters are very useful for aggregating small bits of information across an entire job. Pretty simple, huh?