Working with counters

Counters are lightweight objects in Hadoop that let you track the progress of a job in both the map and reduce stages of processing. By default, Hadoop defines a number of standard counters, organized into "groups"; these show up in the JobTracker web UI, reporting values such as "Map input records" and "Map output records". This short guide shows you how to manipulate counters programmatically, and is up to date as of Hadoop 0.20.1.

Built-In Counters

Here's some sample code that iterates through all available counter groups and prints out names of counters in each group:

RunningJob job = JobClient.runJob(conf);
Counters counters = job.getCounters();
for (Counters.Group group : counters) {
    System.out.println("- Counter Group: " + group.getDisplayName() + " (" + group.getName() + ")");
    System.out.println("  number of counters in this group: " + group.size());
    for (Counters.Counter counter : group) {
        System.out.println("  - " + counter.getDisplayName() + ": " + counter.getName());
    }
}

In Hadoop 0.20.1, it is a bit awkward to access counter values directly. The easiest way to look up a counter is by an enum, but unfortunately the enums that correspond to the built-in counters are not publicly accessible. See JIRA issues HADOOP-4043 and HADOOP-5717 for more information. This has been fixed, and the fix is slated for inclusion in Hadoop 0.21.0.

In the meantime, the current working solution is to look up counters by their string names, which is more verbose. Here are the group names and names of counters within those individual groups:

Group Name                                        Counter Name
org.apache.hadoop.mapred.Task$Counter             MAP_INPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             MAP_OUTPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             MAP_SKIPPED_RECORDS
org.apache.hadoop.mapred.Task$Counter             MAP_INPUT_BYTES
org.apache.hadoop.mapred.Task$Counter             MAP_OUTPUT_BYTES
org.apache.hadoop.mapred.Task$Counter             COMBINE_INPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             COMBINE_OUTPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             REDUCE_INPUT_GROUPS
org.apache.hadoop.mapred.Task$Counter             REDUCE_SHUFFLE_BYTES
org.apache.hadoop.mapred.Task$Counter             REDUCE_INPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             REDUCE_OUTPUT_RECORDS
org.apache.hadoop.mapred.Task$Counter             REDUCE_SKIPPED_GROUPS
org.apache.hadoop.mapred.Task$Counter             REDUCE_SKIPPED_RECORDS
org.apache.hadoop.mapred.JobInProgress$Counter    TOTAL_LAUNCHED_MAPS
org.apache.hadoop.mapred.JobInProgress$Counter    RACK_LOCAL_MAPS
org.apache.hadoop.mapred.JobInProgress$Counter    DATA_LOCAL_MAPS
org.apache.hadoop.mapred.JobInProgress$Counter    TOTAL_LAUNCHED_REDUCES
FileSystemCounters                                FILE_BYTES_READ
FileSystemCounters                                HDFS_BYTES_READ
FileSystemCounters                                FILE_BYTES_WRITTEN
FileSystemCounters                                HDFS_BYTES_WRITTEN
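Incidentally, the "$" in these group names is nothing Hadoop-specific: the group name for enum-based counters is just the fully-qualified Java class name of the enum, and Java names nested types with a "$" separator (hence Task$Counter, an enum nested inside the Task class). A small stand-alone snippet, using a hypothetical nested enum of our own, demonstrates where that form comes from:

```java
public class GroupNameDemo {
    // Hypothetical nested enum, analogous in shape to Hadoop's Task$Counter.
    protected static enum Counter {
        MAP_INPUT_RECORDS
    }

    public static void main(String[] args) {
        // getName() on a nested type yields Outer$Inner -- the same string
        // form Hadoop uses as a counter group name.
        System.out.println(Counter.class.getName());  // GroupNameDemo$Counter
    }
}
```

This is why the string you pass to findCounter() below looks like a class name: it is one.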

And here is some boilerplate code to access individual counters:

Counters.Counter counter = null;

String[] taskCounterNames = { "MAP_INPUT_RECORDS", "MAP_OUTPUT_RECORDS",
    "MAP_SKIPPED_RECORDS", "MAP_INPUT_BYTES", "MAP_OUTPUT_BYTES",
    "COMBINE_INPUT_RECORDS", "COMBINE_OUTPUT_RECORDS", "REDUCE_INPUT_GROUPS",
    "REDUCE_SHUFFLE_BYTES", "REDUCE_INPUT_RECORDS", "REDUCE_OUTPUT_RECORDS",
    "REDUCE_SKIPPED_GROUPS", "REDUCE_SKIPPED_RECORDS", "SPILLED_RECORDS" };

for (String c : taskCounterNames) {
    counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

String[] jobCounterNames = { "TOTAL_LAUNCHED_MAPS", "RACK_LOCAL_MAPS", "DATA_LOCAL_MAPS",
    "TOTAL_LAUNCHED_REDUCES" };

for (String c : jobCounterNames) {
    counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

String[] fsCounterNames = { "FILE_BYTES_READ", "HDFS_BYTES_READ", "FILE_BYTES_WRITTEN",
    "HDFS_BYTES_WRITTEN" };

for (String c : fsCounterNames) {
    counter = counters.findCounter("FileSystemCounters", c);
    System.out.println(counter.getDisplayName() + ": " + counter.getCounter());
}

Custom Counters

In addition to the built-in counters associated with every Hadoop job, you can create your own counters. First, define an enum that contains a list of the counters you'd like to have:

protected static enum MyCounter {
    INPUT_WORDS
};

Then, in the mapper or reducer, update the counter via the Reporter object that Hadoop passes to map() and reduce():

reporter.incrCounter(MyCounter.INPUT_WORDS, 1);

You can programmatically access the counters in the following manner:

RunningJob job = JobClient.runJob(conf);  // blocks until job completes
Counters c = job.getCounters();
long cnt = c.getCounter(MyCounter.INPUT_WORDS);
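Hadoop sums the increments from every map and reduce task into a single per-job total. If it helps to picture that aggregation, here is a plain-Java stand-in built on an EnumMap; note that CounterTally is a hypothetical class for illustration only, not part of the Hadoop API:

```java
import java.util.EnumMap;

public class CounterTally {
    protected static enum MyCounter {
        INPUT_WORDS
    }

    // One running total per enum constant; absent keys count as zero.
    private final EnumMap<MyCounter, Long> tallies =
        new EnumMap<MyCounter, Long>(MyCounter.class);

    public void incrCounter(MyCounter key, long amount) {
        tallies.merge(key, amount, Long::sum);
    }

    public long getCounter(MyCounter key) {
        return tallies.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        CounterTally tally = new CounterTally();
        // Each "task" contributes its own increments...
        for (String word : "counting words with counters".split(" ")) {
            tally.incrCounter(MyCounter.INPUT_WORDS, 1);
        }
        // ...and the job-level view is the sum of all of them.
        System.out.println("INPUT_WORDS = " + tally.getCounter(MyCounter.INPUT_WORDS));  // 4
    }
}
```

In the real framework you never write this aggregation yourself; Reporter.incrCounter() on the task side and Counters.getCounter() on the client side are the two ends of it.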

Custom counters are very useful for aggregating small bits of information across an entire job. Pretty simple, huh?