Debugging MapReduce Programs with log4j

Hadoop uses log4j for its logging infrastructure, and you can leverage it for debugging your MapReduce programs as well. log4j comes bundled with the Hadoop distribution, so you shouldn't need to download anything.

You can easily get a handle on a Logger by putting something like this in your class:

import org.apache.log4j.Logger;
...

public class Foo {
    private static final Logger sLogger = Logger.getLogger(Foo.class);
    ...
}

Logger.getLogger will also accept an arbitrary String, though it's common practice to name the Logger after the class it logs from.
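
For example, you can get a Logger by name instead (the String here is arbitrary, just for illustration):

Logger customLogger = Logger.getLogger("my.custom.logger");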

Note: In the most common case, you'll have mapper and reducer classes as nested static classes within some enclosing class—you'll want to grab a handle to the Logger from the enclosing class (not inside each mapper or reducer class itself).
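
For example, here's a minimal sketch of that arrangement (the class and key/value types are hypothetical, using the older org.apache.hadoop.mapred API):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;

public class Foo {
    // One Logger for the enclosing class, shared by the nested mapper/reducer.
    private static final Logger sLogger = Logger.getLogger(Foo.class);

    public static class FooMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> output,
                        Reporter reporter) throws IOException {
            sLogger.debug("map() called for key " + key);
            output.collect(value, new LongWritable(1));
        }
    }
}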

Once you have a handle on the Logger, you can make logging calls at different levels from within your class:

sLogger.debug("Debug message");
sLogger.info("Info message");
sLogger.warn("Warn message");
sLogger.error("Error message");
sLogger.fatal("Fatal message");

You can also log Exceptions:

try {
    ...
} catch (Exception x) {
    sLogger.error("Caught some exception", x);
}

The above will log the message along with the Exception info for x, including its stack trace.

Without messing with the Hadoop log4j configurations, you can programmatically set the log level:

import org.apache.log4j.Level;
...

sLogger.setLevel(Level.INFO); // or Level.DEBUG, Level.WARN, whatever

A good place to set the logging level is inside the configure method of the mapper or reducer, since it runs once when each task starts. Each Level includes all higher-priority levels: at DEBUG, all DEBUG, INFO, WARN, etc. messages will be logged, while at WARN, calls to debug() and info() won't log anything.
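
For example, here's a sketch using the old-API configure method (extending the hypothetical FooMapper above; the choice of DEBUG is just an example):

import org.apache.hadoop.mapred.JobConf;
import org.apache.log4j.Level;
...

public static class FooMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {
    public void configure(JobConf job) {
        // Runs once when the task starts on a node; turn on DEBUG logging
        // for this class's Logger before any map() calls happen.
        sLogger.setLevel(Level.DEBUG);
    }

    ...
}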

If you want to log a DEBUG or INFO message that requires non-trivial work to construct (generally, anything more than String concatenation) and that wouldn't otherwise need to be executed, it's best to test the level first so you don't waste cycles when running at a higher logging level:

if (sLogger.isDebugEnabled()) {
    String msg = doALotOfWork();
    sLogger.debug(msg);
}

You can get at the logging output via the Hadoop JobTracker webapp (by default on port 50030). If you drill down into an individual map or reduce task, you'll see a column in the table with the heading "Task Logs"; the links in that column give you access to the logs.