Cloud9

A Hadoop toolkit for working with big data

This page presents solutions for the Boolean retrieval exercise. The main class for performing retrieval is edu.umd.cloud9.example.ir.BooleanRetrieval. Assuming the inverted index has already been constructed and resides on HDFS, the command-line invocation is as follows:

$ hadoop jar target/cloud9-X.Y.Z-fatjar.jar edu.umd.cloud9.example.ir.BooleanRetrieval \
   -index index -collection bible+shakes.nopunc

In this case, we're reading the indexes directly from HDFS. Alternatively, you can copy the indexes to the local file system and use Maven to launch the program:

$ mvn exec:java -Dexec.mainClass=edu.umd.cloud9.example.ir.BooleanRetrieval -Dexec.args="-index index -collection data/bible+shakes.nopunc"

Here are the results for the test queries:

Query: means AND deceit
Reverse Polish Notation: means deceit AND

6870153	 who makes the fairest show means most deceit
8135048	 who cannot steal a shape that means deceit

Query: (white OR red) AND rose AND pluck
Reverse Polish Notation: white red OR rose AND pluck AND

7841087	 from off this brier pluck a white rose with me
7841229	 pluck a red rose from off this thorn with me
7841354	 i pluck this white rose with plantagenet
7841396	 suffolk i pluck this red rose with young somerset
7842315	 in sign whereof i pluck a white rose too

Query: (unhappy OR outrageous OR (good AND your)) AND fortune
Reverse Polish Notation: unhappy outrageous OR good your AND OR fortune AND

4442172	 the slings and arrows of outrageous fortune
5167827	 friar laurence unhappy fortune by my brotherhood
7110114	 tender your own good fortune
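
The Reverse Polish Notation forms above lend themselves to stack-based evaluation: each term pushes its postings onto a stack, and each operator pops two sets and pushes their intersection (AND) or union (OR). The following is a minimal sketch of that idea, not the code in edu.umd.cloud9.example.ir.BooleanRetrieval. The class name RpnBooleanSketch, the in-memory map standing in for the inverted index, and the small document ids are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: evaluates a Boolean query given in Reverse Polish Notation
// against a toy in-memory index (term -> set of document ids).
public class RpnBooleanSketch {

    static Set<Integer> evaluate(String rpn, Map<String, Set<Integer>> index) {
        Deque<Set<Integer>> stack = new ArrayDeque<>();
        for (String token : rpn.trim().split("\\s+")) {
            // Operators are upper case; query terms are assumed to be lower case,
            // so the ordinary word "and" is still treated as a term.
            if (token.equals("AND") || token.equals("OR")) {
                if (stack.size() < 2) {
                    throw new IllegalArgumentException("Operator " + token + " needs two operands");
                }
                Set<Integer> right = stack.pop();
                Set<Integer> left = stack.pop();
                Set<Integer> result = new TreeSet<>(left);
                if (token.equals("AND")) {
                    result.retainAll(right);   // intersection
                } else {
                    result.addAll(right);      // union
                }
                stack.push(result);
            } else {
                // A term: push a copy of its postings (empty if the term is unseen).
                Set<Integer> postings = index.getOrDefault(token, Collections.<Integer>emptySet());
                stack.push(new TreeSet<>(postings));
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed RPN query: " + rpn);
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Made-up postings; these are not ids from the bible+shakes collection.
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("white", new TreeSet<>(Arrays.asList(1, 3)));
        index.put("red", new TreeSet<>(Arrays.asList(2, 3, 4)));
        index.put("rose", new TreeSet<>(Arrays.asList(1, 2, 4, 7)));
        index.put("pluck", new TreeSet<>(Arrays.asList(1, 2, 8)));

        // Same shape as the second test query above.
        System.out.println(evaluate("white red OR rose AND pluck AND", index)); // prints [1, 2]
    }
}
```

Because the query is already in postfix order, no parentheses or precedence rules are needed at evaluation time; the grouping in the original infix query is encoded entirely by the order of the tokens.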