A Hadoop toolkit for working with big data
This page presents solutions for the Boolean retrieval exercise. The main class for performing retrieval is edu.umd.cloud9.example.ir.BooleanRetrieval. Assuming the inverted index has already been constructed and resides on HDFS, the command-line invocation is as follows:
$ hadoop jar target/cloud9-X.Y.Z-fatjar.jar edu.umd.cloud9.example.ir.BooleanRetrieval \
    -index index -collection bible+shakes.nopunc
In this case, the indexes are read directly from HDFS. Alternatively, you can copy the indexes to the local filesystem and use Maven to launch the program:
$ mvn exec:java -Dexec.mainClass=edu.umd.cloud9.example.ir.BooleanRetrieval -Dexec.args="-index index -collection data/bible+shakes.nopunc"
Here are the results for the test queries:
Query: means AND deceit
Reverse Polish Notation: means deceit AND
6870153 who makes the fairest show means most deceit
8135048 who cannot steal a shape that means deceit
Query: (white OR red ) AND rose AND pluck
Reverse Polish Notation: white red OR rose AND pluck AND
7841087 from off this brier pluck a white rose with me
7841229 pluck a red rose from off this thorn with me
7841354 i pluck this white rose with plantagenet
7841396 suffolk i pluck this red rose with young somerset
7842315 in sign whereof i pluck a white rose too
Query: (unhappy OR outrageous OR (good AND your)) AND fortune
Reverse Polish Notation: unhappy outrageous OR good your AND OR fortune AND
4442172 the slings and arrows of outrageous fortune
5167827 friar laurence unhappy fortune by my brotherhood
7110114 tender your own good fortune
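Each test query above is shown in infix form together with its Reverse Polish Notation (postfix) form. Postfix order is convenient because it can be evaluated with a single stack: a term pushes its set of docnos, AND intersects the top two sets, and OR takes their union. The following is a minimal, hypothetical sketch of both steps (infix to RPN via the shunting-yard algorithm, then stack-based evaluation). It is not Cloud9's implementation. The class name BooleanRpnSketch, the in-memory Map standing in for the index on HDFS, and the toy docnos are all assumptions. AND is assumed to bind more tightly than OR; the three test queries convert to the same RPN under equal precedence as well, because their parentheses already fix the order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: infix Boolean query -> RPN, then RPN evaluation over in-memory postings. */
public class BooleanRpnSketch {
    // Assumed precedence: AND binds more tightly than OR; both are left-associative.
    private static final Map<String, Integer> PRECEDENCE = Map.of("OR", 1, "AND", 2);

    // Pads parentheses with spaces so that "(white" splits into "(" and "white".
    static List<String> tokenize(String query) {
        String padded = query.replace("(", " ( ").replace(")", " ) ").trim();
        List<String> tokens = new ArrayList<>();
        for (String t : padded.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Shunting-yard conversion; throws NoSuchElementException on an unmatched ")".
    public static List<String> toRpn(String query) {
        List<String> output = new ArrayList<>();
        Deque<String> ops = new ArrayDeque<>();
        for (String tok : tokenize(query)) {
            if (PRECEDENCE.containsKey(tok)) {
                // Pop operators that bind at least as tightly; "(" has precedence 0 and stops the loop.
                while (!ops.isEmpty() && PRECEDENCE.getOrDefault(ops.peek(), 0) >= PRECEDENCE.get(tok)) {
                    output.add(ops.pop());
                }
                ops.push(tok);
            } else if (tok.equals("(")) {
                ops.push(tok);
            } else if (tok.equals(")")) {
                while (!ops.isEmpty() && !ops.peek().equals("(")) output.add(ops.pop());
                ops.pop(); // discard the matching "("
            } else {
                output.add(tok); // a query term
            }
        }
        while (!ops.isEmpty()) output.add(ops.pop());
        return output;
    }

    // Terms push their postings (empty if absent); AND intersects and OR unions the top two sets.
    public static Set<Long> evaluate(List<String> rpn, Map<String, Set<Long>> index) {
        Deque<Set<Long>> stack = new ArrayDeque<>();
        for (String tok : rpn) {
            if (tok.equals("AND") || tok.equals("OR")) {
                Set<Long> right = stack.pop();
                Set<Long> result = new TreeSet<>(stack.pop());
                if (tok.equals("AND")) result.retainAll(right);
                else result.addAll(right);
                stack.push(result);
            } else {
                stack.push(new TreeSet<>(index.getOrDefault(tok, Set.of())));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Toy collection (hypothetical docnos 0..3), with text taken from the results above.
        List<String> docs = List.of(
            "who makes the fairest show means most deceit",
            "who cannot steal a shape that means deceit",
            "pluck a red rose from off this thorn with me",
            "the slings and arrows of outrageous fortune");
        Map<String, Set<Long>> index = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String term : docs.get(i).split(" ")) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add((long) i);
            }
        }
        List<String> rpn = toRpn("(white OR red ) AND rose AND pluck");
        System.out.println(String.join(" ", rpn));
        System.out.println(evaluate(rpn, index));
    }
}
```

Running main prints `white red OR rose AND pluck AND`, matching the second test query's RPN line, followed by `[2]`, since only toy document 2 contains red, rose, and pluck.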