Ivory

A Hadoop toolkit for web-scale information retrieval research

These regression runs reproduce the experiments presented in Wang et al.'s SIGIR 2011 paper, "A Cascade Ranking Model for Efficient Ranked Retrieval."

Wt10g results

Main results in Table 2:

# command-line
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/wt10g/run.wt10g.SIGIR2011.xml data/wt10g/queries.wt10g.501-550.xml

# evaluating effectiveness
etc/trec_eval data/wt10g/qrels.wt10g.all ranking.SIGIR2011-Wt10g-QL.txt
etc/trec_eval data/wt10g/qrels.wt10g.all ranking.SIGIR2011-Wt10g-AdaRank.txt
etc/trec_eval data/wt10g/qrels.wt10g.all ranking.SIGIR2011-Wt10g-FeaturePrune.txt
etc/trec_eval data/wt10g/qrels.wt10g.all ranking.SIGIR2011-Wt10g-Cascade.txt

# junit
etc/junit.sh ivory.regression.sigir2011.Wt10g_Cascade

description                                    tag                 NDCG    P20
Baseline query-likelihood (Dirichlet scoring)  Wt10g-QL            0.3407  0.3240
AdaRank                                        Wt10g-AdaRank       0.3549  0.3350
Feature pruning (SIGIR 2010)                   Wt10g-FeaturePrune  0.3486  0.3310
Cascade                                        Wt10g-Cascade       0.3560  0.3380
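
The four evaluation commands above differ only in the run tag, so they can be wrapped in a loop. The following is a minimal sketch, not part of Ivory: the function name `eval_runs` and the `TREC_EVAL` variable are hypothetical, while the qrels paths and ranking file names are taken from the commands on this page. It assumes the ranking files sit in the current directory.

```shell
#!/bin/sh
# Sketch only: eval_runs and TREC_EVAL are hypothetical names, not part of Ivory.
# Ranking file names follow the pattern used in the commands on this page.
TREC_EVAL="${TREC_EVAL:-etc/trec_eval}"

# eval_runs COLLECTION QRELS
# Runs the evaluator once per Table 2 condition for the given collection.
eval_runs() {
  collection="$1"
  qrels="$2"
  for model in QL AdaRank FeaturePrune Cascade; do
    echo "== ${collection}-${model} =="
    "$TREC_EVAL" "$qrels" "ranking.SIGIR2011-${collection}-${model}.txt"
  done
}
```

For example, `eval_runs Wt10g data/wt10g/qrels.wt10g.all` issues the four Wt10g evaluations in turn. Passing `Gov2 data/gov2/qrels.gov2.all` or `Clue data/clue/qrels.web09catB.txt` instead should cover the other two collections in the same way.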

Results in Figure 3:

# command-line
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/wt10g/run.wt10g.SIGIR2011.varying.tradeoff.featureprune.xml data/wt10g/queries.wt10g.501-550.xml
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/wt10g/run.wt10g.SIGIR2011.varying.tradeoff.cascade.xml data/wt10g/queries.wt10g.501-550.xml

# junit
etc/junit.sh ivory.regression.sigir2011.Wt10g_VaryingTradeoff_FeaturePrune
etc/junit.sh ivory.regression.sigir2011.Wt10g_VaryingTradeoff_Cascade

Gov2 results

Main results in Table 2:

# command-line
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/gov2/run.gov2.SIGIR2011.xml data/gov2/queries.gov2.title.776-850.xml

# evaluating effectiveness
etc/trec_eval data/gov2/qrels.gov2.all ranking.SIGIR2011-Gov2-QL.txt
etc/trec_eval data/gov2/qrels.gov2.all ranking.SIGIR2011-Gov2-AdaRank.txt
etc/trec_eval data/gov2/qrels.gov2.all ranking.SIGIR2011-Gov2-FeaturePrune.txt
etc/trec_eval data/gov2/qrels.gov2.all ranking.SIGIR2011-Gov2-Cascade.txt

# junit
etc/junit.sh ivory.regression.sigir2011.Gov2_Cascade

description                                    tag                 NDCG    P20
Baseline query-likelihood (Dirichlet scoring)  Gov2-QL             0.4457  0.5093
AdaRank                                        Gov2-AdaRank        0.4737  0.5360
Feature pruning (SIGIR 2010)                   Gov2-FeaturePrune   0.4716  0.5187
Cascade                                        Gov2-Cascade        0.4744  0.5447

Results in Figure 3:

# command-line
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/gov2/run.gov2.SIGIR2011.varying.tradeoff.featureprune.xml data/gov2/queries.gov2.title.776-850.xml
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/gov2/run.gov2.SIGIR2011.varying.tradeoff.cascade.xml data/gov2/queries.gov2.title.776-850.xml

# junit
etc/junit.sh ivory.regression.sigir2011.Gov2_VaryingTradeoff_FeaturePrune
etc/junit.sh ivory.regression.sigir2011.Gov2_VaryingTradeoff_Cascade

Clue results

Main results in Table 2:

# command-line
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/clue/run.clue.SIGIR2011.xml data/clue/queries.web09.26-50.xml

# evaluating effectiveness
etc/trec_eval data/clue/qrels.web09catB.txt ranking.SIGIR2011-Clue-QL.txt
etc/trec_eval data/clue/qrels.web09catB.txt ranking.SIGIR2011-Clue-AdaRank.txt
etc/trec_eval data/clue/qrels.web09catB.txt ranking.SIGIR2011-Clue-FeaturePrune.txt
etc/trec_eval data/clue/qrels.web09catB.txt ranking.SIGIR2011-Clue-Cascade.txt

# junit
etc/junit.sh ivory.regression.sigir2011.Clue_Cascade

description                                    tag                 NDCG    P20
Baseline query-likelihood (Dirichlet scoring)  Clue-QL             0.2750  0.3420
AdaRank                                        Clue-AdaRank        0.3094  0.3740
Feature pruning (SIGIR 2010)                   Clue-FeaturePrune   0.2966  0.3620
Cascade                                        Clue-Cascade        0.3060  0.3740

Results in Figure 3:

# command-line
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/clue/run.clue.SIGIR2011.varying.tradeoff.featureprune.xml data/clue/queries.web09.26-50.xml
etc/run.sh ivory.smrf.retrieval.RunQueryLocal data/clue/run.clue.SIGIR2011.varying.tradeoff.cascade.xml data/clue/queries.web09.26-50.xml

# junit
etc/junit.sh ivory.regression.sigir2011.Clue_VaryingTradeoff_FeaturePrune
etc/junit.sh ivory.regression.sigir2011.Clue_VaryingTradeoff_Cascade