Ivory: A Hadoop toolkit for web-scale information retrieval research

Ivory is a research toolkit for exploring large-scale indexing and retrieval algorithms.

If you're looking for something that "just works", then Ivory probably isn't for you. Try Lucene instead. On the other hand, if you're interested in playing with (half-baked) implementations of state-of-the-art retrieval algorithms, this may be the system for you! Ivory includes features presented in many academic research papers:

A scalable inverted indexing algorithm implemented in Hadoop (TREC 2009), designed to work "out of the box" with the ClueWeb09 web crawl collection.
A retrieval engine that implements the Markov Random Field retrieval model (SIGIR 2005), with additional extensions such as the Weighted Sequential Dependence model (WSDM 2010) and Latent Concept Expansion (SIGIR 2007).
Implementations of "learning to efficiently rank" models: the efficient sequential dependence model (SIGIR 2010), temporally-constrained ranking functions (CIKM 2010), and ranking cascades (SIGIR 2011).
The Postings Cartesian Product algorithm for pairwise similarity computations (SIGIR 2009).
A cross-language pairwise similarity computation engine based on locality-sensitive hashing (SIGIR 2011).
Cross-language retrieval models based on Synchronous Context-Free Grammars and other statistical machine translation techniques (COLING 2012).
Various feature extraction techniques using document vectors (IRJ 2012).
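
To give a rough feel for the index-based pairwise similarity idea in the list above, here is a minimal single-machine sketch in Python. It is not Ivory's code and does not use Hadoop: the function names `build_postings` and `pairwise_similarity`, the toy documents, and the use of raw term frequencies as weights are all assumptions made for illustration. The sketch builds a small inverted index, then pairs up the entries within each postings list and sums the partial products for every document pair.

```python
from collections import defaultdict
from itertools import combinations


def build_postings(docs):
    """Map each term to a sorted list of (doc_id, weight) postings.

    Weights are raw term frequencies, an assumption made for brevity.
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(entries.items()) for term, entries in postings.items()}


def pairwise_similarity(postings):
    """Sum partial inner products over every pair of documents sharing a term."""
    scores = defaultdict(int)
    for entries in postings.values():
        # Pair up the entries of one postings list, keeping each
        # unordered document pair once.
        for (doc_a, w_a), (doc_b, w_b) in combinations(entries, 2):
            scores[(doc_a, doc_b)] += w_a * w_b
    return dict(scores)


# Hypothetical toy collection, not taken from any real corpus.
docs = {
    "d1": "hadoop index retrieval",
    "d2": "hadoop retrieval retrieval",
    "d3": "ranking cascade",
}
similarities = pairwise_similarity(build_postings(docs))
```

In a MapReduce setting, the inner loop would correspond roughly to a map step that emits partial products keyed by document pair, followed by a reduce step that sums them. Documents that share no terms, such as `d3` here, never appear in the output.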

See our publications page for the full list.