Ivory is a research toolkit for exploring large-scale indexing and
retrieval algorithms. If you're looking for something that "just
works", then Ivory probably isn't for you. Try Lucene instead. On the
other hand, if you're interested in playing with (half-baked)
implementations of state-of-the-art retrieval algorithms, this may be
the system for you! Ivory includes features presented in many academic
research papers:
- A scalable inverted indexing algorithm implemented in Hadoop (TREC
2009), designed to work "out of the box" with the ClueWeb09 web crawl
collection.
- A retrieval engine that implements the Markov Random Field
retrieval model (SIGIR 2005), with additional extensions such as the
Weighted Sequential Dependence model (WSDM 2010) and Latent Concept
Expansion (SIGIR 2007).
- Implementations of "learning to efficiently rank" models: the
efficient sequential dependence model (SIGIR 2010),
temporally-constrained ranking functions (CIKM 2010), and ranking
cascades (SIGIR 2011).
- The Postings Cartesian Product algorithm for pairwise similarity
computations (SIGIR 2009).
- A cross-language pairwise similarity computation engine based on
locality-sensitive hashing (SIGIR 2011).
- Cross-language retrieval models based on Synchronous Context-Free
Grammars and other statistical machine translation techniques
(COLING 2012).
- Various feature extraction techniques using document vectors
(IRJ 2012).
See our publications page for a
full list of publications.
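
To give a rough feel for the pairwise similarity item in the list above, here is a minimal single-machine sketch of the general idea behind a postings-based approach: every pair of documents that shares a term contributes the product of its term weights to that pair's score. This is not Ivory's code (Ivory runs on Hadoop). The function names `build_postings` and `pairwise_similarity`, the toy documents, and the use of raw term frequency as the weight are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_postings(docs):
    """Map each term to a list of (doc_id, weight) pairs.

    Assumption: the weight is the raw term frequency.
    """
    postings = defaultdict(list)
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            postings[term].append((doc_id, tf))
    return postings


def pairwise_similarity(postings):
    """Accumulate inner-product scores by pairing entries within each postings list."""
    scores = defaultdict(int)
    for entries in postings.values():
        # Every pair of documents that shares this term gets a partial score.
        for (d1, w1), (d2, w2) in combinations(sorted(entries), 2):
            scores[(d1, d2)] += w1 * w2
    return dict(scores)


# Hypothetical toy collection.
docs = {
    "d1": "hadoop index hadoop",
    "d2": "hadoop retrieval",
    "d3": "retrieval index index",
}
scores = pairwise_similarity(build_postings(docs))
```

Only documents that share at least one term ever get paired, which is why working from postings lists can be cheaper than comparing all documents directly. In a MapReduce setting the inner loop would typically emit partial scores keyed by document pair, with a reducer summing them.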