Computes pairwise document similarity from a document-sorted inverted
index. This implementation is based on the algorithms described in the
following papers:
- Tamer Elsayed, Jimmy Lin, and Douglas Oard. Pairwise Document
Similarity in Large Collections with MapReduce. Proceedings of the
46th Annual Meeting of the Association for Computational Linguistics (ACL
2008), Companion Volume, pages 265-268, June 2008, Columbus, Ohio.
- Jimmy Lin. Brute Force and Indexed Approaches to Pairwise Document
Similarity Comparisons with MapReduce. Proceedings of the 32nd Annual
International ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR 2009), pages 155-162, July 2009, Boston,
Massachusetts.
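As a rough, in-memory illustration of the indexed (postings-based) approach, the sketch below computes unnormalized inner-product scores, sim(d_i, d_j) = sum over shared terms t of w(t, d_i) * w(t, d_j). For each term's postings list it emits a partial score for every pair of documents in that list (the "map" step) and sums the partial scores per document pair (the "reduce" step). It is a sketch under stated assumptions, not this class's actual implementation: the class `PairwiseSimilaritySketch`, the `Posting` type, and the methods `mapPostings` and `computeSimilarities` are hypothetical; term weights are assumed to be precomputed; and no Hadoop API is used, since a local map stands in for the shuffle and reduce phases.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; not the original MapReduce implementation.
public class PairwiseSimilaritySketch {

    /** One posting: a document id and the (assumed precomputed) term weight in that document. */
    static final class Posting {
        final int docId;
        final double weight;

        Posting(int docId, double weight) {
            this.docId = docId;
            this.weight = weight;
        }
    }

    /**
     * "Map" step for one term: emit w(t, di) * w(t, dj) for every pair i < j.
     * Because postings are assumed sorted by docId, the key "di,dj" always has di < dj,
     * so each unordered pair gets a single canonical key.
     * The "reduce" step is folded in: partial scores are summed per key.
     */
    static void mapPostings(List<Posting> postings, Map<String, Double> accumulator) {
        for (int i = 0; i < postings.size(); i++) {
            for (int j = i + 1; j < postings.size(); j++) {
                Posting a = postings.get(i);
                Posting b = postings.get(j);
                String key = a.docId + "," + b.docId;
                accumulator.merge(key, a.weight * b.weight, Double::sum);
            }
        }
    }

    /** Runs the map step over every postings list and returns summed scores per document pair. */
    static Map<String, Double> computeSimilarities(Map<String, List<Posting>> index) {
        Map<String, Double> sims = new TreeMap<>(); // sorted keys for readable output
        for (List<Posting> postings : index.values()) {
            mapPostings(postings, sims);
        }
        return sims;
    }

    public static void main(String[] args) {
        // Toy index: term -> postings list sorted by docId.
        Map<String, List<Posting>> index = new HashMap<>();
        index.put("apple", List.of(new Posting(1, 1.0), new Posting(2, 2.0)));
        index.put("banana", List.of(new Posting(1, 1.0), new Posting(2, 1.0), new Posting(3, 2.0)));
        index.put("cherry", List.of(new Posting(3, 1.0))); // single posting: contributes no pairs

        System.out.println(computeSimilarities(index));
        // prints {1,2=3.0, 1,3=2.0, 2,3=2.0}
    }
}
```

Only documents that share at least one term ever produce a partial score, which is why working from the inverted index avoids comparing every pair of documents. The cost is that a term with a long postings list generates a number of pairs quadratic in the list's length.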