Class | Description |
---|---|
BitextClassifierUtils | Train and test a bitext classifier. |
BruteForcePwsim | A class to extract the similarity list of each sample document, either by computing the dot product between the doc vectors or by finding the Hamming distance between signatures. |
BruteForcePwsim.MyMapperSignature | For every document (signature) in the sample, find all other docs that are closer than a given Hamming distance. |
BruteForcePwsim.MyMapperTermDocVectors | For every document (term doc vector) in the sample, find all other docs that have cosine similarity higher than a given threshold. |
BruteForcePwsim.MyMapperWeightedIntDocVectors | For every document (weighted int doc vector) in the sample, find all other docs that have cosine similarity higher than a given threshold. |
BruteForcePwsim.MyReducer | Limits the number of pairs per sample document to a given number (Ivory.NumResults). |
ConvertMapToPairs | Convert the format of the PCP algorithm's output. |
ConvertMapToPairs.MyMapper | Input is keyed by German docno, and the value is a map from similar English docnos to similarity weights. |
Docnos2Titles | |
ExtractWikipedia | A class to extract interwiki language links from a Wikipedia collection. |
FilterResults | |
FilterResults.MyMapper | |
FilterResults.MyMapperTopN | Filter out results that are not from the sample and/or have a distance greater than the value specified in the option Ivory.MaxHammingDistance. |
FilterResults.MyReducerTopN | |
OutputResultsAsText | Read input in sequence file format and output it in text format. |
SampleIntDocVectors | A program that samples from a collection of key-value pairs according to a given frequency. |
SampleIntDocVectors.MyReducer | |
SampleSignatures | |
SampleSignatures.MyMapper | Filter out signatures that are not from the sample. |
SampleSignatures.MyReducer | |
SampleTermDocVectors | A program that samples from a collection of key-value pairs according to a given frequency. |
SampleTermDocVectors.MyReducer |
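
The two brute-force comparisons described for the BruteForcePwsim mappers (Hamming distance between signatures, cosine similarity between doc vectors) can be illustrated with a minimal, standalone sketch. This is not the Ivory implementation: the class name `BruteForceSimilaritySketch`, its method names, the use of a single 64-bit `long` as a signature, dense `double[]` vectors, and the inclusive distance cutoff are all assumptions made for illustration, and no Hadoop types are involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Ivory code base.
class BruteForceSimilaritySketch {

  // Hamming distance between two 64-bit signatures: the number of differing bits.
  // Assumption: a signature fits in one long; real signatures may be longer.
  static int hammingDistance(long a, long b) {
    return Long.bitCount(a ^ b);
  }

  // Cosine similarity of two dense vectors of equal length.
  // Returns 0.0 when either vector has zero norm.
  static double cosine(double[] u, double[] v) {
    if (u.length != v.length) {
      throw new IllegalArgumentException("vectors must have the same length");
    }
    double dot = 0.0, normU = 0.0, normV = 0.0;
    for (int i = 0; i < u.length; i++) {
      dot += u[i] * v[i];
      normU += u[i] * u[i];
      normV += v[i] * v[i];
    }
    if (normU == 0.0 || normV == 0.0) {
      return 0.0;
    }
    return dot / Math.sqrt(normU * normV);
  }

  // Indices of collection signatures whose Hamming distance to the sample
  // signature is at most maxDistance (inclusive cutoff is an assumption).
  static List<Integer> withinHamming(long sample, long[] collection, int maxDistance) {
    List<Integer> matches = new ArrayList<>();
    for (int i = 0; i < collection.length; i++) {
      if (hammingDistance(sample, collection[i]) <= maxDistance) {
        matches.add(i);
      }
    }
    return matches;
  }

  // Indices of collection vectors whose cosine similarity to the sample
  // vector is strictly higher than the threshold.
  static List<Integer> aboveCosine(double[] sample, double[][] collection, double threshold) {
    List<Integer> matches = new ArrayList<>();
    for (int i = 0; i < collection.length; i++) {
      if (cosine(sample, collection[i]) > threshold) {
        matches.add(i);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    long[] signatures = {0b1011L, 0b1010L, 0b0100L};
    System.out.println(withinHamming(0b1011L, signatures, 1));

    double[][] vectors = {{1.0, 2.0}, {2.0, 4.0}, {-2.0, 1.0}};
    System.out.println(aboveCosine(new double[] {1.0, 2.0}, vectors, 0.9));
  }
}
```

Each sample document is compared against every document in the collection, so the cost grows with the product of sample size and collection size. That cost is presumably why the classes above operate on a sample rather than on all documents.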