public class GalagoTokenizer extends Tokenizer
Constructor and Description |
---|
GalagoTokenizer() |
Modifier and Type | Method and Description |
---|---|
void | configure(Configuration conf) |
void | configure(Configuration mJobConf, FileSystem fs) |
boolean | isStemming() |
boolean | isStopWord(String word) Overridden by applicable implementing classes. |
boolean | isStopwordRemoval() |
static void | main(String[] args) |
String[] | processContent(String text) |
Methods inherited from class Tokenizer: getNumberTokens, getOOVRate, getStem2NonStemMapping, getUTF8, getVocab, isDiscard, isDiscard, isStopWord, normalizeFrench, removeBorderStopWords, removeNonUnicodeChars, setVocab, stem
public void configure(Configuration conf)
public void configure(Configuration mJobConf, FileSystem fs)
public boolean isStemming()
isStemming in class Tokenizer
public boolean isStopWord(String word)
Description copied from class: Tokenizer
Overridden by applicable implementing classes.
isStopWord in class Tokenizer
public boolean isStopwordRemoval()
isStopwordRemoval in class Tokenizer
public static void main(String[] args)
public String[] processContent(String text)
processContent in class Tokenizer
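
The signatures above state only the types involved: processContent takes a String and returns a String[], and isStopWord, isStopwordRemoval and isStemming return boolean flags. As a rough illustration of how such a contract might fit together, here is a minimal sketch. SketchTokenizer is a hypothetical stand-in, not GalagoTokenizer itself; the constructor argument, the splitting rule, the lowercasing and the stop word list are all assumptions, since this page does not document the actual tokenization behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in that only mirrors the method names documented above.
// It is NOT the GalagoTokenizer implementation: the splitting rule, the
// lowercasing and the stop word list are assumptions made for illustration.
class SketchTokenizer {
    // Assumed, deliberately tiny stop word list.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    private final boolean stopwordRemoval;

    SketchTokenizer(boolean stopwordRemoval) {
        this.stopwordRemoval = stopwordRemoval;
    }

    public boolean isStopwordRemoval() {
        return stopwordRemoval;
    }

    public boolean isStemming() {
        return false; // this sketch has no stemmer
    }

    public boolean isStopWord(String word) {
        return STOP_WORDS.contains(word.toLowerCase(Locale.ROOT));
    }

    // Lowercases the text, splits on runs of characters that are neither
    // letters nor digits, and drops stop words when removal is enabled.
    public String[] processContent(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty lead token if text starts with a delimiter
            }
            if (stopwordRemoval && isStopWord(raw)) {
                continue;
            }
            tokens.add(raw);
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer(true);
        String[] tokens = tokenizer.processContent("The History of Information Retrieval");
        System.out.println(String.join(" ", tokens)); // prints "history information retrieval"
    }
}
```

With the real class, the configure(Configuration conf) method listed above would presumably be called before processContent; which configuration keys it reads is not stated on this page.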