public class GalagoTokenizer extends Tokenizer
| Constructor and Description |
|---|
| GalagoTokenizer() |
| Modifier and Type | Method and Description |
|---|---|
| void | configure(Configuration conf) |
| void | configure(Configuration mJobConf, FileSystem fs) |
| boolean | isStemming() |
| boolean | isStopWord(String word) Overridden by applicable implementing classes. |
| boolean | isStopwordRemoval() |
| static void | main(String[] args) |
| String[] | processContent(String text) |
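The method summary suggests a simple call sequence: construct the tokenizer, configure it, then pass text to `processContent` to get a `String[]` of tokens. The sketch below illustrates that sequence only. It does not use the real class, which needs Ivory and (presumably) Hadoop's `Configuration` on the classpath. Instead it defines a stand-in, `SketchTokenizer`, and passes a plain `Map` where the real method takes a `Configuration`. The tokenization rule (lowercase, split on non-alphanumeric characters), the stop-word list, and the `stopwords.enabled` key are all assumptions made for illustration, not GalagoTokenizer's documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Stand-in for GalagoTokenizer: it reuses the method names from the summary,
// but the behavior is an illustrative assumption, not Ivory's implementation.
class SketchTokenizer {
    // Hypothetical stop-word list; the real list is not given on this page.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    private boolean stopwordRemoval = false;

    // The real configure(...) takes a Configuration; a Map stands in here.
    // "stopwords.enabled" is a made-up key for this sketch.
    void configure(Map<String, String> conf) {
        stopwordRemoval =
            Boolean.parseBoolean(conf.getOrDefault("stopwords.enabled", "false"));
    }

    boolean isStemming() { return false; } // this sketch never stems

    boolean isStopwordRemoval() { return stopwordRemoval; }

    boolean isStopWord(String word) { return STOP_WORDS.contains(word); }

    // Lowercase the text, split on runs of non-alphanumeric characters,
    // then drop stop words if removal was enabled in configure(...).
    String[] processContent(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) continue; // leading delimiter yields an empty piece
            if (stopwordRemoval && isStopWord(raw)) continue;
            tokens.add(raw);
        }
        return tokens.toArray(new String[0]);
    }
}

class SketchTokenizerDemo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer();
        tokenizer.configure(Collections.singletonMap("stopwords.enabled", "true"));
        String[] tokens = tokenizer.processContent("The history of the Galago tokenizer");
        System.out.println(Arrays.toString(tokens)); // prints [history, galago, tokenizer]
    }
}
```

With the real class, the equivalent calls would be `new GalagoTokenizer()`, one of the two `configure` overloads, and `processContent(text)`. Which configuration keys it reads is not stated on this page.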
Methods inherited from class Tokenizer: getNumberTokens, getOOVRate, getStem2NonStemMapping, getUTF8, getVocab, isDiscard, isDiscard, isStopWord, normalizeFrench, removeBorderStopWords, removeNonUnicodeChars, setVocab, stem

public void configure(Configuration conf)

public void configure(Configuration mJobConf, FileSystem fs)

public boolean isStemming()
isStemming in class Tokenizer

public boolean isStopWord(String word)
isStopWord in class Tokenizer

public boolean isStopwordRemoval()
isStopwordRemoval in class Tokenizer

public static void main(String[] args)

public String[] processContent(String text)
processContent in class Tokenizer
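
The "in class Tokenizer" notes indicate that these methods are declared in the superclass and redefined by this subclass. A minimal sketch of that pattern follows, using hypothetical stand-in classes (`BaseTokenizer`, `WhitespaceTokenizer`) rather than the real ones. Which methods the real `Tokenizer` declares abstract, and what its default `isStopWord` returns, are not stated here, so both are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Tokenizer superclass.
abstract class BaseTokenizer {
    abstract String[] processContent(String text);
    abstract boolean isStemming();
    abstract boolean isStopwordRemoval();

    // Assumed default: no word is a stop word unless a subclass overrides this.
    boolean isStopWord(String word) { return false; }
}

// Hypothetical subclass that overrides every method, as GalagoTokenizer
// does for the methods listed above.
class WhitespaceTokenizer extends BaseTokenizer {
    private final Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));

    @Override
    String[] processContent(String text) {
        String trimmed = text.trim();
        // Splitting an empty string would yield one empty token, so guard it.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override
    boolean isStemming() { return false; }

    @Override
    boolean isStopwordRemoval() { return true; }

    @Override
    boolean isStopWord(String word) { return stopWords.contains(word); }
}

class OverrideDemo {
    public static void main(String[] args) {
        // Calls through the base type dispatch to the subclass overrides.
        BaseTokenizer tokenizer = new WhitespaceTokenizer();
        System.out.println(tokenizer.isStopWord("the"));
        System.out.println(Arrays.toString(tokenizer.processContent("index the web")));
    }
}
```

Code written against the base type picks up the subclass behavior automatically, which is why callers can swap tokenizer implementations without changing their own logic.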