public class BigramChineseTokenizer extends Tokenizer
| Constructor and Description |
|---|
| BigramChineseTokenizer() |
| Modifier and Type | Method and Description |
|---|---|
| void | configure(Configuration conf) |
| void | configure(Configuration conf, FileSystem fs) |
| String[] | processContent(String text) |
| String | removeBorderStopWords(String tokenizedText): Remove stop words from text that has been tokenized. |
Methods inherited from class Tokenizer: getNumberTokens, getOOVRate, getStem2NonStemMapping, getUTF8, getVocab, isDiscard, isDiscard, isStemming, isStopWord, isStopWord, isStopwordRemoval, main, normalizeFrench, removeNonUnicodeChars, setVocab, stem

public void configure(Configuration conf)
public void configure(Configuration conf, FileSystem fs)
public String[] processContent(String text)
processContent in class Tokenizer

public String removeBorderStopWords(String tokenizedText)
removeBorderStopWords in class Tokenizer
Parameters: tokenizedText - input text, assumed to be tokenized.
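
This page does not describe how processContent(String text) splits its input. As a rough illustration of what character-bigram tokenization of Chinese text generally means, the sketch below emits every pair of adjacent characters. The class BigramSketch and its bigrams helper are hypothetical names invented for this example. They are not part of BigramChineseTokenizer, and the real class may differ in how it handles whitespace, punctuation, stop words, or single-character input.

```java
public class BigramSketch {
    // Hypothetical helper for illustration only; not the library's implementation.
    // Returns overlapping bigrams over the non-whitespace code points of the text.
    static String[] bigrams(String text) {
        // Work on code points rather than chars so that characters outside
        // the Basic Multilingual Plane are not split in half.
        int[] cps = text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .toArray();
        if (cps.length == 0) {
            return new String[0];
        }
        if (cps.length == 1) {
            // Assumption: a lone character is kept as a single token.
            return new String[] { new String(cps, 0, 1) };
        }
        String[] out = new String[cps.length - 1];
        for (int i = 0; i < out.length; i++) {
            // Each token covers the code points at positions i and i + 1.
            out[i] = new String(cps, i, 2);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: 中文 文分 分词
        System.out.println(String.join(" ", bigrams("中文分词")));
    }
}
```

Overlapping bigrams are a common choice for Chinese retrieval because they need no dictionary or word segmenter, at the cost of producing some tokens that are not real words.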