BigramChineseTokenizer

java.lang.Object
- ivory.core.tokenize.Tokenizer
- - ivory.core.tokenize.BigramChineseTokenizer

public class BigramChineseTokenizer
extends Tokenizer
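
The page gives no class description, but the name points to character-bigram tokenization, a common way to index Chinese text without a word segmenter. The sketch below illustrates that general technique under stated assumptions and is not Ivory's implementation: `BigramSketch` and `bigrams` are hypothetical names, and dropping whitespace and emitting a lone character as its own token are choices made here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    // Hypothetical helper, not part of Ivory: returns overlapping
    // two-character tokens for the non-whitespace characters of the input.
    static String[] bigrams(String text) {
        // Work on code points so characters outside the BMP are not split.
        int[] cps = text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .toArray();

        List<String> out = new ArrayList<>();
        // Assumption: a single character is kept as a one-character token.
        if (cps.length == 1) {
            out.add(new String(cps, 0, 1));
        }
        // Slide a window of two code points across the input.
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Four characters yield three overlapping bigrams.
        System.out.println(String.join(" ", bigrams("中文分词")));
    }
}
```

For the four-character input above, the sketch produces the three tokens 中文, 文分 and 分词. Whether `processContent` behaves the same way (for example around punctuation, Latin text, or stop words) is not stated on this page.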

Constructor Summary

Constructors
Constructor and Description

BigramChineseTokenizer()

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`configure(Configuration conf)`
`void`	`configure(Configuration conf, FileSystem fs)`
`String[]`	`processContent(String text)`
`String`	`removeBorderStopWords(String tokenizedText)` Remove stop words from text that has been tokenized.

Methods inherited from class ivory.core.tokenize.Tokenizer
getNumberTokens, getOOVRate, getStem2NonStemMapping, getUTF8, getVocab, isDiscard, isDiscard, isStemming, isStopWord, isStopWord, isStopwordRemoval, main, normalizeFrench, removeNonUnicodeChars, setVocab, stem

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - BigramChineseTokenizer
```
public BigramChineseTokenizer()
```
- Method Detail
  - configure
```
public void configure(Configuration conf)
```
    Specified by:
    configure in class Tokenizer
  - configure
```
public void configure(Configuration conf,
             FileSystem fs)
```
    Specified by:
    configure in class Tokenizer
  - processContent
```
public String[] processContent(String text)
```
    Specified by:
    processContent in class Tokenizer
  - removeBorderStopWords
```
public String removeBorderStopWords(String tokenizedText)
```
    Description copied from class: Tokenizer
    
    Remove stop words from text that has already been tokenized. Useful when postprocessing the output of an MT system, which is tokenized but has not had stop words removed.
    
    Overrides:
    removeBorderStopWords in class Tokenizer
    
    Parameters:
    tokenizedText - input text, assumed to be tokenized.
    
    Returns:
    same text without the stop words.
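
The description above does not say which stop words are removed. One reading of the method name is that only stop words at the borders (the start and end) of the tokenized text are dropped. The sketch below illustrates that reading only. It is an assumption, not Ivory's code: `BorderStopWordSketch`, `stripBorderStopWords`, whitespace-separated tokens, and the explicit stop-word set are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BorderStopWordSketch {

    // Hypothetical helper, not part of Ivory: drops stop words from the
    // start and end of whitespace-tokenized text, leaving interior ones.
    static String stripBorderStopWords(String tokenizedText, Set<String> stopWords) {
        String trimmed = tokenizedText.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        int start = 0;
        int end = tokens.length;
        // Skip leading stop words.
        while (start < end && stopWords.contains(tokens[start])) {
            start++;
        }
        // Skip trailing stop words.
        while (end > start && stopWords.contains(tokens[end - 1])) {
            end--;
        }
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "on", "of"));
        // Interior "of" is kept; leading "the" and trailing "on the" are dropped.
        System.out.println(stripBorderStopWords("the king of spain sat on the", stopWords));
    }
}
```

In the real class the stop-word list presumably comes from whatever `configure` loads, and for Chinese input the tokens would be those produced by `processContent`. Neither detail is documented on this page.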
