public class MultiWordChunker2 extends AbstractDisambiguator
| Constructor and Description |
|---|
| MultiWordChunker2(String filename) |
| MultiWordChunker2(String filename, boolean allowFirstCapitalized) |
| Modifier and Type | Method and Description |
|---|---|
| AnalyzedSentence | disambiguate(AnalyzedSentence input): Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) |
| protected String | formatPosTag(String posTag, int position, int multiwordLength): Override this method to format the POS tag differently |
| protected boolean | matches(String matchText, AnalyzedTokenReadings inputTokens) |
| protected AnalyzedTokenReadings | prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag) |
| void | setRemoveOtherReadings(boolean removeOtherReadings) |
| void | setWrapTag(boolean wrapTag) |
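The `formatPosTag` hook receives the token's position and the multiword's length, so an override can encode where a token sits inside the multiword. The following is a standalone sketch of that idea only: it does not use any LanguageTool classes, the class name `PosTagFormatSketch` and the `_B`/`_I`/`_E` suffix scheme are hypothetical, and it assumes `position` is zero-based, which this reference does not state.

```java
public class PosTagFormatSketch {

    // Mirrors the parameter list of formatPosTag; the body is a hypothetical
    // scheme, not the library's default formatting.
    static String formatPosTag(String posTag, int position, int multiwordLength) {
        if (multiwordLength <= 1) {
            return posTag;            // a single-token entry keeps the plain tag
        }
        if (position == 0) {
            return posTag + "_B";     // first token (assumes zero-based position)
        }
        if (position == multiwordLength - 1) {
            return posTag + "_E";     // last token
        }
        return posTag + "_I";         // any token in between
    }

    public static void main(String[] args) {
        // Print the tag each token of a three-token multiword would get.
        for (int i = 0; i < 3; i++) {
            System.out.println(formatPosTag("ELLIPSIS", i, 3));
        }
    }
}
```

In a real subclass, the same logic would go into an overriding `formatPosTag` method instead of a static helper.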
Inherited methods: preDisambiguate; clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait (from java.lang.Object); disambiguate

public MultiWordChunker2(String filename)
filename - text file with multiwords and tags

public MultiWordChunker2(String filename, boolean allowFirstCapitalized)
filename - text file with multiwords and tags
allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized

public void setRemoveOtherReadings(boolean removeOtherReadings)
removeOtherReadings - if true and a multiword matches, other readings will be removed

public void setWrapTag(boolean wrapTag)
wrapTag - if true, the tag will be wrapped with < and >

protected String formatPosTag(String posTag, int position, int multiwordLength)
posTag - POS tag for the multiword
position - position of the token in the multiword

public AnalyzedSentence disambiguate(AnalyzedSentence input)
input - the tokens to be chunked

protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)

protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
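
To illustrate how the `allowFirstCapitalized` and `wrapTag` options interact, here is a small standalone toy chunker. It is a sketch under several assumptions and not the library's implementation: `MultiwordSketch`, its in-memory phrase map (standing in for the multiword file), and the choice to give every token of a match the same tag are all hypothetical. How the real class assigns tags to individual tokens is not described in this reference.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiwordSketch {

    // Hypothetical stand-in for the multiword file: phrase -> tag.
    private final Map<String, String> multiwords;
    private final boolean allowFirstCapitalized;
    private boolean wrapTag = true;

    MultiwordSketch(Map<String, String> multiwords, boolean allowFirstCapitalized) {
        this.multiwords = new LinkedHashMap<>(multiwords);
        this.allowFirstCapitalized = allowFirstCapitalized;
    }

    void setWrapTag(boolean wrapTag) {
        this.wrapTag = wrapTag;
    }

    // Returns one tag per token, or null where no multiword covers the token.
    List<String> tag(List<String> tokens) {
        String[] tags = new String[tokens.size()];
        for (int start = 0; start < tokens.size(); start++) {
            for (Map.Entry<String, String> entry : multiwords.entrySet()) {
                String[] words = entry.getKey().split(" ");
                if (matchesAt(tokens, start, words)) {
                    String tag = wrapTag ? "<" + entry.getValue() + ">" : entry.getValue();
                    for (int i = 0; i < words.length; i++) {
                        tags[start + i] = tag;
                    }
                }
            }
        }
        return Arrays.asList(tags);
    }

    // True if the phrase words occur at tokens[start..]; only the first word
    // may differ by an initial capital, and only when the option is enabled.
    private boolean matchesAt(List<String> tokens, int start, String[] words) {
        if (start + words.length > tokens.size()) {
            return false;
        }
        for (int i = 0; i < words.length; i++) {
            String token = tokens.get(start + i);
            boolean same = token.equals(words[i]);
            if (!same && i == 0 && allowFirstCapitalized) {
                same = token.equals(capitalize(words[i]));
            }
            if (!same) {
                return false;
            }
        }
        return true;
    }

    private static String capitalize(String word) {
        return word.isEmpty() ? word : Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    public static void main(String[] args) {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("in spite of", "PREP");
        MultiwordSketch chunker = new MultiwordSketch(entries, true);
        System.out.println(chunker.tag(Arrays.asList("In", "spite", "of", "rain")));
    }
}
```

With `allowFirstCapitalized` set to false, the sentence-initial "In" no longer matches the lowercase entry, so no token is tagged; calling `setWrapTag(false)` yields the bare tag without the angle brackets.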