public class WordTokenizer extends Object implements Tokenizer
http://foobar.org).

| Modifier and Type | Field and Description |
|---|---|
| protected String | REMOVED_EMOJI |

| Constructor and Description |
|---|
| WordTokenizer() |

| Modifier and Type | Method and Description |
|---|---|
| static List<String> | getProtocols() Get the protocols that the tokenizer knows about. |
| String | getTokenizingCharacters() |
| boolean | isCurrencyExpression(String token) |
| static boolean | isEMail(String token) |
| static boolean | isUrl(String token) |
| protected List<String> | joinEMails(List<String> list) |
| protected List<String> | joinEMailsAndUrls(List<String> list) |
| protected List<String> | joinUrls(List<String> l) |
| List<String> | replaceEmojis(String s) |
| List<String> | restoreEmojis(List<String> tokens, List<String> removedEmojis) |
| List<String> | splitCurrencyExpression(String token) |
| List<String> | tokenize(String text) |

protected final String REMOVED_EMOJI
public static List<String> getProtocols()
Get the protocols that the tokenizer knows about: http, https, and ftp.
public static boolean isUrl(String token)
public static boolean isEMail(String token)
public String getTokenizingCharacters()
public boolean isCurrencyExpression(String token)
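
The protocol list documented for getProtocols() (http, https, and ftp) hints at how a protocol-based URL check could work. The following is only a minimal, self-contained sketch under stated assumptions, not the actual WordTokenizer code: the class name `ProtocolCheckSketch`, the method `looksLikeUrl`, and the rule "token starts with a known protocol followed by `://` and at least one more character" are all hypothetical and chosen for illustration. The real isUrl(String token) may apply different or stricter rules.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for illustration; NOT the library's WordTokenizer.
class ProtocolCheckSketch {

    // Protocol names taken from the getProtocols() description above.
    private static final List<String> PROTOCOLS =
            Collections.unmodifiableList(Arrays.asList("http", "https", "ftp"));

    // Mirrors the shape of getProtocols(): returns the known protocol names.
    static List<String> getProtocols() {
        return PROTOCOLS;
    }

    // Assumption: a token is URL-like when it starts with "<protocol>://"
    // and has at least one character after that prefix.
    static boolean looksLikeUrl(String token) {
        if (token == null) {
            return false;
        }
        String lower = token.toLowerCase(Locale.ROOT);
        for (String protocol : PROTOCOLS) {
            String prefix = protocol + "://";
            if (lower.startsWith(prefix) && lower.length() > prefix.length()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://foobar.org"));
        System.out.println(looksLikeUrl("foobar.org"));
    }
}
```

A check of this kind only recognizes tokens that spell out a protocol, so a bare host name such as foobar.org is not treated as a URL by the sketch.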