Package ai.djl.huggingface.tokenizers
Class HuggingFaceTokenizer.Builder
java.lang.Object
ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.Builder
- Enclosing class:
- HuggingFaceTokenizer
The builder for creating a Hugging Face tokenizer.
-
Method Summary
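Modifier and Type | Method | Description
HuggingFaceTokenizer | build() | Builds the tokenizer.
void | configure(Map<String, ?> arguments) | Configures the builder with the arguments.
HuggingFaceTokenizer.Builder | optAddSpecialTokens(boolean addSpecialTokens) | Sets whether to add special tokens.
HuggingFaceTokenizer.Builder | optDoLowerCase(boolean doLowerCase) | Sets whether the tokenizer converts text to lowercase.
HuggingFaceTokenizer.Builder | optDoLowerCase(String locale) | Sets the tokenizer to convert text to lowercase using a specific locale.
HuggingFaceTokenizer.Builder | optManager(ai.djl.ndarray.NDManager manager) | Sets the optional manager used to manage the lifecycle of the tokenizer.
HuggingFaceTokenizer.Builder | optMaxLength(int maxLength) | Sets maxLength for padding and truncation.
HuggingFaceTokenizer.Builder | optPadding(boolean enabled) | Enables or disables default padding behavior for the tokenizer.
HuggingFaceTokenizer.Builder | optPadToMaxLength() | Enables padding to pad sequences to the previously specified maxLength, or to modelMaxLength if not specified.
HuggingFaceTokenizer.Builder | optPadToMultipleOf(int padToMultipleOf) | Sets padToMultipleOf for padding.
HuggingFaceTokenizer.Builder | optStride(int stride) | Sets the stride to use in overflow overlap when truncating sequences longer than the model supports.
HuggingFaceTokenizer.Builder | optTokenizerName(String tokenizerName) | Sets the name of the tokenizer.
HuggingFaceTokenizer.Builder | optTokenizerPath(Path tokenizerPath) | Sets the file path of the tokenizer.
HuggingFaceTokenizer.Builder | optTruncateFirstOnly() | Enables truncation to only truncate the first item.
HuggingFaceTokenizer.Builder | optTruncateSecondOnly() | Enables truncation to only truncate the second item.
HuggingFaceTokenizer.Builder | optTruncation(boolean enabled) | Enables or disables default truncation behavior for the tokenizer.
HuggingFaceTokenizer.Builder | optWithOverflowingTokens(boolean withOverflowingTokens) | Sets whether to return overflowing tokens.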
-
Method Details
-
optManager
Sets the optional manager used to manage the lifecycle of the tokenizer.
- Parameters:
manager - the NDManager
- Returns:
- this builder
-
optTokenizerName
Sets the name of the tokenizer.
- Parameters:
tokenizerName - the name of the tokenizer
- Returns:
- this builder
-
optTokenizerPath
Sets the file path of the tokenizer.
- Parameters:
tokenizerPath - the path of the tokenizer
- Returns:
- this builder
-
optAddSpecialTokens
Sets whether to add special tokens.
- Parameters:
addSpecialTokens - true to add special tokens
- Returns:
- this builder
-
optWithOverflowingTokens
Sets whether to return overflowing tokens.
- Parameters:
withOverflowingTokens - true to return overflowing tokens
- Returns:
- this builder
-
optTruncation
Enables or disables default truncation behavior for the tokenizer.
- Parameters:
enabled - whether to enable default truncation behavior
- Returns:
- this builder
-
optTruncateFirstOnly
Enables truncation to only truncate the first item.
- Returns:
- this builder
-
optTruncateSecondOnly
Enables truncation to only truncate the second item.
- Returns:
- this builder
-
optPadding
Enables or disables default padding behavior for the tokenizer.
- Parameters:
enabled - whether to enable default padding behavior
- Returns:
- this builder
-
optPadToMaxLength
Enables padding to pad sequences to the previously specified maxLength, or to modelMaxLength if maxLength was not specified.
- Returns:
- this builder
-
optMaxLength
Sets maxLength for padding and truncation.
- Parameters:
maxLength - the length to truncate and/or pad sequences to
- Returns:
- this builder
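To illustrate what "truncate and/or pad to maxLength" means for a single sequence, here is a minimal sketch in plain Java. It does not call the tokenizer; the class `MaxLengthSketch`, the helper `fitToMaxLength`, right-side padding, and the pad id are all assumptions made for illustration. A real tokenizer also accounts for special tokens when truncating, which this sketch ignores.

```java
import java.util.Arrays;

class MaxLengthSketch {
    // Hypothetical helper (not part of DJL): returns a copy of ids that is
    // exactly maxLength long, cutting extra ids or appending padId on the right.
    static long[] fitToMaxLength(long[] ids, int maxLength, long padId) {
        // copyOf truncates when ids is longer and zero-extends when it is shorter.
        long[] out = Arrays.copyOf(ids, maxLength);
        if (ids.length < maxLength) {
            // Overwrite the extended tail with the chosen padding id.
            Arrays.fill(out, ids.length, maxLength, padId);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 4};
        System.out.println(Arrays.toString(fitToMaxLength(ids, 6, 0))); // [1, 2, 3, 4, 0, 0]
        System.out.println(Arrays.toString(fitToMaxLength(ids, 3, 0))); // [1, 2, 3]
    }
}
```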
-
optPadToMultipleOf
Sets padToMultipleOf for padding.
- Parameters:
padToMultipleOf - the multiple that sequence lengths should be padded to
- Returns:
- this builder
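The arithmetic behind padding to a multiple can be sketched as rounding the sequence length up to the next multiple. The class `PadToMultipleSketch` and the helper `paddedLength` below are hypothetical and only illustrate the calculation; they are not part of the library.

```java
class PadToMultipleSketch {
    // Hypothetical helper: smallest multiple of `multiple` that is >= length.
    static int paddedLength(int length, int multiple) {
        if (multiple <= 0) {
            throw new IllegalArgumentException("multiple must be positive");
        }
        // Integer ceiling division, then scale back up.
        return ((length + multiple - 1) / multiple) * multiple;
    }

    public static void main(String[] args) {
        System.out.println(paddedLength(13, 8)); // 16
        System.out.println(paddedLength(16, 8)); // 16
    }
}
```

With a multiple of 8, a 13-token sequence would be padded to 16 tokens, while a 16-token sequence would need no extra padding.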
-
optStride
Sets the stride to use in overflow overlap when truncating sequences longer than the model supports.
- Parameters:
stride - the number of tokens to overlap when truncating long sequences
- Returns:
- this builder
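One way to picture the overlap is as a sliding window over the token positions: each overflow segment starts `stride` tokens before the previous one ended. The sketch below is plain Java with hypothetical names (`StrideSketch`, `windows`); it ignores special tokens and assumes `stride` is smaller than `maxLength`, so it approximates the idea rather than reproducing the tokenizer's exact output.

```java
import java.util.ArrayList;
import java.util.List;

class StrideSketch {
    // Hypothetical helper: returns [start, end) token ranges of length at most
    // maxLength, where consecutive ranges share `stride` tokens.
    static List<int[]> windows(int totalTokens, int maxLength, int stride) {
        if (maxLength <= 0 || stride < 0 || stride >= maxLength) {
            throw new IllegalArgumentException("require 0 <= stride < maxLength");
        }
        List<int[]> result = new ArrayList<>();
        int step = maxLength - stride; // how far each new window advances
        for (int start = 0; start < totalTokens; start += step) {
            int end = Math.min(start + maxLength, totalTokens);
            result.add(new int[] {start, end});
            if (end == totalTokens) {
                break; // the last window already reaches the end of the input
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 tokens, windows of 4 with an overlap of 1: [0,4) [3,7) [6,10)
        for (int[] w : windows(10, 4, 1)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```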
-
optDoLowerCase
Sets whether the tokenizer converts text to lowercase.
- Parameters:
doLowerCase - true to convert text to lowercase
- Returns:
- this builder
-
optDoLowerCase
Sets the tokenizer to convert text to lowercase using a specific locale.
- Parameters:
locale - the locale to use when converting to lowercase
- Returns:
- this builder
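The locale matters because lowercase mappings differ between languages. The sketch below uses only the JDK to show the effect; the class `LowerCaseLocaleSketch` and the helper `lower` are hypothetical, and treating the locale string as a language tag (via `Locale.forLanguageTag`) is an assumption about how the builder interprets its argument.

```java
import java.util.Locale;

class LowerCaseLocaleSketch {
    // Hypothetical helper: lowercases text under the locale named by a
    // language tag such as "en" or "tr".
    static String lower(String text, String localeTag) {
        return text.toLowerCase(Locale.forLanguageTag(localeTag));
    }

    public static void main(String[] args) {
        // English rules map 'I' to 'i'.
        System.out.println(lower("TITLE", "en"));
        // Turkish rules map 'I' to dotless 'ı' (U+0131), so the result differs.
        System.out.println(lower("TITLE", "tr"));
    }
}
```

A vocabulary built from English-lowercased text would not contain the Turkish result, so the locale should match the one used when the tokenizer's vocabulary was created.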
-
configure
Configures the builder with the arguments.
- Parameters:
arguments - the arguments
-
build
Builds the tokenizer.
- Returns:
- the new tokenizer
- Throws:
IOException - when an IO operation fails while loading a resource
-