Class HuggingFaceTokenizer.Builder

java.lang.Object
ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.Builder
Enclosing class:
HuggingFaceTokenizer

public static final class HuggingFaceTokenizer.Builder extends Object
The builder for creating a Hugging Face tokenizer.
  • Method Details

    • optManager

      public HuggingFaceTokenizer.Builder optManager(ai.djl.ndarray.NDManager manager)
      Sets the optional manager used to manage the lifecycle of the tokenizer.
      Parameters:
      manager - the NDManager
      Returns:
      this builder
    • optTokenizerName

      public HuggingFaceTokenizer.Builder optTokenizerName(String tokenizerName)
      Sets the name of the tokenizer.
      Parameters:
      tokenizerName - the name of the tokenizer
      Returns:
      this builder
    • optTokenizerPath

      public HuggingFaceTokenizer.Builder optTokenizerPath(Path tokenizerPath)
      Sets the file path of the tokenizer.
      Parameters:
      tokenizerPath - the path of the tokenizer
      Returns:
      this builder
    • optAddSpecialTokens

      public HuggingFaceTokenizer.Builder optAddSpecialTokens(boolean addSpecialTokens)
      Sets whether to add special tokens.
      Parameters:
      addSpecialTokens - true to add special tokens
      Returns:
      this builder
    • optWithOverflowingTokens

      public HuggingFaceTokenizer.Builder optWithOverflowingTokens(boolean withOverflowingTokens)
      Sets whether to return overflowing tokens.
      Parameters:
      withOverflowingTokens - true to return overflowing tokens
      Returns:
      this builder
    • optTruncation

      public HuggingFaceTokenizer.Builder optTruncation(boolean enabled)
      Enables or disables the default truncation behavior of the tokenizer.
      Parameters:
      enabled - whether to enable default truncation behavior
      Returns:
      this builder
    • optTruncateFirstOnly

      public HuggingFaceTokenizer.Builder optTruncateFirstOnly()
      Enables truncation of the first item only.
      Returns:
      this builder
    • optTruncateSecondOnly

      public HuggingFaceTokenizer.Builder optTruncateSecondOnly()
      Enables truncation of the second item only.
      Returns:
      this builder
    • optPadding

      public HuggingFaceTokenizer.Builder optPadding(boolean enabled)
      Enables or disables the default padding behavior of the tokenizer.
      Parameters:
      enabled - whether to enable default padding behavior
      Returns:
      this builder
    • optPadToMaxLength

      public HuggingFaceTokenizer.Builder optPadToMaxLength()
      Enables padding of sequences to the previously specified maxLength, or to modelMaxLength if none was specified.
      Returns:
      this builder
    • optMaxLength

      public HuggingFaceTokenizer.Builder optMaxLength(int maxLength)
      Sets maxLength for padding and truncation.
      Parameters:
      maxLength - the length to truncate and/or pad sequences to
      Returns:
      this builder
    • optPadToMultipleOf

      public HuggingFaceTokenizer.Builder optPadToMultipleOf(int padToMultipleOf)
      Sets padToMultipleOf for padding.
      Parameters:
      padToMultipleOf - the multiple that sequence lengths should be padded to
      Returns:
      this builder
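      The arithmetic behind padding to a multiple can be sketched in plain Java. This is an illustration only and not the library's implementation: the class PadToMultipleSketch and the method paddedLength are hypothetical names, and the sketch assumes the padded length is the sequence length rounded up to the next multiple.

```java
// Hypothetical helper, not part of DJL: shows the rounding that
// "pad to a multiple of N" implies for a single sequence length.
final class PadToMultipleSketch {

    // Rounds length up to the nearest multiple of `multiple`.
    static int paddedLength(int length, int multiple) {
        if (length < 0 || multiple <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and multiple must be > 0");
        }
        // Integer ceiling division, then scale back up.
        return ((length + multiple - 1) / multiple) * multiple;
    }

    public static void main(String[] args) {
        // A 13-token sequence padded to a multiple of 8 ends up 16 tokens long.
        System.out.println(paddedLength(13, 8)); // prints 16
        // A length that is already a multiple is left unchanged.
        System.out.println(paddedLength(16, 8)); // prints 16
    }
}
```

      Under this assumption, padToMultipleOf only ever adds padding tokens; it never shortens a sequence.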
    • optStride

      public HuggingFaceTokenizer.Builder optStride(int stride)
      Sets the stride to use in overflow overlap when truncating sequences longer than the model supports.
      Parameters:
      stride - the number of tokens to overlap when truncating long sequences
      Returns:
      this builder
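      To make the overlap concrete, the following plain-Java sketch computes the token windows produced when a long sequence is split into chunks of at most maxLength tokens, with each chunk repeating the last stride tokens of the previous one. It is an illustration under stated assumptions, not the library's code: StrideWindowSketch and windows are hypothetical names, and special tokens (which reduce the room available per chunk) are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DJL: lists the [start, end) token ranges
// of overlapping chunks for a given maxLength and stride.
final class StrideWindowSketch {

    static List<int[]> windows(int tokenCount, int maxLength, int stride) {
        if (maxLength <= 0 || stride < 0 || stride >= maxLength) {
            throw new IllegalArgumentException("require 0 <= stride < maxLength");
        }
        List<int[]> result = new ArrayList<>();
        // Each new chunk starts (maxLength - stride) tokens after the previous one,
        // so consecutive chunks share exactly `stride` tokens.
        int step = maxLength - stride;
        for (int start = 0; start < tokenCount; start += step) {
            int end = Math.min(start + maxLength, tokenCount);
            result.add(new int[] {start, end});
            if (end == tokenCount) {
                break; // the last chunk reached the end of the sequence
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 tokens, chunks of 4, overlap of 1.
        for (int[] w : windows(10, 4, 1)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
        // prints [0, 4), [3, 7), [6, 10) on separate lines
    }
}
```

      In this sketch, chunks after the first correspond to what the tokenizer would report as overflowing tokens when that option is enabled.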
    • optDoLowerCase

      public HuggingFaceTokenizer.Builder optDoLowerCase(boolean doLowerCase)
      Sets whether the tokenizer converts text to lowercase.
      Parameters:
      doLowerCase - true to convert text to lowercase
      Returns:
      this builder
    • optDoLowerCase

      public HuggingFaceTokenizer.Builder optDoLowerCase(String locale)
      Sets the tokenizer to convert text to lowercase using a specific locale.
      Parameters:
      locale - the locale to use when converting to lowercase
      Returns:
      this builder
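      The locale matters because lowercasing rules differ between languages. The sketch below uses only the standard java.lang.String and java.util.Locale APIs to show the difference; LocaleLowerCaseSketch and lower are hypothetical names, and treating the locale string as an IETF language tag is an assumption about how the builder interprets it.

```java
import java.util.Locale;

// Hypothetical helper, not part of DJL: demonstrates locale-sensitive lowercasing
// with the standard library only.
final class LocaleLowerCaseSketch {

    // Lowercases text using the locale named by an IETF language tag such as "en" or "tr".
    static String lower(String text, String languageTag) {
        return text.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // English rules map 'I' to 'i'.
        System.out.println(lower("TITLE", "en").equals("title")); // prints true
        // Turkish rules map 'I' to the dotless i (U+0131).
        System.out.println(lower("TITLE", "tr").equals("t\u0131tle")); // prints true
    }
}
```

      A tokenizer vocabulary built from text lowercased under one locale may not match input lowercased under another, which is the usual reason to set the locale explicitly.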
    • configure

      public void configure(Map<String,?> arguments)
      Configures the builder with the arguments.
      Parameters:
      arguments - the arguments
    • build

      public HuggingFaceTokenizer build() throws IOException
      Builds the tokenizer.
      Returns:
      the new tokenizer
      Throws:
      IOException - when IO operation fails in loading a resource