Class RealtimeAudioInputTurnDetection.ServerVad

    public final class RealtimeAudioInputTurnDetection.ServerVad

    Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.

    • Method Detail

      • _type

         final JsonValue _type()

        The type of turn detection. Set to server_vad to enable simple Server VAD.

        Expected to always return the following:

        JsonValue.from("server_vad")

        However, this method can be useful for debugging and logging (e.g. if the server responded with an unexpected value).
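
        As a rough illustration of the logging use case, the sketch below compares a raw type string against the documented constant. TurnDetectionTypeCheck and unexpectedTypeWarning are hypothetical names, not part of the SDK, and rawType stands in for the string form of whatever _type() returned.

```java
import java.util.Optional;

// Hypothetical helper, not part of the SDK: compares a raw type string
// against the documented constant and builds a log-friendly message.
final class TurnDetectionTypeCheck {
    static final String EXPECTED_TYPE = "server_vad";

    // rawType is assumed to be the string form of the value returned by _type().
    // Returns an empty Optional when the value matches the documented constant.
    static Optional<String> unexpectedTypeWarning(String rawType) {
        if (EXPECTED_TYPE.equals(rawType)) {
            return Optional.empty();
        }
        return Optional.of("Unexpected turn detection type: " + rawType
                + " (expected " + EXPECTED_TYPE + ")");
    }
}
```

        A caller could pass the resulting message, when present, to whatever logger the application already uses.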

      • createResponse

         final Optional<Boolean> createResponse()

        Whether to automatically generate a response when a VAD stop event occurs. If interrupt_response is set to false, a response may not be created when the model is already responding.

        If both create_response and interrupt_response are set to false, the model will never respond automatically but VAD events will still be emitted.

      • idleTimeoutMs

         final Optional<Long> idleTimeoutMs()

        Optional timeout after which a model response will be triggered automatically. This is useful for situations in which a long pause from the user is unexpected, such as a phone call. The model will effectively prompt the user to continue the conversation based on the current context.

        The timeout value will be applied after the last model response's audio has finished playing, i.e. it's set to the response.done time plus audio playback duration.

        An input_audio_buffer.timeout_triggered event (plus events associated with the Response) will be emitted when the timeout is reached. Idle timeout is currently only supported for server_vad mode.
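
        The timing rule can be sketched as simple arithmetic: the idle deadline is the response.done time, plus the audio playback duration, plus the configured timeout. IdleTimeoutSketch and idleDeadline are hypothetical names used for illustration only; the server computes this itself, so the sketch is just a way to reason about when the event should arrive.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical client-side model of the documented timing rule.
final class IdleTimeoutSketch {
    // Returns the instant at which the idle timeout would be reached,
    // or an empty Optional when no idle timeout is configured.
    static Optional<Instant> idleDeadline(Instant responseDoneAt,
                                          Duration audioPlayback,
                                          Optional<Long> idleTimeoutMs) {
        // The timeout starts counting once the last response's audio has finished playing.
        Instant playbackFinishedAt = responseDoneAt.plus(audioPlayback);
        return idleTimeoutMs.map(playbackFinishedAt::plusMillis);
    }
}
```

        For example, with a response.done time of 00:00:00, four seconds of audio, and a 6000 ms timeout, the deadline falls at 00:00:10.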

      • interruptResponse

         final Optional<Boolean> interruptResponse()

        Whether to automatically interrupt (cancel) any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs. If true, the response is cancelled; otherwise it continues until complete.

        If both create_response and interrupt_response are set to false, the model will never respond automatically but VAD events will still be emitted.
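
        The interaction between create_response and interrupt_response can be summarised with two predicates. This is a minimal sketch: VadResponsePolicy and its methods are hypothetical, and treating an absent value as true is an assumption made here, not something this page states.

```java
import java.util.Optional;

// Hypothetical summary of the documented flag combinations.
final class VadResponsePolicy {
    // Assumption: an absent flag is treated as true.
    private static boolean flag(Optional<Boolean> value) {
        return value.orElse(true);
    }

    // Documented case: both flags false means no automatic responses,
    // although VAD events are still emitted.
    static boolean neverRespondsAutomatically(Optional<Boolean> createResponse,
                                              Optional<Boolean> interruptResponse) {
        return !flag(createResponse) && !flag(interruptResponse);
    }

    // Documented case: with interrupt_response false, creating a response
    // may fail while the model is already responding.
    static boolean responseCreationMayFail(Optional<Boolean> createResponse,
                                           Optional<Boolean> interruptResponse,
                                           boolean modelAlreadyResponding) {
        return flag(createResponse) && !flag(interruptResponse) && modelAlreadyResponding;
    }
}
```

        An application that drives responses manually would typically want neverRespondsAutomatically to hold, and would then react to the VAD events itself.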

      • prefixPaddingMs

         final Optional<Long> prefixPaddingMs()

        Used only for server_vad mode. Amount of audio (in milliseconds) to include before the speech detected by VAD. Defaults to 300ms.
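
        To get a feel for how much audio the padding represents, the sketch below converts the padding into a sample count. PrefixPaddingSketch and paddingSamples are hypothetical names, and the sample rate is an input the caller must supply; this page does not specify one.

```java
import java.util.Optional;

// Hypothetical helper: converts the prefix padding into a number of audio samples.
final class PrefixPaddingSketch {
    static final long DEFAULT_PREFIX_PADDING_MS = 300L; // documented default

    // sampleRateHz is an assumption supplied by the caller (mono audio assumed).
    static long paddingSamples(Optional<Long> prefixPaddingMs, int sampleRateHz) {
        long ms = prefixPaddingMs.orElse(DEFAULT_PREFIX_PADDING_MS);
        return ms * sampleRateHz / 1000L;
    }
}
```

        With the default of 300ms and an assumed 24000 Hz sample rate, this gives 7200 samples of audio before the detected speech.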

      • silenceDurationMs

         final Optional<Long> silenceDurationMs()

        Used only for server_vad mode. Duration of silence (in milliseconds) required to detect the end of speech. Defaults to 500ms. With shorter values the model responds more quickly, but may jump in on short pauses from the user.
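
        The trade-off can be illustrated with a simplified silence counter. This is not the server's algorithm, only a sketch under the assumption that audio arrives as fixed-length frames already labelled as speech or silence; SilenceStopSketch and speechStopFrame are hypothetical names.

```java
import java.util.Optional;

// Simplified, hypothetical model of silence-based speech stop detection.
final class SilenceStopSketch {
    static final long DEFAULT_SILENCE_DURATION_MS = 500L; // documented default

    // Returns the index of the frame at which speech stop would be declared,
    // or -1 if the required run of silence never follows any speech.
    static int speechStopFrame(boolean[] frameHasSpeech, long frameMs,
                               Optional<Long> silenceDurationMs) {
        long requiredMs = silenceDurationMs.orElse(DEFAULT_SILENCE_DURATION_MS);
        long silentMs = 0;
        boolean speechSeen = false;
        for (int i = 0; i < frameHasSpeech.length; i++) {
            if (frameHasSpeech[i]) {
                speechSeen = true;
                silentMs = 0; // any speech resets the silence counter
            } else if (speechSeen) {
                silentMs += frameMs;
                if (silentMs >= requiredMs) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

        With 100ms frames and a 200ms pause in the middle of an utterance, a 200ms setting declares speech stop during the pause, whereas the 500ms default waits until the user has actually finished.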

      • threshold

         final Optional<Double> threshold()

        Used only for server_vad mode. Activation threshold for VAD, from 0.0 to 1.0. Defaults to 0.5. A higher threshold requires louder audio to activate the model, and thus might perform better in noisy environments.
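
        A caller may want to resolve the effective threshold and check the documented range before sending a configuration. The sketch below assumes client-side validation is desired; ThresholdSketch and effectiveThreshold are hypothetical names, and the SDK itself may or may not perform such a check.

```java
import java.util.Optional;

// Hypothetical helper: applies the documented default and range to a threshold.
final class ThresholdSketch {
    static final double DEFAULT_THRESHOLD = 0.5; // documented default

    // Returns the supplied threshold, or the default when absent.
    // Rejects values outside the documented 0.0 to 1.0 range.
    static double effectiveThreshold(Optional<Double> threshold) {
        double value = threshold.orElse(DEFAULT_THRESHOLD);
        if (Double.isNaN(value) || value < 0.0 || value > 1.0) {
            throw new IllegalArgumentException(
                    "threshold must be between 0.0 and 1.0, got " + value);
        }
        return value;
    }
}
```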