This document outlines the requirements for the Playback API, which provides a unified interface for controlling text-to-speech (TTS) playback.
- Start, pause, resume, and stop playback
- Handle both individual and batched text/SSML input
- Report current playback state (playing, paused, stopped)
- Accept plain text and SSML input
- Support multiple utterances
- Emit events for state changes
- Provide word/sentence boundary information
- Report errors and warnings
- Select from available voices
- Configure voice parameters (rate, pitch, volume)
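
To make the state and event requirements above more concrete, here is a minimal sketch of a playback state machine. Every name in it (`PlaybackStateMachine`, `PlaybackEvent`, `PlaybackListener`, the event `type` strings) is an assumption made for illustration, not something settled in this document. In particular, reporting an invalid transition as a warning event is only one possible design.

```typescript
// Hypothetical sketch: none of these identifiers are part of an agreed API.
type PlaybackState = "playing" | "paused" | "stopped";

type PlaybackEvent =
  | { type: "statechange"; state: PlaybackState }
  | { type: "boundary"; kind: "word" | "sentence"; charIndex: number }
  | { type: "error" | "warning"; message: string };

type PlaybackListener = (event: PlaybackEvent) => void;

class PlaybackStateMachine {
  private current: PlaybackState = "stopped";
  private listeners: PlaybackListener[] = [];

  // Report the current playback state.
  get state(): PlaybackState {
    return this.current;
  }

  // Register a listener that receives every emitted event.
  on(listener: PlaybackListener): void {
    this.listeners.push(listener);
  }

  start(): void {
    this.transition("playing", ["stopped"]);
  }

  pause(): void {
    this.transition("paused", ["playing"]);
  }

  resume(): void {
    this.transition("playing", ["paused"]);
  }

  stop(): void {
    this.transition("stopped", ["playing", "paused"]);
  }

  // Move to `next` only if the current state is in `allowedFrom`;
  // otherwise keep the state and emit a warning (an assumed policy).
  private transition(next: PlaybackState, allowedFrom: PlaybackState[]): void {
    if (allowedFrom.indexOf(this.current) === -1) {
      this.emit({
        type: "warning",
        message: `Cannot go from ${this.current} to ${next}`,
      });
      return;
    }
    this.current = next;
    this.emit({ type: "statechange", state: next });
  }

  private emit(event: PlaybackEvent): void {
    for (const listener of this.listeners) {
      listener(event);
    }
  }
}
```

The `boundary` event variant is declared only to show where word/sentence boundary information could fit. This sketch never emits it, since that would depend on the underlying TTS engine.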
[WIP]
A PlaybackEngineProvider lets you list the available voices and create PlaybackEngine instances, each bound to one specific voice, language, etc.
A PlaybackEngine uses that voice, whose parameters can be set. It is loaded with utterances, can preload them with context, and lets you speak the utterance at a given index.
A PlaybackNavigator then handles navigation, continuous play, etc.
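
The three roles described above could be sketched as follows. This is only one possible reading of the WIP design: all method names and signatures (`getVoices`, `createEngine`, `setParameters`, `loadUtterances`, `preload`, `speak`, `playAll`, and so on) are assumptions, and so are the `FakeEngine` and `FakeProvider` classes, which only record text instead of producing audio. Everything is synchronous for brevity; a real engine would most likely be asynchronous.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not an agreed API.
interface Voice { id: string; language: string; }
interface VoiceParameters { rate: number; pitch: number; volume: number; }
interface Utterance { text: string; isSsml?: boolean; }

interface PlaybackEngine {
  readonly voice: Voice;
  setParameters(params: Partial<VoiceParameters>): void;
  loadUtterances(utterances: Utterance[]): void;
  preload(index: number): void;
  speak(index: number): void;
}

interface PlaybackEngineProvider {
  getVoices(): Voice[];
  createEngine(voice: Voice): PlaybackEngine;
}

// Fake engine that records what it "speaks" instead of producing audio.
class FakeEngine implements PlaybackEngine {
  readonly spoken: string[] = [];
  private utterances: Utterance[] = [];
  private params: VoiceParameters = { rate: 1, pitch: 1, volume: 1 };

  constructor(readonly voice: Voice) {}

  setParameters(params: Partial<VoiceParameters>): void {
    this.params = { ...this.params, ...params };
  }

  loadUtterances(utterances: Utterance[]): void {
    this.utterances = utterances.slice();
  }

  preload(index: number): void {
    // The fake only validates the index; a real engine might buffer audio here.
    this.requireUtterance(index);
  }

  speak(index: number): void {
    this.spoken.push(this.requireUtterance(index).text);
  }

  private requireUtterance(index: number): Utterance {
    const utterance = this.utterances[index];
    if (utterance === undefined) {
      throw new RangeError(`No utterance at index ${index}`);
    }
    return utterance;
  }
}

class FakeProvider implements PlaybackEngineProvider {
  getVoices(): Voice[] {
    return [{ id: "fake-en", language: "en-US" }];
  }

  createEngine(voice: Voice): PlaybackEngine {
    return new FakeEngine(voice);
  }
}

// The navigator owns the position and decides which index the engine speaks.
class PlaybackNavigator {
  private index = 0;

  constructor(private engine: PlaybackEngine, private count: number) {}

  get position(): number {
    return this.index;
  }

  next(): boolean {
    return this.jumpTo(this.index + 1);
  }

  previous(): boolean {
    return this.jumpTo(this.index - 1);
  }

  // Speak the utterance at `index`; returns false if it is out of range.
  jumpTo(index: number): boolean {
    if (index < 0 || index >= this.count) {
      return false;
    }
    this.index = index;
    this.engine.speak(index);
    return true;
  }

  // Continuous play: speak from the current position to the last utterance.
  playAll(): void {
    for (let i = this.index; i < this.count; i++) {
      this.jumpTo(i);
    }
  }
}
```

Keeping the position in the navigator rather than in the engine means the engine only needs to know how to speak one utterance by index, which seems to match the split described above.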
To clarify, I am not disagreeing with this. Sorry if that came across as a disagreement; it was just an additional detail I forgot when listing requirements. Actually, we discussed that yesterday and it should indeed be removed. My bad, that was added out of TS habits.
That’s a good point.
That’s another good point, do we want to have something that acts as cancel?
Re voice configuration: it is not really thought out at the moment, it’s kinda here as a reminder that this exists and should be handled. Sorry if that was unclear. I am actually working on this at the moment, so any ideas and feedback are highly appreciated. Thanks!