Add multilingual docs #163

Merged 1 commit on Oct 23, 2024
172 changes: 172 additions & 0 deletions docs/en/api.md
@@ -0,0 +1,172 @@
## Configuration

### Initialization Parameters for `TextToAudioStream`

When you initialize the `TextToAudioStream` class, you have various options to customize its behavior. Here are the available parameters:

#### `engine` (BaseEngine)
- **Type**: BaseEngine
- **Required**: Yes
- **Description**: The underlying engine responsible for text-to-audio synthesis. You must provide an instance of `BaseEngine` or one of its subclasses to enable audio synthesis.

#### `on_text_stream_start` (callable)
- **Type**: Callable function
- **Required**: No
- **Description**: This optional callback function is triggered when the text stream begins. Use it for any setup or logging you may need.

#### `on_text_stream_stop` (callable)
- **Type**: Callable function
- **Required**: No
- **Description**: This optional callback function is activated when the text stream ends. You can use this for cleanup tasks or logging.

#### `on_audio_stream_start` (callable)
- **Type**: Callable function
- **Required**: No
- **Description**: This optional callback function is invoked when the audio stream starts. Useful for UI updates or event logging.

#### `on_audio_stream_stop` (callable)
- **Type**: Callable function
- **Required**: No
- **Description**: This optional callback function is called when the audio stream stops. Ideal for resource cleanup or post-processing tasks.

#### `on_character` (callable)
- **Type**: Callable function
- **Required**: No
- **Description**: This optional callback function is called when a single character is processed.

#### `output_device_index` (int)
- **Type**: Integer
- **Required**: No
- **Default**: `None`
- **Description**: Index of the audio output device to use. If `None`, the system's default output device is used.

#### `tokenizer` (string)
- **Type**: String
- **Required**: No
- **Default**: `"nltk"`
- **Description**: Tokenizer to use for sentence splitting (currently "nltk" and "stanza" are supported).

#### `language` (string)
- **Type**: String
- **Required**: No
- **Default**: `"en"`
- **Description**: Language to use for sentence splitting.

#### `muted` (bool)
- **Type**: Bool
- **Required**: No
- **Default**: `False`
- **Description**: Global mute setting. If `True`, no PyAudio stream is opened, which disables audio playback through local speakers (useful when you want to synthesize to a file or process audio chunks). This setting overrides the `muted` parameter of `play` and `play_async`.

#### `level` (int)
- **Type**: Integer
- **Required**: No
- **Default**: `logging.WARNING`
- **Description**: Sets the logging level for the internal logger. This can be any integer constant from Python's built-in `logging` module.

#### Example Usage:

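```python
import logging
from RealtimeTTS import TextToAudioStream, SystemEngine

engine = SystemEngine()  # Substitute with your engine
stream = TextToAudioStream(
    engine=engine,
    # Replace these placeholder callbacks with your own setup/cleanup logic
    on_text_stream_start=lambda: print("Text stream started"),
    on_text_stream_stop=lambda: print("Text stream stopped"),
    on_audio_stream_start=lambda: print("Audio stream started"),
    on_audio_stream_stop=lambda: print("Audio stream stopped"),
    level=logging.INFO,
)
```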

### Methods

#### `play` and `play_async`

These methods are responsible for executing the text-to-audio synthesis and playing the audio stream. The difference is that `play` is a blocking function, while `play_async` runs in a separate thread, allowing other operations to proceed.
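
The blocking versus threaded distinction can be pictured with a minimal, library-independent sketch. `fake_play` and `fake_play_async` are hypothetical stand-ins, not RealtimeTTS functions; a `threading.Event` plays the role of playback finishing.

```python
import threading


def fake_play(playback_done: threading.Event, log: list) -> None:
    """Hypothetical blocking 'play': returns only after playback has finished."""
    playback_done.wait()  # stands in for synthesizing and playing all audio
    log.append("finished")


def fake_play_async(playback_done: threading.Event, log: list) -> threading.Thread:
    """Hypothetical non-blocking variant: runs fake_play on a worker thread."""
    worker = threading.Thread(target=fake_play, args=(playback_done, log), daemon=True)
    worker.start()
    return worker  # the caller gets control back immediately


if __name__ == "__main__":
    done, log = threading.Event(), []
    worker = fake_play_async(done, log)
    print("caller still running, playback active:", worker.is_alive())
    done.set()  # simulate the audio finishing
    worker.join()
    print(log)  # ['finished']
```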

##### Parameters:

###### `fast_sentence_fragment` (bool)
- **Default**: `True`
- **Description**: When set to `True`, the method prioritizes speed by synthesizing and playing a sentence fragment as soon as one is available instead of waiting for the complete sentence. This is useful for applications where latency matters.

###### `fast_sentence_fragment_allsentences` (bool)
- **Default**: `False`
- **Description**: When set to `True`, applies the fast sentence fragment processing to all sentences, not just the first one.

###### `fast_sentence_fragment_allsentences_multiple` (bool)
- **Default**: `False`
- **Description**: When set to `True`, allows yielding multiple sentence fragments instead of just a single one.

###### `buffer_threshold_seconds` (float)
- **Default**: `0.0`
- **Description**: Specifies the time in seconds for the buffering threshold, which impacts the smoothness and continuity of audio playback.

- **How it Works**: Before synthesizing a new sentence, the system checks whether the buffer holds more audio than the duration specified by `buffer_threshold_seconds`. If it does, it retrieves another sentence from the text generator, on the assumption that it can fetch and synthesize that sentence before the remaining buffered audio finishes playing. This gives the text-to-speech engine more context, which results in better synthesis.

A higher value ensures that there's more pre-buffered audio, reducing the likelihood of silence or gaps during playback. If you experience breaks or pauses, consider increasing this value.
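
As a rough illustration of the check described above, the sketch below converts a count of buffered bytes into seconds and compares it with `buffer_threshold_seconds`. The helper names and the default audio format (24 kHz, mono, 16-bit PCM) are assumptions for illustration only; the real format depends on the engine.

```python
def buffered_seconds(buffered_bytes: int, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> float:
    """Convert a byte count of raw PCM audio into playback time in seconds."""
    bytes_per_second = sample_rate * channels * sample_width
    return buffered_bytes / bytes_per_second


def should_fetch_next_sentence(buffered_bytes: int,
                               buffer_threshold_seconds: float,
                               **audio_format: int) -> bool:
    """True if more audio is buffered than the threshold, so another sentence
    can be fetched and synthesized before playback runs dry."""
    remaining = buffered_seconds(buffered_bytes, **audio_format)
    return remaining > buffer_threshold_seconds


if __name__ == "__main__":
    one_second = 24000 * 1 * 2  # 48,000 bytes = 1 s of 24 kHz mono 16-bit audio
    print(should_fetch_next_sentence(one_second, 0.5))  # True
    print(should_fetch_next_sentence(one_second, 2.0))  # False
```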

###### `minimum_sentence_length` (int)
- **Default**: `10`
- **Description**: Sets the minimum character length to consider a string as a sentence to be synthesized. This affects how text chunks are processed and played.
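
One plausible way to apply a minimum length is to join consecutive short sentences until the combined text is long enough. The generator below is a hypothetical sketch of that idea (`merge_short_sentences` is not a library function), not the library's internal code.

```python
from typing import Iterable, Iterator


def merge_short_sentences(sentences: Iterable[str],
                          minimum_sentence_length: int = 10) -> Iterator[str]:
    """Join consecutive sentences until each chunk has at least
    `minimum_sentence_length` characters (hypothetical illustration)."""
    pending = ""
    for sentence in sentences:
        # Append the next sentence to whatever is still too short.
        pending = f"{pending} {sentence}".strip()
        if len(pending) >= minimum_sentence_length:
            yield pending
            pending = ""
    if pending:  # flush a short remainder when the stream ends
        yield pending


if __name__ == "__main__":
    print(list(merge_short_sentences(["Hi.", "Ok.", "This is longer."])))
    # ['Hi. Ok. This is longer.']
```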

###### `minimum_first_fragment_length` (int)
- **Default**: `10`
- **Description**: The minimum number of characters required for the first sentence fragment before yielding.

###### `log_synthesized_text` (bool)
- **Default**: `False`
- **Description**: When enabled, logs the text chunks as they are synthesized into audio. Helpful for auditing and debugging.

###### `reset_generated_text` (bool)
- **Default**: `True`
- **Description**: If `True`, resets the generated text before processing.

###### `output_wavfile` (str)
- **Default**: `None`
- **Description**: If set, save the audio to the specified WAV file.

###### `on_sentence_synthesized` (callable)
- **Default**: `None`
- **Description**: A callback function that is called after a single sentence fragment has been synthesized.

###### `before_sentence_synthesized` (callable)
- **Default**: `None`
- **Description**: A callback function that is called before a single sentence fragment is synthesized.

###### `on_audio_chunk` (callable)
- **Default**: `None`
- **Description**: Callback function that gets called when a single audio chunk is ready.
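
A common use of `on_audio_chunk` is to collect audio instead of playing it. The class below is a hypothetical callback (`ChunkCollector` is not part of RealtimeTTS) that stores each chunk and later writes them to a WAV file with Python's standard `wave` module. It assumes the chunks are raw PCM bytes and that you know the engine's sample rate, channel count and sample width; the defaults of 24 kHz, mono, 16-bit are placeholders.

```python
import io
import wave


class ChunkCollector:
    """Hypothetical on_audio_chunk callback that keeps raw PCM chunks in memory."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def __call__(self, chunk: bytes) -> None:
        # Called once per ready audio chunk.
        self.chunks.append(chunk)

    def write_wav(self, target, sample_rate: int = 24000,
                  channels: int = 1, sample_width: int = 2) -> None:
        """Write all collected chunks to a WAV file path or binary file object.
        The format values are placeholders: match them to your engine."""
        with wave.open(target, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(sample_width)
            wav.setframerate(sample_rate)
            wav.writeframes(b"".join(self.chunks))


if __name__ == "__main__":
    collector = ChunkCollector()
    collector(b"\x00\x00" * 240)  # stand-in for 10 ms of silent 24 kHz audio
    buffer = io.BytesIO()
    collector.write_wav(buffer)
    print(len(buffer.getvalue()), "bytes of WAV data")
```

With RealtimeTTS you would pass an instance as `on_audio_chunk=collector`, typically together with `muted=True` so that nothing is played through the speakers.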

###### `tokenizer` (str)
- **Default**: `"nltk"`
- **Description**: Tokenizer to use for sentence splitting. Currently supports "nltk" and "stanza".

###### `tokenize_sentences` (callable)
- **Default**: `None`
- **Description**: A custom function that splits the input text into sentences. You can provide your own lightweight tokenizer if you prefer not to use nltk or stanza. It should take the text as a string and return the sentences as a list of strings.
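
A minimal example of such a function, assuming sentences end with `.`, `!` or `?` followed by whitespace. `simple_tokenize` is a hypothetical name; it does not handle abbreviations such as "Dr.", which is one reason dedicated tokenizers like nltk and stanza exist.

```python
import re

# A sentence ends at ".", "!" or "?" when whitespace follows.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def simple_tokenize(text: str) -> list[str]:
    """Split text into sentences and return them as a list of strings."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


if __name__ == "__main__":
    print(simple_tokenize("Hello there. How are you? Fine!"))
    # ['Hello there.', 'How are you?', 'Fine!']
```

It could then be passed as `tokenize_sentences=simple_tokenize` when calling `play` or `play_async`.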

###### `language` (str)
- **Default**: `"en"`
- **Description**: Language to use for sentence splitting.

###### `context_size` (int)
- **Default**: `12`
- **Description**: The number of characters used to establish context for sentence boundary detection. A larger context improves the accuracy of detecting sentence boundaries.

###### `context_size_look_overhead` (int)
- **Default**: `12`
- **Description**: Additional context size for looking ahead when detecting sentence boundaries.

###### `muted` (bool)
- **Default**: `False`
- **Description**: If True, disables audio playback via local speakers. Useful when you want to synthesize to a file or process audio chunks without playing them.

###### `sentence_fragment_delimiters` (str)
- **Default**: `".?!;:,\n…)]}。-"`
- **Description**: A string of characters that are considered sentence delimiters.

###### `force_first_fragment_after_words` (int)
- **Default**: `15`
- **Description**: The number of words after which the first sentence fragment is forced to be yielded.
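
To show how `sentence_fragment_delimiters`, `minimum_first_fragment_length` and `force_first_fragment_after_words` might interact, the hypothetical function below consumes text tokens (for example from an LLM stream) until it can return a first fragment: either text that ends in a delimiter and is long enough, or a run of words that has reached the word limit. This is an illustrative sketch, not RealtimeTTS's actual fragment logic.

```python
from typing import Iterator

DEFAULT_DELIMITERS = ".?!;:,\n…)]}。-"


def first_fragment(tokens: Iterator[str],
                   sentence_fragment_delimiters: str = DEFAULT_DELIMITERS,
                   minimum_first_fragment_length: int = 10,
                   force_first_fragment_after_words: int = 15) -> str:
    """Consume tokens until a first fragment can be yielded (hypothetical sketch)."""
    buffer = ""
    for token in tokens:
        buffer += token
        text = buffer.strip()
        # Rule 1: text ends with a delimiter and is long enough.
        if (text and text[-1] in sentence_fragment_delimiters
                and len(text) >= minimum_first_fragment_length):
            return text
        # Rule 2: enough words have arrived, so force the fragment out.
        if len(text.split()) >= force_first_fragment_after_words:
            return text
    return buffer.strip()  # stream ended before either rule fired


if __name__ == "__main__":
    stream = iter(["Hello", " there", ",", " my", " friend"])
    print(first_fragment(stream))  # Hello there,
    print(list(stream))            # [' my', ' friend']
```

Because the function reads from an iterator, the remaining tokens stay available for the regular sentence splitting that follows.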

20 changes: 20 additions & 0 deletions docs/en/contributing.md
@@ -0,0 +1,20 @@
# Contributing to RealtimeTTS

We welcome contributions to RealtimeTTS! Here are some ways you can contribute:

1. **Reporting Bugs**: If you find a bug, please open an issue on our [GitHub repository](https://github.com/KoljaB/RealtimeTTS/issues).

2. **Suggesting Enhancements**: Have ideas for new features or improvements? We'd love to hear them! Open an issue to suggest enhancements.

3. **Code Contributions**: Want to add a new feature or fix a bug? Great! Please follow these steps:
   - Fork the repository
   - Create a new branch for your feature
   - Make your changes
   - Submit a pull request with a clear description of your changes

4. **Documentation**: Help us improve our documentation by fixing typos, adding examples, or clarifying confusing sections.

5. **Adding New Engines**: If you want to add support for a new TTS engine, please open an issue first to discuss the implementation.


Thank you for helping make RealtimeTTS better!
12 changes: 12 additions & 0 deletions docs/en/faq.md
@@ -0,0 +1,12 @@
# Frequently Asked Questions

For answers to frequently asked questions about RealtimeTTS, please refer to our [FAQ page on GitHub](https://github.com/KoljaB/RealtimeTTS/blob/main/FAQ.md).

This page covers various topics including:

- Usage of different TTS engines
- Handling of multilingual text
- Performance optimization
- Troubleshooting common issues

For more detailed information, please visit the link above.
19 changes: 19 additions & 0 deletions docs/en/index.md
@@ -0,0 +1,19 @@
# RealtimeTTS

[EN](../en/index.md) | [FR](../fr/index.md)

*Easy to use, low-latency text-to-speech library for realtime applications*

## About the Project

RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. It stands out for its ability to quickly convert text streams into high-quality audio output with minimal latency.

## Key Features

- **Low Latency**: almost instantaneous text-to-speech conversion, compatible with LLM outputs
- **High-Quality Audio**: generates clear and natural-sounding speech
- **Multiple TTS Engine Support**: supports OpenAI TTS, ElevenLabs, Azure Speech Services, Coqui TTS, gTTS and System TTS
- **Multilingual**
- **Robust and Reliable**: ensures continuous operation through a fallback mechanism that switches to alternative engines in case of disruptions, guaranteeing consistent performance and reliability

For installation instructions, usage examples, and API reference, please navigate through the documentation using the sidebar.