Commit

Merge pull request #31 from erew123/dev
3x new API endpoints & Play audio at command prompt/terminal
erew123 authored Dec 29, 2023
2 parents 3cd2208 + a094d4f commit fa82870
Showing 6 changed files with 195 additions and 47 deletions.
123 changes: 79 additions & 44 deletions README.md
@@ -499,9 +499,67 @@ Deepspeed and other such things can be installed. Please read the relevant instr
### 🟠Overview
The Text-to-Speech (TTS) Generation API allows you to generate speech from text input using various configuration options. This API supports both character and narrator voices, providing flexibility for creating dynamic and engaging audio content.

- URL: `http://127.0.0.1:7851/api/tts-generate`<br>
- Method: `POST`<br>
- Content-Type: `application/x-www-form-urlencoded`<br>
#### 🟠 Ready Endpoint<br>
Check if the Text-to-Speech (TTS) service is ready to accept requests.

- URL: `http://127.0.0.1:7851/api/ready`<br>
- Method: `GET`<br>

`curl -X GET "http://127.0.0.1:7851/api/ready"`

Response: `Ready`
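
A client that starts alongside the server may want to poll this endpoint before sending work. The following is a minimal Python sketch of that idea, not part of the project: the helper names `fetch_ready` and `wait_until_ready`, the attempt count, and the delay are all illustrative assumptions. Only the URL and the `Ready` response body come from the description above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# URL taken from the endpoint description; adjust if your host or port differs.
READY_URL = "http://127.0.0.1:7851/api/ready"


def fetch_ready(url: str = READY_URL) -> str:
    """Return the body of the ready endpoint, or "" if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.read().decode("utf-8").strip()
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar errors all mean "not ready yet".
        return ""


def wait_until_ready(fetch: Callable[[], str] = fetch_ready,
                     attempts: int = 30,
                     delay: float = 1.0) -> bool:
    """Poll until the endpoint answers "Ready" or the attempts run out."""
    for _ in range(attempts):
        if fetch() == "Ready":
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` with no arguments polls the local server. Passing a different `fetch` callable makes the loop easy to exercise without a running server.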

#### 🟠 Voices List Endpoint<br>
Retrieve a list of available voices for generating speech.

- URL: `http://127.0.0.1:7851/api/voices`<br>
- Method: `GET`<br>

`curl -X GET "http://127.0.0.1:7851/api/voices"`

JSON return: `{"voices": ["voice1.wav", "voice2.wav", "voice3.wav"]}`
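
As a rough illustration of consuming this response from Python, the sketch below parses a body shaped like the example JSON above. The function name `parse_voices` is hypothetical, and the filtering of non-string entries is a defensive assumption rather than documented server behaviour.

```python
import json
from typing import List


def parse_voices(body: str) -> List[str]:
    """Extract the voice file names from a /api/voices JSON body."""
    payload = json.loads(body)
    voices = payload.get("voices", [])
    # Keep only string entries so a malformed item cannot break callers.
    return [name for name in voices if isinstance(name, str)]


# Sample body copied from the documented JSON return.
sample = '{"voices": ["voice1.wav", "voice2.wav", "voice3.wav"]}'
print(parse_voices(sample))  # ['voice1.wav', 'voice2.wav', 'voice3.wav']
```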

#### 🟠 Preview Voice Endpoint
Generate a preview of a specified voice with hardcoded settings.

- URL: `http://127.0.0.1:7851/api/previewvoice/`<br>
- Method: `POST`<br>
- Content-Type: `application/x-www-form-urlencoded`<br>

`curl -X POST "http://127.0.0.1:7851/api/previewvoice/" -F "voice=female_01.wav"`

Replace `female_01.wav` with the name of the voice sample you want to hear.

JSON return: `{"status": "generate-success", "output_file_path": "/path/to/outputs/api_preview_voice.wav", "output_file_url": "http://127.0.0.1:7851/audio/api_preview_voice.wav"}`
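
The same request could be prepared from Python along the following lines. This is a sketch under assumptions: `build_preview_request` is a made-up helper, and it sends the `voice` field as `application/x-www-form-urlencoded` (the Content-Type listed above), whereas `curl -F` sends multipart form data. It assumes the server accepts either encoding for form fields.

```python
import urllib.parse
import urllib.request

# URL taken from the endpoint description above.
PREVIEW_URL = "http://127.0.0.1:7851/api/previewvoice/"


def build_preview_request(voice: str, url: str = PREVIEW_URL) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST for the preview endpoint."""
    body = urllib.parse.urlencode({"voice": voice}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_preview_request("female_01.wav")
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the encoding step easy to inspect before any audio is generated.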

#### 🟠 Switching Model Endpoint<br>

- URL: `http://127.0.0.1:7851/api/reload`<br>
- Method: `POST`<br><br>
`curl -X POST "http://127.0.0.1:7851/api/reload?tts_method=API%20Local"`<br>
`curl -X POST "http://127.0.0.1:7851/api/reload?tts_method=API%20TTS"`<br>
`curl -X POST "http://127.0.0.1:7851/api/reload?tts_method=XTTSv2%20Local"`<br>

Switch between the 3 models respectively.

JSON return `{"status": "model-success"}`
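
The `%20` in the commands above is the URL-encoded space in the model name. A small Python sketch of building these URLs follows; `build_reload_url` and the `TTS_METHODS` tuple are illustrative names, and the three accepted values are simply the ones shown in the curl examples.

```python
import urllib.parse

# URL and method names taken from the curl examples above.
RELOAD_URL = "http://127.0.0.1:7851/api/reload"
TTS_METHODS = ("API Local", "API TTS", "XTTSv2 Local")


def build_reload_url(tts_method: str, base_url: str = RELOAD_URL) -> str:
    """Return the reload URL with tts_method percent-encoded (space -> %20)."""
    if tts_method not in TTS_METHODS:
        raise ValueError(f"unknown tts_method: {tts_method!r}")
    # quote_via=quote encodes spaces as %20 rather than the default '+'.
    query = urllib.parse.urlencode({"tts_method": tts_method},
                                   quote_via=urllib.parse.quote)
    return f"{base_url}?{query}"


print(build_reload_url("XTTSv2 Local"))
```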

#### 🟠 Switch DeepSpeed Endpoint<br>

- URL: `http://127.0.0.1:7851/api/deepspeed`<br>
- Method: `POST`<br><br>
`curl -X POST "http://127.0.0.1:7851/api/deepspeed?new_deepspeed_value=True"`

Replace True with False to disable DeepSpeed mode.

JSON return `{"status": "deepspeed-success"}`

#### 🟠 Switching Low VRAM Endpoint<br>

- URL: `http://127.0.0.1:7851/api/lowvramsetting`<br>
- Method: `POST`<br><br>
`curl -X POST "http://127.0.0.1:7851/api/lowvramsetting?new_low_vram_value=True"`

Replace True with False to disable Low VRAM mode.

JSON return `{"status": "lowvram-success"}`
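
The DeepSpeed and Low VRAM switches share the same shape: a POST to a fixed path with one `True`/`False` query parameter. The sketch below captures that pattern in Python. The `TOGGLES` table and `build_toggle_url` are hypothetical names. The paths and parameter names are copied from the curl commands above.

```python
import urllib.parse

BASE_URL = "http://127.0.0.1:7851"

# setting name -> (path, query parameter), as shown in the curl examples.
TOGGLES = {
    "deepspeed": ("/api/deepspeed", "new_deepspeed_value"),
    "lowvram": ("/api/lowvramsetting", "new_low_vram_value"),
}


def build_toggle_url(setting: str, enabled: bool, base_url: str = BASE_URL) -> str:
    """Return the URL that enables or disables the given setting."""
    path, param = TOGGLES[setting]
    # The documented examples use the capitalised strings True and False.
    query = urllib.parse.urlencode({param: "True" if enabled else "False"})
    return f"{base_url}{path}?{query}"


print(build_toggle_url("deepspeed", True))
print(build_toggle_url("lowvram", False))
```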

### 🟠 TTS Generation Endpoint

- URL: `http://127.0.0.1:7851/api/tts-generate`<br>
- Method: `POST`<br>
- Content-Type: `application/x-www-form-urlencoded`<br>

### 🟠 Example command lines
Standard TTS speech Example (standard text) generating a time-stamped file<br>
@@ -558,22 +616,22 @@ Example:

🟠 **language**: Choose the language for TTS. Options:

`ar Arabic`<br>
`zh-cn Chinese (Simplified)`<br>
`cs Czech`<br>
`nl Dutch`<br>
`en English`<br>
`fr French`<br>
`de German`<br>
`hu Hungarian`<br>
`it Italian`<br>
`ja Japanese`<br>
`ko Korean`<br>
`pl Polish`<br>
`pt Portuguese`<br>
`ru Russian`<br>
`es Spanish`<br>
`tr Turkish`<br>
`ar` Arabic<br>
`zh-cn` Chinese (Simplified)<br>
`cs` Czech<br>
`nl` Dutch<br>
`en` English<br>
`fr` French<br>
`de` German<br>
`hu` Hungarian<br>
`it` Italian<br>
`ja` Japanese<br>
`ko` Korean<br>
`pl` Polish<br>
`pt` Portuguese<br>
`ru` Russian<br>
`es` Spanish<br>
`tr` Turkish<br>

`-d "language=en"`<br>

@@ -586,12 +644,12 @@ Example:
`-d "output_file_timestamp=true"`<br>
`-d "output_file_timestamp=false"`

🟠 **autoplay**: Feature not yet available. Enable or disable autoplay. Still needs to be specified in the JSON request.
🟠 **autoplay**: Enable or disable playback of the generated TTS through your default sound output device as soon as it has been generated.

`-d "autoplay=true"`<br>
`-d "autoplay=false"`

🟠 **autoplay_volume**: Feature not yet available. Set the autoplay volume. Should be between 0.1 and 1.0. Still needs to be specified in the JSON request.
🟠 **autoplay_volume**: Set the autoplay volume, between 0.1 and 1.0. It must be specified in the request even if autoplay is false.

`-d "autoplay_volume=0.8"`
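
Because both autoplay fields are expected in every request, a client may find it convenient to build them together and check the volume range up front. The Python sketch below shows one way to do that. `autoplay_fields` is a hypothetical helper, and rejecting out-of-range values on the client side is an assumption, since the text above only says the volume should be between 0.1 and 1.0.

```python
from typing import Dict


def autoplay_fields(autoplay: bool, volume: float = 0.8) -> Dict[str, str]:
    """Return the two autoplay form fields as strings ready for form encoding."""
    if not 0.1 <= volume <= 1.0:
        raise ValueError("autoplay_volume should be between 0.1 and 1.0")
    return {
        "autoplay": "true" if autoplay else "false",
        "autoplay_volume": str(volume),
    }


print(autoplay_fields(True, 0.8))
```

The returned dictionary can be merged with the other form fields before encoding the request body.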

@@ -606,29 +664,6 @@ Example JSON TTS Generation Response:

`{"status": "generate-success", "output_file_path": "C:\text-generation-webui\extensions\alltalk_tts\outputs\myoutputfile_1703149973.wav", "output_file_url": "http://127.0.0.1:7851/audio/myoutputfile_1703149973.wav"}`

🟠 **Switching Model**<br><br>
`curl -X POST "http://127.0.0.1:7851/api/reload?tts_method=API%20Local"`<br>
`curl -X POST "http://127.0.0.1:7851/api/reload?tts_method=API%20TTS"`<br>
`curl -X POST "http://127.0.0.1:7851/api/reload?tts_method=XTTSv2%20Local"`<br>

Switch between the 3 models respectively.

JSON return `{"status": "model-success"}`

🟠 **Switch DeepSpeed**<br><br>
`curl -X POST "http://127.0.0.1:7851/api/deepspeed?new_deepspeed_value=True"`

Replace True with False to disable DeepSpeed mode.

JSON return `{"status": "deepspeed-success"}`

🟠 **Switching Low VRAM**<br><br>
`curl -X POST "http://127.0.0.1:7851/api/lowvramsetting?new_low_vram_value=True"`

Replace True with False to disable Low VRAM mode.

JSON return `{"status": "lowvram-success"}`

### 🔴 Future to-do list
- Voice output within the command prompt/terminal (TBD).
- Correct a few spelling mistakes in the documentation.
3 changes: 2 additions & 1 deletion modeldownload.json
@@ -8,6 +8,7 @@
"model.pth": "https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/model.pth?download=true",
"dvae.pth": "https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/dvae.pth?download=true",
"mel_stats.pth": "https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/mel_stats.pth?download=true",
"speakers_xtts.pth": "https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/speakers_xtts.pth?download=true",
"vocab.json": "https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/vocab.json?download=true"
}
}
}
2 changes: 2 additions & 0 deletions requirements_nvidia.txt
@@ -11,3 +11,5 @@ tqdm>=4.66.1
importlib-metadata>=4.8.1
packaging>=23.2
pydantic>=1.10.13
sounddevice>=0.4.6
python-multipart>=0.0.6
2 changes: 2 additions & 0 deletions requirements_other.txt
@@ -11,3 +11,5 @@ tqdm>=4.66.1
importlib-metadata>=4.8.1
packaging>=23.2
pydantic>=1.10.13
sounddevice>=0.4.6
python-multipart>=0.0.6
4 changes: 2 additions & 2 deletions script.py
@@ -252,7 +252,7 @@ def signal_handler(sig, frame):
timeout = 120 # Adjust the timeout as needed

# Introduce a delay before starting the check loop
time.sleep(25) # Wait 25 secs before checking if the tts_server.py has started up.
time.sleep(26) # Wait 26 secs before checking if the tts_server.py has started up.
start_time = time.time()
while time.time() - start_time < timeout:
try:
@@ -264,7 +264,7 @@ def signal_handler(sig, frame):
print(
f"[{params['branding']}Startup] \033[91mWarning\033[0m TTS Subprocess has NOT started up yet, Will keep trying for 120 seconds maximum. Please wait."
)
time.sleep(1)
time.sleep(4)
else:
print(
f"[{params['branding']}Startup] Startup timed out. Check the server logs for more information."
108 changes: 108 additions & 0 deletions tts_server.py
@@ -619,6 +619,53 @@ async def get_audio(filename: str):
    audio_path = this_dir / "outputs" / filename
    return FileResponse(audio_path)


#########################
#### VOICES LIST API ####
#########################
# Define the new endpoint
@app.get("/api/voices")
async def get_voices():
    wav_files = list_files(this_dir / "voices")
    return {"voices": wav_files}

###########################
#### PREVIEW VOICE API ####
###########################
@app.post("/api/previewvoice/", response_class=JSONResponse)
async def preview_voice(request: Request, voice: str = Form(...)):
    try:
        # Hardcoded settings
        language = "en"
        output_file_name = "api_preview_voice"

        # Clean the voice filename for inclusion in the text
        clean_voice_filename = re.sub(r'\.wav$', '', voice.replace(' ', '_'))
        clean_voice_filename = re.sub(r'[^a-zA-Z0-9]', ' ', clean_voice_filename)

        # Build the sentence that will be spoken in the preview
        text = f"Hello, this is a preview of voice {clean_voice_filename}."

        # Generate the audio
        output_file_path = this_dir / "outputs" / f"{output_file_name}.wav"
        await generate_audio(text, voice, language, output_file_path)

        # Generate the URL
        output_file_url = f'http://{params["ip_address"]}:{params["port_number"]}/audio/{output_file_name}.wav'

        # Return the response with both local file path and URL
        return JSONResponse(
            content={
                "status": "generate-success",
                "output_file_path": str(output_file_path),
                "output_file_url": str(output_file_url),
            },
            status_code=200,
        )
    except Exception as e:
        print(f"An error occurred: {e}")
        return JSONResponse(content={"error": "An error occurred"}, status_code=500)

########################
#### GENERATION API ####
########################
@@ -627,9 +674,29 @@ async def get_audio(filename: str):
import uuid
import numpy as np
import soundfile as sf
import sys

# Check for PortAudio library on Linux
try:
    import sounddevice as sd
    sounddevice_installed = True
except OSError:
    print(f"[{params['branding']}Startup] \033[91mInfo\033[0m PortAudio library not found. If you wish to play TTS in standalone mode through the API suite")
    print(f"[{params['branding']}Startup] \033[91mInfo\033[0m please install PortAudio. This will not affect any other features or use of Alltalk.")
    print(f"[{params['branding']}Startup] \033[91mInfo\033[0m If you don't know what the API suite is, then this message is nothing to worry about.")
    sounddevice_installed = False
    if sys.platform.startswith('linux'):
        print(f"[{params['branding']}Startup] \033[91mInfo\033[0m On Linux, you can use the following command to install PortAudio:")
        print(f"[{params['branding']}Startup] \033[91mInfo\033[0m sudo apt-get install portaudio19-dev")

from typing import Union, Dict
from pydantic import BaseModel, ValidationError, Field

def play_audio(file_path, volume):
    # Read the generated file, scale the samples by the volume and block until playback ends
    data, fs = sf.read(file_path)
    sd.play(volume * data, fs)
    sd.wait()

class Request(BaseModel):
    # Define the structure of the 'Request' class if needed
    pass
@@ -827,6 +894,10 @@ async def tts_generate(
else:
cleaned_string = text_input
await generate_audio(cleaned_string, character_voice_gen, language, output_file_path)
if not sounddevice_installed:
    autoplay = False
if autoplay:
    play_audio(output_file_path, autoplay_volume)
return JSONResponse(content={"status": "generate-success", "output_file_path": str(output_file_path), "output_file_url": str(output_file_url)}, status_code=200)
except Exception as e:
return JSONResponse(content={"status": "generate-failure", "error": "An error occurred"}, status_code=500)
@@ -1673,6 +1744,33 @@ async def tts_generate(
<p style="padding-left: 30px;"><span style="color: #3366ff;">curl -X POST "http://127.0.0.1:7851/api/lowvramsetting?new_low_vram_value=True"</span></p>
<p style="padding-left: 30px;">Replace True with False to disable Low VRAM mode.</p>
<p style="padding-left: 30px;">JSON return <span style="color: #339966;">{"status": "lowvram-success"}</span></p>
<h4><strong>Ready Endpoint</strong></h4>
<p>Check if the Text-to-Speech (TTS) service is ready to accept requests.</p>
<ul>
<li>URL: <span style="color: #3366ff;">http://127.0.0.1:7851/api/ready</span></li>
<li>Method: <span style="color: #3366ff;">GET</span></li>
<li>Response: <span style="color: #339966;">Ready</span></li>
</ul>
<p style="padding-left: 30px;"><span style="color: #3366ff;">curl -X GET "http://127.0.0.1:7851/api/ready"</span></p>
<h4><strong>Voices List Endpoint</strong></h4>
<p>Retrieve a list of available voices for generating speech.</p>
<ul>
<li>URL: <span style="color: #3366ff;">http://127.0.0.1:7851/api/voices</span></li>
<li>Method: <span style="color: #3366ff;">GET</span></li>
</ul>
<p style="padding-left: 30px;"><span style="color: #3366ff;">curl -X GET "http://127.0.0.1:7851/api/voices"</span></p>
<p style="padding-left: 30px;">JSON return: <span style="color: #339966;">{"voices": ["voice1.wav", "voice2.wav", "voice3.wav"]}</span></p>
<h4><strong>Preview Voice Endpoint</strong></h4>
<p>Generate a preview of a specified voice with hardcoded settings.</p>
<ul>
<li>URL: <span style="color: #3366ff;">http://127.0.0.1:7851/api/previewvoice/</span></li>
<li>Method: <span style="color: #3366ff;">POST</span></li>
<li>Content-Type: <span style="color: #3366ff;">application/x-www-form-urlencoded</span></li>
</ul>
<p style="padding-left: 30px;"><span style="color: #3366ff;">curl -X POST "http://127.0.0.1:7851/api/previewvoice/" -F "voice=female_01.wav"</span></p>
<p style="padding-left: 30px;">Replace <span style="color: #3366ff;">female_01.wav</span> with the name of the voice sample you want to hear.</p>
<p style="padding-left: 30px;">JSON return: <span style="color: #339966;">{"status": "generate-success", "output_file_path": "/path/to/outputs/api_preview_voice.wav", "output_file_url": "http://127.0.0.1:7851/audio/api_preview_voice.wav"}</span></p>
<p><a href="#toc">Back to top of page<br /></a></p>
<h2 id="debugging-and-tts-generation-information"><strong>Debugging and TTS Generation Information</strong></h2>
@@ -1717,10 +1815,20 @@ async def tts_generate(
# Render the template with the dynamic values
rendered_html = template.render(params=params)

###############################
#### Internal script ready ####
###############################
@app.get("/ready")
async def ready():
    return Response("Ready endpoint")

############################
#### External API ready ####
############################
@app.get("/api/ready")
async def ready():
    return Response("Ready")

@app.get("/")
async def read_root():
    return HTMLResponse(content=rendered_html, status_code=200)