Trying to convert voice to text. Can you give me some sample scripts by using Langchain or LangGraph? #28748

mraguth · 2024-12-16T21:04:34Z

For a proof of concept, I need to convert voice to text. Could you share some sample scripts that use LangChain with the OpenAI API, or LangGraph?

Thanks for your help.

@dosu · 2024-12-16T21:07:14Z

dosubot[bot]
bot Dec 16, 2024

To convert voice to text using LangChain and the OpenAI API, you can use the gpt-4o-audio-preview model. Here's a sample script demonstrating how to transcribe audio input:

import base64
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-audio-preview",
    temperature=0,
)

with open("audio_input.wav", "rb") as f:
    audio = f.read()
    audio_b64 = base64.b64encode(audio).decode()

output_message = llm.invoke(
    [
        (
            "human",
            [
                {"type": "text", "text": "Transcribe the following:"},
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                },
            ],
        ),
    ]
)
print(output_message.content)

This script reads an audio file, encodes it in base64, and sends it to the OpenAI API for transcription using LangChain ^[1].
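As a rough sketch of the same steps in reusable form, the helper below builds the message payload from raw audio bytes and infers the format from the file extension. The function names `build_transcription_message` and `transcribe_file`, and the set of accepted formats, are assumptions for illustration and are not part of LangChain; check the current OpenAI documentation for the formats your model accepts.

```python
import base64
from pathlib import Path

# Formats assumed to be accepted for "input_audio" blocks; verify against the API docs.
SUPPORTED_FORMATS = {"wav", "mp3"}


def build_transcription_message(audio_bytes, audio_format, prompt="Transcribe the following:"):
    """Return a one-message list in the same shape as the script above."""
    fmt = audio_format.lower().lstrip(".")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {audio_format!r}")
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return [
        (
            "human",
            [
                {"type": "text", "text": prompt},
                {"type": "input_audio", "input_audio": {"data": audio_b64, "format": fmt}},
            ],
        )
    ]


def transcribe_file(path, model="gpt-4o-audio-preview"):
    """Send the file to the model; needs langchain-openai and OPENAI_API_KEY."""
    # Imported lazily so the payload helper above works without langchain-openai.
    from langchain_openai import ChatOpenAI

    audio_path = Path(path)
    messages = build_transcription_message(audio_path.read_bytes(), audio_path.suffix)
    llm = ChatOpenAI(model=model, temperature=0)
    return llm.invoke(messages).content
```

Calling `transcribe_file("audio_input.wav")` would then perform the same request as the original script. Keeping the payload construction separate makes it possible to inspect the message before any API call is made.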

If you're interested in using LangChain with Azure AI Services, you can use the AzureAiServicesSpeechToTextTool class to transcribe audio to text. Additionally, LangChain integrates with NVIDIA Riva for Automatic Speech Recognition (ASR), which can also be used for voice-to-text conversion ^[2]^[3].

mraguth Dec 16, 2024
Author

@dosu I am getting this error.

BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'messages[0]'. Content blocks are expected to be either text or image_url type.", 'type': 'invalid_request_error', 'param': 'messages[0]', 'code': 'invalid_value'}}

dosubot[bot] bot Dec 16, 2024

Ensure that the content blocks in your message are formatted correctly as either text or image_url types. For text content, wrap it in a dictionary like this:

content = [{"type": "text", "text": "Your message here"}]

For image URL content, use:

content = [{"type": "image_url", "image_url": {"url": "http://example.com/image.png"}}]

Each content block must adhere to these formats to avoid the BadRequestError with error code 400 ^[1].
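To see which block a request like this would trip over, a small local check can help before sending anything. This is only a debugging sketch: `find_rejected_blocks` is a hypothetical helper, and the allowed set is copied from the error message in this thread rather than from any official list.

```python
# Block types named in the error message from this thread (an assumption about
# what the endpoint accepted for that request, not an exhaustive list).
ALLOWED_BLOCK_TYPES = frozenset({"text", "image_url"})


def find_rejected_blocks(content, allowed=ALLOWED_BLOCK_TYPES):
    """Return (index, type) pairs for content blocks whose type is not allowed."""
    rejected = []
    for index, block in enumerate(content):
        # Non-dict blocks have no "type" key, so they are reported with type None.
        block_type = block.get("type") if isinstance(block, dict) else None
        if block_type not in allowed:
            rejected.append((index, block_type))
    return rejected


content = [
    {"type": "text", "text": "Transcribe the following:"},
    {"type": "input_audio", "input_audio": {"data": "...", "format": "wav"}},
]
print(find_rejected_blocks(content))  # [(1, 'input_audio')]
```

For the transcription script earlier in the thread, this points at the `input_audio` block as the one the endpoint refused, which suggests the model or endpoint handling the request did not accept audio input.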

mraguth · 2024-12-16T21:18:45Z

@dosu Could you also share similar code for the reverse direction, converting text to voice files?

dosubot[bot] bot Dec 16, 2024

To convert text to voice files using LangChain, you can use AzureCogsText2SpeechTool, GoogleCloudTextToSpeechTool, or NVIDIA Riva (through the RivaTTS class imported in the third script). Here are sample scripts for each:

Using AzureCogsText2SpeechTool

from langchain_community.tools.azure_cognitive_services.text2speech import AzureCogsText2SpeechTool

# Set up your Azure Cognitive Services credentials
azure_cogs_key = "your_azure_cogs_key"
azure_cogs_region = "your_azure_cogs_region"

# Initialize the Text2Speech tool
text2speech_tool = AzureCogsText2SpeechTool(
    azure_cogs_key=azure_cogs_key,
    azure_cogs_region=azure_cogs_region,
    speech_language="en-US"
)

# Convert text to speech; run() is the public entry point and wraps _run()
text = "Hello, this is a sample text to speech conversion."
speech_file = text2speech_tool.run(text)

print(f"Speech file saved at: {speech_file}")

Using GoogleCloudTextToSpeechTool

from langchain_community.tools.google_cloud.texttospeech import GoogleCloudTextToSpeechTool

# Initialize the Google Cloud TextToSpeech tool
text2speech_tool = GoogleCloudTextToSpeechTool()

# Convert text to speech
text = "Hello, this is a sample text to speech conversion."
speech_file = text2speech_tool._run(input_text=text, language_code="en-US")

print(f"Speech file saved at: {speech_file}")

Using NVIDIA Riva (RivaTTS)

import os

from langchain_community.utilities.nvidia_riva import RivaTTS

# Initialize the Riva text-to-speech runnable
riva_tts_tool = RivaTTS(
    url="http://localhost:50051",  # Replace with your Riva server URL
    language_code="en-US",
    voice_name="English-US.Female-1",
    output_directory="./output"  # Directory to save the audio files
)

# Convert text to speech
text = "Hello, this is a sample text to speech conversion."
audio_bytes = riva_tts_tool.invoke(text)

# Save the audio to a file, creating the directory first if it is missing
os.makedirs("output", exist_ok=True)
with open("output/output.wav", "wb") as f:
    f.write(audio_bytes)

print("Speech file saved as output/output.wav")

Make sure to replace "your_azure_cogs_key" and "your_azure_cogs_region" with your actual Azure Cognitive Services API key and region for the Azure tool. For the Google Cloud tool, ensure you have set up your Google Cloud credentials. For the Nvidia Riva tool, ensure you have a running Riva server and adjust the URL accordingly ^[1]^[2]^[3].
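One caveat for the Riva script: if the bytes returned by `invoke` turn out to be raw PCM samples rather than a complete WAV file (an assumption worth checking against your own Riva setup, since this thread does not confirm it), writing them straight to `output.wav` would produce a file without a header. A minimal sketch using only the standard-library `wave` module is below; `write_pcm_as_wav` is a hypothetical helper, and the sample rate, channel count, and sample width defaults are placeholders that must match your server's configuration.

```python
import wave


def write_pcm_as_wav(pcm_bytes, path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container using the standard library."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
```

In the Riva script this would replace the plain `open(...).write(...)` step, for example `write_pcm_as_wav(audio_bytes, "output/output.wav")`, once the actual audio parameters are known.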
