Enhance your user experience by integrating speech-to-text capabilities into your application.
Explore the documentation · See demo · Report a Bug · Request a feature
Whisper is an advanced automatic speech recognition system developed by OpenAI and released as open source. Trained on 680,000 hours of diverse, multilingual, multitask supervised data collected from the web, it delivers robust accuracy across varied accents, background noise, and technical vocabulary. It transcribes speech in numerous languages and can also translate those transcriptions into English. To make this technology easy to use from web applications, this project wraps Whisper's features in a Flask API that your applications can call over HTTP.
This project was built with several key technologies from the fields of artificial intelligence and web development: Python, Flask, OpenAI's Whisper model, and Docker.
To set up the project locally, follow these simple instructions.
- Install Docker (the command below uses Homebrew on macOS)
brew install docker
- Clone the repo
git clone https://github.com/MP242/WHISPER-FLASK-API.git
- Build the Docker image
docker build -t whisper-api .
- Run the server (the API listens on port 5000)
docker run -p 5000:5000 whisper-api
This Flask API offers two primary routes for easy interaction:
- GET Request to the Root Path
- Route: GET "/"
- Action: Returns a simple Hello World message.
fetch('http://localhost:5000/')
.then(response => response.text())
.then(data => console.log(data));
Response:
"Hello World"
- POST Request for Speech-to-Text Conversion
- Route: POST "/whisper"
- Input: Form data with an audio file included under the key "file".
- Action: Processes the provided audio file through the Whisper model to perform speech-to-text conversion.
// Assuming you have a File object or Blob representing the audio file
const audioFile = document.querySelector('input[type="file"]').files[0];
const formData = new FormData();
formData.append("file", audioFile, "audio.wav");
fetch('http://localhost:5000/whisper', {
method: "POST",
body: formData
})
.then(response => response.json())
.then(data => console.log(data.results[0].transcript));
Expected Response:
{"results": [{"filename": "audio.wav", "transcript": "The transcribed text from your audio file."}]}
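Reading that response from a non-JavaScript client is a plain JSON parse. A minimal Python sketch, assuming the response follows the shape shown above (a top-level "results" list with one entry per uploaded file):

```python
import json

# Sample body following the documented /whisper response shape
# (assumption: a top-level "results" list, one entry per uploaded file).
raw = ('{"results": [{"filename": "audio.wav", '
       '"transcript": "The transcribed text from your audio file."}]}')

data = json.loads(raw)
transcript = data["results"][0]["transcript"]
print(transcript)  # prints: The transcribed text from your audio file.
```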
These routes enable straightforward interaction with the speech-to-text capabilities provided by the Whisper model through your Flask API. The examples demonstrate how to make requests using JavaScript, facilitating integration into web applications.
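Browsers are not the only possible clients. The sketch below shows how a Python script could call the same POST route using only the standard library, by hand-building the multipart/form-data body the route expects (the audio bytes under the form key "file"). The `build_multipart` and `transcribe` helpers are hypothetical, written for this example; in practice a library such as `requests` would do the same job with less code.

```python
import io
import urllib.request
import uuid

def build_multipart(field_name, filename, payload, content_type="audio/wav"):
    # Hand-rolled multipart/form-data body, mirroring what the browser's
    # FormData sends: a single part carrying `payload` under `field_name`.
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    body.write(f"Content-Type: {content_type}\r\n\r\n".encode())
    body.write(payload)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"

def transcribe(path, url="http://localhost:5000/whisper"):
    # Hypothetical helper: POST a local audio file to the /whisper route.
    # Requires the Docker container from the setup section to be running.
    with open(path, "rb") as f:
        body, ctype = build_multipart("file", "audio.wav", f.read())
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Build (but do not send) a request body from some placeholder bytes:
body, ctype = build_multipart("file", "audio.wav", b"fake-wav-bytes")
```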
- to be defined
Marc POLLET - @Marc_linkedin - [email protected]
Project Link: https://github.com/MP242/vocal-chat