Whisper Truss

Whisper is a speech-to-text model from OpenAI that transcribes audio in dozens of languages with high accuracy. It is open source under the MIT license and available on Baseten as a pre-trained model. Read the Whisper model card for more details.

Whisper's transcription quality enables many use cases, including:

  • Moderating audio content
  • Auditing call center logs
  • Automatically generating video subtitles
  • Improving podcast SEO with transcripts

Deploying Whisper

To deploy the Whisper Truss, follow these steps:

  1. Prerequisites: Make sure you have a Baseten account and API key. You can sign up for a Baseten account here.

  2. Install Truss and the Baseten Python client: If you haven't already, install the Baseten Python client and Truss in your development environment using:

pip install --upgrade baseten truss
  3. Load the Whisper Truss: Assuming you've cloned this repo, spin up an IPython shell and load the Truss into memory:
import truss

whisper_truss = truss.load("path/to/whisper_truss")
  4. Log in to Baseten: Log in to your Baseten account using your API key (key found here):
import baseten

baseten.login("PASTE_API_KEY_HERE")
  5. Deploy the Whisper Truss: Deploy the Whisper Truss to Baseten with the following command:
baseten.deploy(whisper_truss)

Once your Truss is deployed, you can start using the Whisper model through the Baseten platform. Navigate to the Baseten UI to watch the model build and deploy, then invoke it via the REST API.
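The load, login, and deploy calls above can also be collected into one script. This is a minimal sketch, not part of the repo: the function name `deploy_whisper` and the environment variable `BASETEN_API_KEY` are assumptions made for illustration, chosen so the key does not have to be pasted into source code.

```python
import os


def deploy_whisper(truss_path, api_key_env="BASETEN_API_KEY"):
    """Load the Whisper Truss from disk and deploy it to Baseten.

    The environment variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(api_key_env)
    if not api_key:
        raise RuntimeError(f"Set the {api_key_env} environment variable first")

    # Imported lazily so a missing key is reported before anything else runs.
    import baseten
    import truss

    whisper_truss = truss.load(truss_path)  # load the Truss into memory
    baseten.login(api_key)                  # authenticate with the API key
    return baseten.deploy(whisper_truss)    # deploy to Baseten
```

Calling `deploy_whisper("path/to/whisper_truss")` with the variable set runs the same three calls shown in the steps above.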

Whisper API documentation

Input

This deployment of Whisper takes a JSON dictionary with a single key, url, whose value is a string containing the URL of an MP3 file. For example:

{
    "url": "https://cdn.baseten.co/docs/production/Gettysburg.mp3"
}
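If you build the request body in Python, a small helper can catch malformed URLs before they reach the model. This is a sketch only: `build_whisper_input` is a hypothetical name, and the check that the URL is an absolute HTTP(S) address is an assumption about what the deployment can fetch, not a documented requirement.

```python
import json
from urllib.parse import urlparse


def build_whisper_input(audio_url):
    """Return the JSON body for a Whisper request, given a URL to an MP3 file."""
    parsed = urlparse(audio_url)
    # Assumption: the deployment needs an absolute http(s) URL it can download.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute HTTP(S) URL: {audio_url!r}")
    return json.dumps({"url": audio_url})


body = build_whisper_input("https://cdn.baseten.co/docs/production/Gettysburg.mp3")
print(body)
```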

Output

The model returns a fairly lengthy dictionary. For most uses, the keys of interest are language, which gives the detected language of the audio, and text, which contains the full transcription. The segments key breaks the transcription into timestamped pieces.

{
    "language": "english",
    "segments": [
        {
            "start": 0,
            "end": 6.5200000000000005,
            "text": " Four score and seven years ago, our fathers brought forth upon this continent a new nation"
        },
        {
            "start": 6.52,
            "end": 21.6,
            "text": " conceived in liberty and dedicated to the proposition that all men are created equal."
        }
    ],
    "text": " Four score and seven years ago, our fathers brought forth upon this continent..."
}
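For the subtitle use case mentioned earlier, the segments list can be turned into SRT-style captions. The sketch below assumes that start and end are offsets in seconds, which matches the sample output but is not stated explicitly here; `format_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result):
    """Build SRT caption text from the segments of a Whisper result dict."""
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"  # segment text arrives with a leading space
        )
    return "\n".join(blocks)


# Sample data copied from the output shown above.
sample = {
    "language": "english",
    "segments": [
        {"start": 0, "end": 6.5200000000000005,
         "text": " Four score and seven years ago, our fathers brought forth upon this continent a new nation"},
        {"start": 6.52, "end": 21.6,
         "text": " conceived in liberty and dedicated to the proposition that all men are created equal."},
    ],
}
srt = segments_to_srt(sample)
print(srt)
```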

Example usage

You can invoke your Whisper deployment via its REST API endpoint:

curl -X POST "https://app.baseten.co/models/{MODEL_ID}/predict" \
     -H "Content-Type: application/json" \
     -H 'Authorization: Api-Key {YOUR_API_KEY}' \
     -d '{"url": "https://cdn.baseten.co/docs/production/Gettysburg.mp3"}'
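The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl command's URL, headers, and body; the function names are hypothetical, the 300-second timeout is an arbitrary assumption, and the exact shape of the JSON that comes back (for example, whether the transcription is wrapped in another key) should be checked against your deployment.

```python
import json
import urllib.request


def build_predict_request(model_id, api_key, audio_url):
    """Build the same POST request as the curl command, without sending it."""
    return urllib.request.Request(
        url=f"https://app.baseten.co/models/{model_id}/predict",
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


def transcribe(model_id, api_key, audio_url, timeout=300):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(model_id, api_key, audio_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.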