# Whisper ASR Webservice

The webservice will be available soon.

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

## Docker Setup

The Docker image will be available soon.

## Setup

We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.7 or later and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [HuggingFace Transformers](https://huggingface.co/docs/transformers/index) for their fast tokenizer implementation and [ffmpeg-python](https://github.com/kkroening/ffmpeg-python) for reading audio files. The following command will pull and install the latest commit from this repository, along with its Python dependencies:

    pip install git+https://github.com/openai/whisper.git
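
To verify the installation, you can list the checkpoints Whisper knows how to download. A minimal sketch using the package's `available_models()` helper:

```python
import whisper

# print the model names that can be passed to whisper.load_model()
print(whisper.available_models())
```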

It also requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:

```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
```

## Command-line usage

The following command will transcribe speech in audio files, using the `medium` model:

    whisper audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the `small` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:

    whisper japanese.wav --language Japanese

Adding `--task translate` will translate the speech into English:

    whisper japanese.wav --language Japanese --task translate

Run the following to view all available options:

    whisper --help

See [tokenizer.py](whisper/tokenizer.py) for the list of all available languages.
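
You can also inspect the language list programmatically. A short sketch, assuming the `LANGUAGES` mapping defined in `whisper/tokenizer.py`:

```python
from whisper.tokenizer import LANGUAGES

# LANGUAGES maps language codes to names, e.g. "ja" -> "japanese"
for code, name in sorted(LANGUAGES.items()):
    print(f"{code}: {name}")
```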

## Python usage

Transcription can also be performed within Python:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
```

Internally, the `transcribe()` method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
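
The per-window results are exposed alongside the full transcript. A minimal sketch, assuming the `segments` entries of the `transcribe()` output carry `start`/`end` timestamps in seconds:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")

# print each decoded window's time span together with its text
for segment in result["segments"]:
    print(f'[{segment["start"]:.2f}s -> {segment["end"]:.2f}s]{segment["text"]}')
```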

Below is an example usage of `whisper.detect_language()` and `whisper.decode()`, which provide lower-level access to the model:

```python
import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
```
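
`whisper.DecodingOptions()` above uses the default settings; the options object also accepts fields such as `task`, `language`, and `fp16` (treat this exact set as an assumption to check against your installed version). A sketch that requests translation and full-precision decoding:

```python
import whisper

model = whisper.load_model("base")
audio = whisper.pad_or_trim(whisper.load_audio("japanese.wav"))
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# translate Japanese speech into English; fp16=False keeps decoding in
# full precision, which avoids a half-precision warning on CPU
options = whisper.DecodingOptions(task="translate", language="ja", fp16=False)
result = whisper.decode(model, mel, options)
print(result.text)
```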

## License

The code and the model weights of Whisper are released under the MIT License. See [LICENSE](LICENSE) for further details.