
Commit 364f7b9

Poetry init
1 parent ee634f0 commit 364f7b9

5 files changed: +743 −86

.gitignore (+42)

@@ -0,0 +1,42 @@
+*.pyc
+
+# Packages
+*.egg
+!/tests/**/*.egg
+/*.egg-info
+/dist/*
+build
+_build
+.cache
+*.so
+venv
+
+# Installer logs
+pip-log.txt
+
+# Unit test / coverage reports
+.coverage
+.pytest_cache
+
+.DS_Store
+.idea/*
+.python-version
+.vscode/*
+
+/test.py
+/test_*.*
+
+/setup.cfg
+MANIFEST.in
+/setup.py
+/docs/site/*
+/tests/fixtures/simple_project/setup.py
+/tests/fixtures/project_with_extras/setup.py
+.mypy_cache
+
+.venv
+/releases/*
+pip-wheel-metadata
+/poetry.toml
+
+poetry/core/*
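
The one subtle entry above is the re-include pattern `!/tests/**/*.egg`, which carves test fixtures back out of the blanket `*.egg` ignore. A minimal sketch of how that pair of rules evaluates, using the third-party `pathspec` package (an assumption for illustration only; it is not part of this commit):

```python
# pip install pathspec -- third-party gitignore-style matcher, assumed for illustration
import pathspec

# the two interacting rules from the new .gitignore
rules = ["*.egg", "!/tests/**/*.egg"]
spec = pathspec.PathSpec.from_lines("gitwildmatch", rules)

print(spec.match_file("dist/pkg.egg"))       # True: ignored by the blanket *.egg rule
print(spec.match_file("tests/fx/demo.egg"))  # False: re-included by !/tests/**/*.egg
```

Git resolves these the same way: the last matching rule wins, so the order of the two patterns matters.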

README.md (+2 −86)

@@ -1,93 +1,9 @@
-# Whisper Webservice
+# Whisper ASR Webservice
 
 The webservice will be available soon.
 
 Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
 
 ## Docker Setup
 
-The docker image will be available soon
-
-## Setup
-
-We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.7 or later and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [HuggingFace Transformers](https://huggingface.co/docs/transformers/index) for their fast tokenizer implementation and [ffmpeg-python](https://github.com/kkroening/ffmpeg-python) for reading audio files. The following command will pull and install the latest commit from this repository, along with its Python dependencies
-
-    pip install git+https://github.com/openai/whisper.git
-
-It also requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:
-
-```bash
-# on Ubuntu or Debian
-sudo apt update && sudo apt install ffmpeg
-
-# on MacOS using Homebrew (https://brew.sh/)
-brew install ffmpeg
-
-# on Windows using Chocolatey (https://chocolatey.org/)
-choco install ffmpeg
-```
-
-## Command-line usage
-
-The following command will transcribe speech in audio files, using the `medium` model:
-
-    whisper audio.flac audio.mp3 audio.wav --model medium
-
-The default setting (which selects the `small` model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:
-
-    whisper japanese.wav --language Japanese
-
-Adding `--task translate` will translate the speech into English:
-
-    whisper japanese.wav --language Japanese --task translate
-
-Run the following to view all available options:
-
-    whisper --help
-
-See [tokenizer.py](whisper/tokenizer.py) for the list of all available languages.
-
-
-## Python usage
-
-Transcription can also be performed within Python:
-
-```python
-import whisper
-
-model = whisper.load_model("base")
-result = model.transcribe("audio.mp3")
-print(result["text"])
-```
-
-Internally, the `transcribe()` method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
-
-Below is an example usage of `whisper.detect_language()` and `whisper.decode()` which provide lower-level access to the model.
-
-```python
-import whisper
-
-model = whisper.load_model("base")
-
-# load audio and pad/trim it to fit 30 seconds
-audio = whisper.load_audio("audio.mp3")
-audio = whisper.pad_or_trim(audio)
-
-# make log-Mel spectrogram and move to the same device as the model
-mel = whisper.log_mel_spectrogram(audio).to(model.device)
-
-# detect the spoken language
-_, probs = model.detect_language(mel)
-print(f"Detected language: {max(probs, key=probs.get)}")
-
-# decode the audio
-options = whisper.DecodingOptions()
-result = whisper.decode(model, mel, options)
-
-# print the recognized text
-print(result.text)
-```
-
-## License
-
-The code and the model weights of Whisper are released under the MIT License. See [LICENSE](LICENSE) for further details.
+The docker image will be available soon
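
Among the lines removed above, the note that `transcribe()` processes audio with a sliding 30-second window is the one claim with no accompanying code. A minimal sketch of that windowing idea, assuming 16 kHz mono input as Whisper expects (an illustration of the description, not Whisper's actual implementation):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30    # window length described in the removed README text

def thirty_second_windows(audio: np.ndarray):
    """Yield fixed-length 30-second windows, zero-padding the final chunk."""
    window = SAMPLE_RATE * WINDOW_SECONDS
    for start in range(0, len(audio), window):
        chunk = audio[start:start + window]
        if len(chunk) < window:  # pad the tail with silence
            chunk = np.pad(chunk, (0, window - len(chunk)))
        yield chunk
```

Each window would then be turned into a log-Mel spectrogram and decoded in turn, as the removed lower-level example does for a single 30-second chunk.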

0 commit comments