This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

added "📦 Download Subtitle" button | added ngrok support for public link | add more arguments in "⚙️Setting" #21

Open · wants to merge 16 commits into base: main

Changes from all commits
7 changes: 7 additions & 0 deletions .gitignore
@@ -7,6 +7,9 @@ data/
__pycache__/
*.py[cod]
*$py.class
.ipynb_checkpoints/
*/.ipynb_checkpoints/
*/*/.ipynb_checkpoints/

# C extensions
*.so
@@ -219,3 +222,7 @@ tags
.idea/

.pytest_cache/
*_my_*

experiments
*/*_exp*
29 changes: 0 additions & 29 deletions CHANGELOG.md

This file was deleted.

75 changes: 66 additions & 9 deletions README.md
@@ -1,33 +1,90 @@
# Streamlit UI for OpenAI's Whisper
# Subtitle generation using OpenAI's Whisper

This is a simple [Streamlit UI](https://streamlit.io/) for [OpenAI's Whisper speech-to-text model](https://openai.com/blog/whisper/).
It lets you download and transcribe media from YouTube videos, playlists, or local files.
It lets you download and transcribe media from YouTube videos, playlists, or local files with specific settings.
You can then browse, filter, and search through your saved audio files.
Feel free to raise an issue for bugs or feature requests or send a PR.

https://user-images.githubusercontent.com/6735526/216852681-53b6c3db-3e74-4c86-806f-6f6774a9003a.mp4
This is a simple [Streamlit UI](https://streamlit.io/) for [OpenAI's Whisper](https://openai.com/blog/whisper/) speech-to-text model.
It lets you download and transcribe media from YouTube videos, playlists, or local files (each file is limited to 200 MB).
You can then browse, filter, and search through your saved audio files. Feel free to raise an issue for bugs or feature requests, or send a PR.

## Watch the demo on YouTube:
[<img src='https://user-images.githubusercontent.com/15317938/220814880-7e8abb6e-36d9-41ac-8821-533a24bf7de3.png' width=320>](https://youtu.be/nJi1swi8y4I "Whisper Subtitle")


## Setup
This was built & tested on Python 3.11 but should also work on Python 3.9+ as with the original [Whisper repo](https://github.com/openai/whisper)).
This was built & tested on Python 3.11 but should also work on Python 3.8+ as with the original [Whisper repo](https://github.com/openai/whisper)).
You'll need to install `ffmpeg` on your system. Then, install the requirements with `pip`.

```
sudo apt install ffmpeg
# Install pytorch if you don't have it
# sudo conda install pytorch
pip install -r requirements.txt
pip install git+https://github.com/openai/whisper.git
```
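
To verify the installation, here's a quick sanity check (a minimal sketch; `whisper.available_models()` is part of the `openai-whisper` package):

```
# Sanity check: ffmpeg must be on PATH and the whisper package importable
import shutil

import whisper

assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
print(whisper.available_models())  # e.g. ['tiny', 'base', 'small', 'medium', 'large', ...]
```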

## Usage

Once you're set up, you can run the app with:
1. Once you're set up, you can run the app with:

```
streamlit run app/01_🏠_Home.py
```

This will open a new tab in your browser with the app. You can then select a YouTube URL or local file & click "Run Whisper" to run the model on the selected media.

If the tab doesn't open, please use the URL ```http://localhost:8501``` in your browser.

2. If you are not satisfied with the output, click '⚙️ Settings' on the left to fine-tune the Whisper model's inference settings.

Important ⚙️ Settings F.Y.I. (a usage sketch follows this list):
- Model: the model variant to use, default: ```medium```
- Language: the language of the transcription, default: ```zh``` (中文)
- No Speech Threshold: how strictly non-speech segments are excluded, default: ```0.4``` (lower values are stricter)
- Condition on previous text: whether the model is conditioned on the previously decoded text, default: ```True```
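
A minimal sketch of how these settings map onto Whisper's Python API (assuming the stock `openai-whisper` package; the parameter names are real `whisper.transcribe` arguments, the input filename is hypothetical):

```
# Sketch only: the ⚙️ Settings above map directly onto whisper.transcribe()
import whisper

model = whisper.load_model("medium")      # Model
result = model.transcribe(
    "my_video.mp4",                       # hypothetical input file
    language="zh",                        # Language
    no_speech_threshold=0.4,              # No Speech Threshold
    condition_on_previous_text=True,      # Condition on previous text
)
print(result["text"])
```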


## Hosting

If you want to host the app on a server with a dynamic IP, you can install ```ngrok``` to forward your local port to a public address.
You can then access the app from anywhere via a random URL like: ```https://b9f1-458-19-17-41.jp.ngrok.io```

1. Register for an account and get your token from the ngrok website: https://dashboard.ngrok.com/get-started/your-authtoken
2. Install ngrok:
```
wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
sudo tar xvzf ngrok-v3-stable-linux-amd64.tgz -C /usr/local/bin
rm ngrok-v3-stable-linux-amd64.tgz
```
3. Put your ngrok token from the [ngrok dashboard](https://dashboard.ngrok.com/get-started/your-authtoken) into ```forward_port.sh```
4. Expose your URL to the public with ```bash forward_port.sh```
5. Inspect the random URL with ```python inspect_url.py``` and open it in your browser (see the sketch after this list)
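
The inspection step works because the ngrok agent exposes a local API at ```http://127.0.0.1:4040```. Here's a minimal sketch of an ```inspect_url.py```-style helper (an assumption for illustration; the actual script in this repo may differ):

```
# Sketch of an inspect_url.py-style helper: query the local ngrok agent API
# (default inspection address 127.0.0.1:4040) for the first tunnel's public URL.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:4040/api/tunnels") as resp:
    tunnels = json.load(resp)["tunnels"]

print(tunnels[0]["public_url"])  # e.g. https://b9f1-458-19-17-41.jp.ngrok.io
```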

🚧 Under construction:
1. Add Redis for a task queue

🔥 You can try our demo [here](https://whispersubtitle.aiacademy.tw)

Special thanks to [<img src=https://i.imgur.com/bTHUPca.png width=300>](https://en.aiacademy.tw/) for the server.

## Changelog
All notable changes to this project, alongside a potential feature roadmap, will be documented [in this file](CHANGELOG.md).
See [Commits](https://github.com/ShuYuHuang/whisper-subtitle/commits) for detailed changes.

Version summary will be provided in [Release](https://github.com/ShuYuHuang/whisper-subtitle/releases).

The changelog of the original version can be found [in this file](https://github.com/hayabhay/whisper-ui/blob/main/CHANGELOG.md).

## License
Whisper is licensed under [MIT](https://github.com/openai/whisper/blob/main/LICENSE) while Streamlit is licensed under [Apache 2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE).
Everything else is licensed under [MIT](https://github.com/hayabhay/whisper-ui/blob/main/LICENSE).
- Whisper: [MIT](https://github.com/openai/whisper/blob/main/LICENSE)
- Streamlit: [Apache 2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE).
- Everything else: [MIT](https://github.com/hayabhay/whisper-ui/blob/main/LICENSE).

## Reference
I forked the original version of the interface from https://github.com/hayabhay/whisper-ui.

They did a great job building a subtitle management system: search engine, transcript viewer, settings.

The original version aims to demonstrate the power of Whisper for local use, especially on short YouTube videos.

My goal is to provide a service for many clients to generate subtitles for long videos like meeting recordings, courses, and movies.
21 changes: 18 additions & 3 deletions app/01_🏠_Home.py
@@ -46,7 +46,7 @@ def get_formatted_date(date_str: str) -> str:
youtube_url = st.text_input("Youtube video or playlist URL")
elif source_type == "Upload":
input_files = st.file_uploader(
"Add one or more files", type=["mp4", "avi", "mov", "mkv", "mp3", "wav"], accept_multiple_files=True
"Add one or more files", type=["mp4", "avi", "mov", "mkv", "mp3", "wav","m4a"], accept_multiple_files=True
)

add_media = st.form_submit_button(label="Add Media!")
@@ -63,6 +63,7 @@ def get_formatted_date(date_str: str) -> str:
source = input_files
else:
st.error("Please upload files")


# Lowercase the source type
source_type = source_type.lower()
@@ -168,6 +169,13 @@ def get_formatted_date(date_str: str) -> str:
if st.button("🗑️ Delete", key=f"delete-{media['id']}"):
media_manager.delete(media["id"])
st.experimental_rerun()

# Offer the generated SRT transcript for download
filename = f'{Path(media["filepath"]).parent / "transcript"}.srt'

with open(filename, "rb") as file:
    if st.download_button("📦 Download Subtitle", file, file_name=media["source_name"] + ".srt"):
        st.experimental_rerun()


with media_col:
# Render the media
@@ -229,7 +237,7 @@ def get_formatted_date(date_str: str) -> str:
media = media_manager.get_detail(media_id=st.session_state.selected_media)

# Render mini nav
back_col, del_col = st.sidebar.columns(2)
back_col, del_col, download_col = st.sidebar.columns(3)
with back_col:
# Add a button to show the list view
if st.button("◀️ &nbsp; Back to list", key="back-to-list-main"):
@@ -240,7 +248,14 @@ def get_formatted_date(date_str: str) -> str:
media_manager.delete(media["id"])
st.session_state.list_mode = True
st.experimental_rerun()


with download_col:
    # Offer the generated SRT transcript for download
    filename = f'{Path(media["filepath"]).parent / "transcript"}.srt'

    with open(filename, "rb") as file:
        if st.download_button("📦 Download Subtitle", file, file_name=media["source_name"] + ".srt"):
            st.experimental_rerun()

st.sidebar.write(f"""### {media["source_name"]}""")

# Render the media. Use both audio & video for youtube
23 changes: 13 additions & 10 deletions app/config.py
@@ -21,14 +21,17 @@
# --------------
# Default settings
WHISPER_DEFAULT_SETTINGS = {
"whisper_model": "base",
"whisper_model": "medium",
"temperature": 0.0,
"temperature_increment_on_fallback": 0.2,
"no_speech_threshold": 0.6,
"no_speech_threshold": 0.4,
"logprob_threshold": -1.0,
"compression_ratio_threshold": 2.4,
"condition_on_previous_text": True,
"verbose": False,
"language": 'zh',
"fp16": True,
"without_timestamps" : False
}
WHISPER_SETTINGS_FILE = DATA_DIR / ".whisper_settings.json"

@@ -52,22 +55,22 @@ def get_whisper_settings():
# Common page configurations
# --------------------------
ABOUT = """
### Whisper UI
### 💬 Whisper Subtitle

This is a simple wrapper around Whisper to save, browse & search through transcripts.
This is a simple wrapper around Whisper to save, browse & search through transcripts for movie subtitles.

Please report any bugs or issues on [Github](https://github.com/hayabhay/whisper-ui/). Thanks!
Please report any bugs or issues on [Github](https://github.com/ShuYuHuang/whisper-subtitle/). Thanks!
"""


def get_page_config(page_title_prefix="", layout="wide"):
def get_page_config(page_title_prefix="💬", layout="wide"):
return {
"page_title": f"{page_title_prefix}Whisper UI",
"page_icon": "🤖",
"page_title": f"{page_title_prefix}Whisper Subtitle",
"page_icon": ":movie_camera:",
"layout": layout,
"menu_items": {
"Get Help": "https://twitter.com/hayabhay",
"Report a bug": "https://github.com/hayabhay/whisper-ui/issues",
"Get Help": "https://github.com/ShuYuHuang",
"Report a bug": "https://github.com/ShuYuHuang/whisper-subtitle/issues",
"About": ABOUT,
},
}
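
For context, a page would typically apply this config via Streamlit's `set_page_config` (a sketch; an assumption about usage, not necessarily how every page in this repo does it):

```
# Sketch: applying the shared page config on a Streamlit page
import streamlit as st
from config import get_page_config

st.set_page_config(**get_page_config())
```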
79 changes: 68 additions & 11 deletions app/core.py
@@ -1,25 +1,60 @@
"""Thin wrapper class to manage Media objects."""
import shutil
import time
from datetime import datetime, timedelta
from functools import lru_cache
from pathlib import Path
from typing import Any, List, Union

import ffmpeg
import numpy as np

import torch
import whisper
from config import MEDIA_DIR
from db import ENGINE, Media, Segment, Transcript
from pytube import Playlist, YouTube
from sqlalchemy.orm import Session

N_GPUS = torch.cuda.device_count()


def ratio(x):
    # x is the (free_bytes, total_bytes) pair from torch.cuda.mem_get_info
    return x[0] / x[1]

def check_gpu_ok(i):
    # A GPU is usable if more than half of its memory is free
    return ratio(torch.cuda.mem_get_info(i)) > 0.5

def return_gpu():
    # Pick the first GPU with enough free memory; fall back to CPU
    for i in range(N_GPUS):
        if check_gpu_ok(i):
            return f'cuda:{i}'
    print('!!!No space left!!!')
    return 'cpu'

N_MODELS = 10
# One availability flag per cached model slot (True = free)
model_ok = [True] * N_MODELS



# Whisper transcription functions
# ----------------
@lru_cache(maxsize=1)
def get_whisper_model(whisper_model: str):
@lru_cache(maxsize=10)
def get_whisper_model(whisper_model: str, model_id: int):
    """Get a whisper model from the cache or download it if it doesn't exist"""
    model = whisper.load_model(whisper_model)
    # Mark this slot busy, load the model on CPU, then split it across
    # devices: encoder on dev1, decoder on dev2
    model_ok[model_id] = False
    dev1 = return_gpu()
    model = whisper.load_model(whisper_model, device="cpu")

    model.encoder.to(dev1)
    dev2 = return_gpu()
    model.decoder.to(dev2)

    # Move decoder inputs onto dev2 before each forward pass, and move its
    # outputs back to dev1 afterwards, so the two halves can sit on different GPUs
    model.decoder.register_forward_pre_hook(lambda _, inputs: tuple([inputs[0].to(dev2), inputs[1].to(dev2)] + list(inputs[2:])))
    model.decoder.register_forward_hook(lambda _, inputs, outputs: outputs.to(dev1))
    return model


@@ -39,7 +74,16 @@ def _transcribe(self, audio_path: str, whisper_model: str, **whisper_args):

# Get whisper model
# NOTE: If multiple models are selected, this may keep all of them in memory depending on the cache size
transcriber = get_whisper_model(whisper_model)
# Busy-wait until one of the N_MODELS cache slots is free
ok_flag = False
while not ok_flag:
    for model_id in range(N_MODELS):
        if model_ok[model_id]:
            transcriber = get_whisper_model(whisper_model, model_id)
            ok_flag = True
            break
    if not ok_flag:
        time.sleep(5)
        print('All models are busy')

# Set configs & transcribe
if whisper_args["temperature_increment_on_fallback"] is not None:
Expand All @@ -50,11 +94,18 @@ def _transcribe(self, audio_path: str, whisper_model: str, **whisper_args):
whisper_args["temperature"] = [whisper_args["temperature"]]

del whisper_args["temperature_increment_on_fallback"]

transcript = transcriber.transcribe(
audio_path,
**whisper_args,
)

# Retry until transcription succeeds, then free the model slot
ok_flag = False
while not ok_flag:
    try:
        transcript = transcriber.transcribe(
            audio_path,
            **whisper_args,
        )
        model_ok[model_id] = True
        ok_flag = True
    except Exception as e:
        print(f"Transcription failed, retrying: {e}")

return transcript

Expand All @@ -65,8 +116,14 @@ def _transcribe_and_save(self, media_obj: Media, whisper_model: str, **whisper_a

# Write transcripts into the same directory as the audio file
audio_dir = Path(media_obj.filepath).parent
writer = whisper.utils.get_writer("all", audio_dir)
writer(transcript, "transcript")
writer = whisper.utils.get_writer("srt", audio_dir)
writer(transcript,
audio_path="transcript",
options={
"max_line_width": None,
"max_line_count": None,
"highlight_words": None
})

# Add transcript to the database
self.session.add(
6 changes: 6 additions & 0 deletions app/pages/02_⚙️_Settings.py
@@ -52,6 +52,11 @@
condition_on_previous_text = st.checkbox(
"Condition on previous text", value=st.session_state.whisper_params["condition_on_previous_text"]
)
language_options = ['en', 'zh', 'de', 'es', 'ru', 'ko', 'fr', 'ja', 'pt', 'tr', 'pl', 'ca', 'nl', 'ar', 'sv', 'it', 'id', 'hi', 'fi', 'vi', 'he', 'uk', 'el', 'ms', 'cs', 'ro', 'da', 'hu', 'ta', 'no', 'th', 'ur', 'hr', 'bg', 'lt', 'la', 'mi', 'ml', 'cy', 'sk', 'te', 'fa', 'lv', 'bn', 'sr', 'az', 'sl', 'kn', 'et', 'mk', 'br', 'eu', 'is', 'hy', 'ne', 'mn', 'bs', 'kk', 'sq', 'sw', 'gl', 'mr', 'pa', 'si', 'km', 'sn', 'yo', 'so', 'af', 'oc', 'ka', 'be', 'tg', 'sd', 'gu', 'am', 'yi', 'lo', 'uz', 'fo', 'ht', 'ps', 'tk', 'nn', 'mt', 'sa', 'lb', 'my', 'bo', 'tl', 'mg', 'as', 'tt', 'haw', 'ln', 'ha', 'ba', 'jw', 'su']
# Index of the currently saved language within the options list
selected_language = language_options.index(st.session_state.whisper_params["language"])
language = st.selectbox(
    "Language", options=language_options, index=selected_language
)
verbose = st.checkbox("Verbose", value=st.session_state.whisper_params["verbose"])

save_settings = st.form_submit_button(label="💾 Save settings")
@@ -68,6 +73,7 @@
"compression_ratio_threshold": compression_ratio_threshold,
"condition_on_previous_text": condition_on_previous_text,
"verbose": verbose,
"language": language
}
# Commit to session & disk
st.session_state.whisper_params = updated_whisper_settings