This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

added "📦 Download Subtitle" button | added ngrok support for public link | add more arguments in "⚙️Setting" #21

Open · wants to merge 16 commits into base: main

Changes from all commits
7 changes: 7 additions & 0 deletions .gitignore
@@ -7,6 +7,9 @@ data/
__pycache__/
*.py[cod]
*$py.class
.ipynb_checkpoints/
*/.ipynb_checkpoints/
*/*/.ipynb_checkpoints/

# C extensions
*.so
@@ -219,3 +222,7 @@ tags
.idea/

.pytest_cache/
*_my_*

experiments
*/*_exp*
29 changes: 0 additions & 29 deletions CHANGELOG.md

This file was deleted.

75 changes: 66 additions & 9 deletions README.md
@@ -1,33 +1,90 @@
# Streamlit UI for OpenAI's Whisper
# Subtitle generation using OpenAI's Whisper

This is a simple [Streamlit UI](https://streamlit.io/) for [OpenAI's Whisper speech-to-text model](https://openai.com/blog/whisper/).
It lets you download and transcribe media from YouTube videos, playlists, or local files.
It lets you download and transcribe media from YouTube videos, playlists, or local files with specific settings.
You can then browse, filter, and search through your saved audio files.
Feel free to raise an issue for bugs or feature requests or send a PR.

https://user-images.githubusercontent.com/6735526/216852681-53b6c3db-3e74-4c86-806f-6f6774a9003a.mp4
This is a simple [Streamlit UI](https://streamlit.io/) for [OpenAI's Whisper](https://openai.com/blog/whisper/) speech-to-text model.
It lets you download and transcribe media from YouTube videos, playlists, or local files (each file is limited to 200 MB).
You can then browse, filter, and search through your saved audio files. Feel free to raise an issue for bugs or feature requests, or send a PR.

## Watch the demo on YouTube:
[<img src='https://user-images.githubusercontent.com/15317938/220814880-7e8abb6e-36d9-41ac-8821-533a24bf7de3.png' width=320>](https://youtu.be/nJi1swi8y4I "Whisper Subtitle")


## Setup
This was built & tested on Python 3.11 but should also work on Python 3.9+ as with the original [Whisper repo](https://github.com/openai/whisper)).
This was built & tested on Python 3.11 but should also work on Python 3.8+ as with the original [Whisper repo](https://github.com/openai/whisper)).
You'll need to install `ffmpeg` on your system. Then, install the requirements with `pip`.

```
sudo apt install ffmpeg
# Install pytorch if you don't have it
# sudo conda install pytorch
pip install -r requirements.txt
pip install git+https://github.com/openai/whisper.git
```
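
To verify the installation, here's a quick sanity check (a minimal sketch; `whisper.available_models()` is part of the `openai-whisper` package):

```
# Sanity check: ffmpeg must be on PATH and the whisper package importable
import shutil

import whisper

assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
print(whisper.available_models())  # e.g. ['tiny', 'base', 'small', 'medium', 'large', ...]
```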

## Usage

Once you're set up, you can run the app with:
1. Once you're set up, you can run the app with:

```
streamlit run app/01_🏠_Home.py
```

This will open a new tab in your browser with the app. You can then select a YouTube URL or local file & click "Run Whisper" to run the model on the selected media.

If the tab doesn't open, please use the URL ```http://localhost:8501``` in your browser.

2. If you are not satisfied with the output, click '⚙️ Settings' on the left to fine-tune the Whisper model's inference settings.

Important ⚙️ Settings F.Y.I. (a usage sketch follows this list):
- Model: the model variant to use, default: ```medium```
- Language: the language of the transcription, default: ```zh``` (中文)
- No Speech Threshold: how strictly non-speech segments are excluded, default: ```0.4``` (lower values are stricter)
- Condition on previous text: whether the model is conditioned on the previously decoded text, default: ```True```
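
A minimal sketch of how these settings map onto Whisper's Python API (assuming the stock `openai-whisper` package; the parameter names are real `whisper.transcribe` arguments, the input filename is hypothetical):

```
# Sketch only: the ⚙️ Settings above map directly onto whisper.transcribe()
import whisper

model = whisper.load_model("medium")      # Model
result = model.transcribe(
    "my_video.mp4",                       # hypothetical input file
    language="zh",                        # Language
    no_speech_threshold=0.4,              # No Speech Threshold
    condition_on_previous_text=True,      # Condition on previous text
)
print(result["text"])
```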


## Hosting

If you want to host the app on a server with a dynamic IP, you can install ```ngrok``` to forward your local port to a public address.
You can then access the app from anywhere via a random URL like: ```https://b9f1-458-19-17-41.jp.ngrok.io```

1. Register for an account and get your token from the ngrok website: https://dashboard.ngrok.com/get-started/your-authtoken
2. Install ngrok:
```
wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
sudo tar xvzf ngrok-v3-stable-linux-amd64.tgz -C /usr/local/bin
rm ngrok-v3-stable-linux-amd64.tgz
```
3. Put your ngrok token from the [ngrok dashboard](https://dashboard.ngrok.com/get-started/your-authtoken) into ```forward_port.sh```
4. Expose your URL to the public with ```bash forward_port.sh```
5. Inspect the random URL with ```python inspect_url.py``` and open it in your browser (see the sketch after this list)
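
The inspection step works because the ngrok agent exposes a local API at ```http://127.0.0.1:4040```. Here's a minimal sketch of an ```inspect_url.py```-style helper (an assumption for illustration; the actual script in this repo may differ):

```
# Sketch of an inspect_url.py-style helper: query the local ngrok agent API
# (default inspection address 127.0.0.1:4040) for the first tunnel's public URL.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:4040/api/tunnels") as resp:
    tunnels = json.load(resp)["tunnels"]

print(tunnels[0]["public_url"])  # e.g. https://b9f1-458-19-17-41.jp.ngrok.io
```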

🚧 Under construction:
1. Add Redis for a task queue

🔥 You can try our demo [here](https://whispersubtitle.aiacademy.tw)

Special thanks to [<img src=https://i.imgur.com/bTHUPca.png width=300>](https://en.aiacademy.tw/) for the server.

## Changelog
All notable changes to this project, alongside a potential feature roadmap, will be documented [in this file](CHANGELOG.md).
See [Commits](https://github.com/ShuYuHuang/whisper-subtitle/commits) for detailed changes.

Version summary will be provided in [Release](https://github.com/ShuYuHuang/whisper-subtitle/releases).

The changelog of the original version can be found [in this file](https://github.com/hayabhay/whisper-ui/blob/main/CHANGELOG.md).

## License
Whisper is licensed under [MIT](https://github.com/openai/whisper/blob/main/LICENSE) while Streamlit is licensed under [Apache 2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE).
Everything else is licensed under [MIT](https://github.com/hayabhay/whisper-ui/blob/main/LICENSE).
- Whisper: [MIT](https://github.com/openai/whisper/blob/main/LICENSE)
- Streamlit: [Apache 2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE).
- Everything else: [MIT](https://github.com/hayabhay/whisper-ui/blob/main/LICENSE).

## Reference
I forked the original version of the interface from https://github.com/hayabhay/whisper-ui.

They did a great job building a subtitle management system: search engine, transcript viewer, settings.

The original version aims to demonstrate the power of Whisper for local use, especially on short YouTube videos.

My goal is to provide a service for many clients to generate subtitles for long videos like meeting recordings, courses, and movies.
21 changes: 18 additions & 3 deletions app/01_🏠_Home.py
@@ -46,7 +46,7 @@ def get_formatted_date(date_str: str) -> str:
youtube_url = st.text_input("Youtube video or playlist URL")
elif source_type == "Upload":
input_files = st.file_uploader(
"Add one or more files", type=["mp4", "avi", "mov", "mkv", "mp3", "wav"], accept_multiple_files=True
"Add one or more files", type=["mp4", "avi", "mov", "mkv", "mp3", "wav","m4a"], accept_multiple_files=True
)

add_media = st.form_submit_button(label="Add Media!")
@@ -63,6 +63,7 @@ def get_formatted_date(date_str: str) -> str:
source = input_files
else:
st.error("Please upload files")


# Lowercase the source type
source_type = source_type.lower()
@@ -168,6 +169,13 @@ def get_formatted_date(date_str: str) -> str:
if st.button("🗑️ Delete", key=f"delete-{media['id']}"):
media_manager.delete(media["id"])
st.experimental_rerun()

# Offer the generated SRT transcript for download
filename = f'{Path(media["filepath"]).parent / "transcript"}.srt'

with open(filename, "rb") as file:
    if st.download_button("📦 Download Subtitle", file, file_name=media["source_name"] + ".srt"):
        st.experimental_rerun()


with media_col:
# Render the media
@@ -229,7 +237,7 @@ def get_formatted_date(date_str: str) -> str:
media = media_manager.get_detail(media_id=st.session_state.selected_media)

# Render mini nav
back_col, del_col = st.sidebar.columns(2)
back_col, del_col, download_col = st.sidebar.columns(3)
with back_col:
# Add a button to show the list view
if st.button("◀️ &nbsp; Back to list", key="back-to-list-main"):
@@ -240,7 +248,14 @@ def get_formatted_date(date_str: str) -> str:
media_manager.delete(media["id"])
st.session_state.list_mode = True
st.experimental_rerun()


with download_col:
    # Offer the generated SRT transcript for download
    filename = f'{Path(media["filepath"]).parent / "transcript"}.srt'

    with open(filename, "rb") as file:
        if st.download_button("📦 Download Subtitle", file, file_name=media["source_name"] + ".srt"):
            st.experimental_rerun()

st.sidebar.write(f"""### {media["source_name"]}""")

# Render the media. Use both audio & video for youtube
23 changes: 13 additions & 10 deletions app/config.py
@@ -21,14 +21,17 @@
# --------------
# Default settings
WHISPER_DEFAULT_SETTINGS = {
"whisper_model": "base",
"whisper_model": "medium",
"temperature": 0.0,
"temperature_increment_on_fallback": 0.2,
"no_speech_threshold": 0.6,
"no_speech_threshold": 0.4,
"logprob_threshold": -1.0,
"compression_ratio_threshold": 2.4,
"condition_on_previous_text": True,
"verbose": False,
"language": 'zh',
"fp16": True,
"without_timestamps" : False
}
WHISPER_SETTINGS_FILE = DATA_DIR / ".whisper_settings.json"

@@ -52,22 +55,22 @@ def get_whisper_settings():
# Common page configurations
# --------------------------
ABOUT = """
### Whisper UI
### 💬 Whisper Subtitle

This is a simple wrapper around Whisper to save, browse & search through transcripts.
This is a simple wrapper around Whisper to save, browse & search through transcripts for movie subtitles.

Please report any bugs or issues on [Github](https://github.com/hayabhay/whisper-ui/). Thanks!
Please report any bugs or issues on [Github](https://github.com/ShuYuHuang/whisper-subtitle/). Thanks!
"""


def get_page_config(page_title_prefix="", layout="wide"):
def get_page_config(page_title_prefix="💬", layout="wide"):
return {
"page_title": f"{page_title_prefix}Whisper UI",
"page_icon": "🤖",
"page_title": f"{page_title_prefix}Whisper Subtitle",
"page_icon": ":movie_camera:",
"layout": layout,
"menu_items": {
"Get Help": "https://twitter.com/hayabhay",
"Report a bug": "https://github.com/hayabhay/whisper-ui/issues",
"Get Help": "https://github.com/ShuYuHuang",
"Report a bug": "https://github.com/ShuYuHuang/whisper-subtitle/issues",
"About": ABOUT,
},
}
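
For context, a page would typically apply this config via Streamlit's `set_page_config` (a sketch; an assumption about usage, not necessarily how every page in this repo does it):

```
# Sketch: applying the shared page config on a Streamlit page
import streamlit as st
from config import get_page_config

st.set_page_config(**get_page_config())
```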
79 changes: 68 additions & 11 deletions app/core.py
@@ -1,25 +1,60 @@
"""Thin wrapper class to manage Media objects."""
import shutil
import time
from datetime import datetime, timedelta
from functools import lru_cache
from pathlib import Path
from typing import Any, List, Union

import ffmpeg
import numpy as np

import torch
import whisper
from config import MEDIA_DIR
from db import ENGINE, Media, Segment, Transcript
from pytube import Playlist, YouTube
from sqlalchemy.orm import Session

N_GPUS = torch.cuda.device_count()


def ratio(x):
    # x is the (free_bytes, total_bytes) pair from torch.cuda.mem_get_info
    return x[0] / x[1]

def check_gpu_ok(i):
    # A GPU is usable if more than half of its memory is free
    return ratio(torch.cuda.mem_get_info(i)) > 0.5

def return_gpu():
    # Pick the first GPU with enough free memory; fall back to CPU
    for i in range(N_GPUS):
        if check_gpu_ok(i):
            return f'cuda:{i}'
    print('!!!No space left!!!')
    return 'cpu'

N_MODELS = 10
# One availability flag per cached model slot (True = free)
model_ok = [True] * N_MODELS



# Whisper transcription functions
# ----------------
@lru_cache(maxsize=1)
def get_whisper_model(whisper_model: str):
@lru_cache(maxsize=10)
def get_whisper_model(whisper_model: str, model_id: int):
    """Get a whisper model from the cache or download it if it doesn't exist"""
    model = whisper.load_model(whisper_model)
    # Mark this slot busy, load the model on CPU, then split it across
    # devices: encoder on dev1, decoder on dev2
    model_ok[model_id] = False
    dev1 = return_gpu()
    model = whisper.load_model(whisper_model, device="cpu")

    model.encoder.to(dev1)
    dev2 = return_gpu()
    model.decoder.to(dev2)

    # Move decoder inputs onto dev2 before each forward pass, and move its
    # outputs back to dev1 afterwards, so the two halves can sit on different GPUs
    model.decoder.register_forward_pre_hook(lambda _, inputs: tuple([inputs[0].to(dev2), inputs[1].to(dev2)] + list(inputs[2:])))
    model.decoder.register_forward_hook(lambda _, inputs, outputs: outputs.to(dev1))
    return model


@@ -39,7 +74,16 @@ def _transcribe(self, audio_path: str, whisper_model: str, **whisper_args):

# Get whisper model
# NOTE: If multiple models are selected, this may keep all of them in memory depending on the cache size
transcriber = get_whisper_model(whisper_model)
# Busy-wait until one of the N_MODELS cache slots is free
ok_flag = False
while not ok_flag:
    for model_id in range(N_MODELS):
        if model_ok[model_id]:
            transcriber = get_whisper_model(whisper_model, model_id)
            ok_flag = True
            break
    if not ok_flag:
        time.sleep(5)
        print('All models are busy')

# Set configs & transcribe
if whisper_args["temperature_increment_on_fallback"] is not None:
Expand All @@ -50,11 +94,18 @@ def _transcribe(self, audio_path: str, whisper_model: str, **whisper_args):
whisper_args["temperature"] = [whisper_args["temperature"]]

del whisper_args["temperature_increment_on_fallback"]

transcript = transcriber.transcribe(
audio_path,
**whisper_args,
)

# Retry until transcription succeeds, then free the model slot
ok_flag = False
while not ok_flag:
    try:
        transcript = transcriber.transcribe(
            audio_path,
            **whisper_args,
        )
        model_ok[model_id] = True
        ok_flag = True
    except Exception as e:
        print(f"Transcription failed, retrying: {e}")

return transcript

Expand All @@ -65,8 +116,14 @@ def _transcribe_and_save(self, media_obj: Media, whisper_model: str, **whisper_a

# Write transcripts into the same directory as the audio file
audio_dir = Path(media_obj.filepath).parent
writer = whisper.utils.get_writer("all", audio_dir)
writer(transcript, "transcript")
writer = whisper.utils.get_writer("srt", audio_dir)
writer(transcript,
audio_path="transcript",
options={
"max_line_width": None,
"max_line_count": None,
"highlight_words": None
})

# Add transcript to the database
self.session.add(
6 changes: 6 additions & 0 deletions app/pages/02_⚙️_Settings.py
@@ -52,6 +52,11 @@
condition_on_previous_text = st.checkbox(
"Condition on previous text", value=st.session_state.whisper_params["condition_on_previous_text"]
)
language_options = ['en', 'zh', 'de', 'es', 'ru', 'ko', 'fr', 'ja', 'pt', 'tr', 'pl', 'ca', 'nl', 'ar', 'sv', 'it', 'id', 'hi', 'fi', 'vi', 'he', 'uk', 'el', 'ms', 'cs', 'ro', 'da', 'hu', 'ta', 'no', 'th', 'ur', 'hr', 'bg', 'lt', 'la', 'mi', 'ml', 'cy', 'sk', 'te', 'fa', 'lv', 'bn', 'sr', 'az', 'sl', 'kn', 'et', 'mk', 'br', 'eu', 'is', 'hy', 'ne', 'mn', 'bs', 'kk', 'sq', 'sw', 'gl', 'mr', 'pa', 'si', 'km', 'sn', 'yo', 'so', 'af', 'oc', 'ka', 'be', 'tg', 'sd', 'gu', 'am', 'yi', 'lo', 'uz', 'fo', 'ht', 'ps', 'tk', 'nn', 'mt', 'sa', 'lb', 'my', 'bo', 'tl', 'mg', 'as', 'tt', 'haw', 'ln', 'ha', 'ba', 'jw', 'su']
# Index of the currently saved language within the options list
selected_language = language_options.index(st.session_state.whisper_params["language"])
language = st.selectbox(
    "Language", options=language_options, index=selected_language
)
verbose = st.checkbox("Verbose", value=st.session_state.whisper_params["verbose"])

save_settings = st.form_submit_button(label="💾 Save settings")
@@ -68,6 +73,7 @@
"compression_ratio_threshold": compression_ratio_threshold,
"condition_on_previous_text": condition_on_previous_text,
"verbose": verbose,
"language": language
}
# Commit to session & disk
st.session_state.whisper_params = updated_whisper_settings