387 deploy whisper #388

Merged
merged 58 commits into from
Apr 4, 2024
Changes from 23 commits
a539494
add whisper start in docker-compose
arhihihipov Nov 9, 2023
d4a47ec
add env
arhihihipov Nov 23, 2023
5358360
update whisper deploy in docker-compose
arhihihipov Feb 18, 2024
23c579a
add test whisper
arhihihipov Feb 18, 2024
f9d5b9d
add deploy Whisper
arhihihipov Feb 24, 2024
142111c
update whisper test RTF
arhihihipov Feb 24, 2024
3bc3398
Vosk is gone
arhihihipov Feb 25, 2024
9e34833
fixes
arhihihipov Feb 26, 2024
6884eab
add logging
arhihihipov Feb 27, 2024
4301952
update github runner environment
HadronCollider Mar 7, 2024
0a46c88
add hot fix for chrome deb install
HadronCollider Mar 7, 2024
a5af166
update TestPredefenceEightToTenMinutesFeedbackEvaluator for new Prede…
HadronCollider Mar 7, 2024
15c8182
big rework of test_basic_training
HadronCollider Mar 7, 2024
edd5157
use driver.refresh() to reload page
HadronCollider Mar 7, 2024
57aeccd
Decreasing whisper model for github ci (medium->tiny)
HadronCollider Mar 7, 2024
6620f9c
Merge pull request #394 from OSLL/fix-github-ci
HadronCollider Mar 7, 2024
7895f4d
1st step of rework Docker (update wst_base image)
HadronCollider Mar 9, 2024
81763d7
update dockerhub repo and tag for wst_base image in build script
HadronCollider Mar 9, 2024
2b2af94
update requirements (WIP) and fix using updated libs
HadronCollider Mar 9, 2024
7444699
update Dockerfile (rework + new base image + labels) and docker-compo…
HadronCollider Mar 9, 2024
1fb986b
update docker image naming rules
HadronCollider Mar 9, 2024
d84fc9e
update workflow step (building base image)
HadronCollider Mar 9, 2024
7330454
update db_versioning (if no version == LAST_VERSION)
HadronCollider Mar 9, 2024
7ccda4c
move denoiser module from playground
HadronCollider Mar 9, 2024
2525f08
move denoiser module 2.0
HadronCollider Mar 9, 2024
0adc439
set up pythpn requirement versions
HadronCollider Mar 9, 2024
d9c2a9f
update dockerfiles
HadronCollider Mar 9, 2024
f6f707e
disabled wrong test whisper for GitHub workflow
HadronCollider Mar 9, 2024
0a53514
update sh scripts for building base image
HadronCollider Mar 9, 2024
9a92766
rename base image build script
HadronCollider Mar 9, 2024
95624bb
set ASR_MODEL=medium for deploy (before transfer to .env)
HadronCollider Mar 9, 2024
17c68ac
rm ports to whisper service
HadronCollider Mar 9, 2024
1dacb4d
Update test run in GitHub actions
HadronCollider Mar 9, 2024
5728e9c
Update main.yml Run tests
HadronCollider Mar 9, 2024
4bd9c00
update restart.sh (remove apache/ssl setup; update docker build command)
HadronCollider Mar 9, 2024
48fb3f9
add env
arhihihipov Mar 9, 2024
07ee0f6
Merge remote-tracking branch 'origin/whisper_deploy' into whisper_deploy
arhihihipov Mar 9, 2024
2f4cabf
restore vosk and add url param for Whisper
arhihihipov Mar 10, 2024
9740468
add .env to gitignore
arhihihipov Mar 12, 2024
496c079
Delete .env
arhihihipov Mar 12, 2024
034351a
add http errors handling and test whisper plug
arhihihipov Mar 12, 2024
3384b42
Update docker tag in workflow (main)
HadronCollider Mar 12, 2024
234fad6
add whisper cache to gitignore
arhihihipov Mar 14, 2024
45b70dd
merge whisper_deploy
HadronCollider Mar 20, 2024
7e0a4dd
fix comments. Add env variable Mem_Limit, volume. Also increase timeo…
arhihihipov Mar 26, 2024
5dd6866
remove port whisper
arhihihipov Mar 26, 2024
1a901b0
remove convert audio to wav and denoise for whisper
arhihihipov Mar 30, 2024
6af5a48
update test for whisper
arhihihipov Mar 30, 2024
d84ae55
Merge branch 'master' into whisper_deploy
arhihihipov Mar 30, 2024
25f5068
increase waitint time in test
arhihihipov Mar 30, 2024
5be8e9c
increase waitint time in test
arhihihipov Mar 30, 2024
90684a6
Merge branch 'whisper_deploy' of github.com:OSLL/web_speech_trainer i…
arhihihipov Mar 30, 2024
4e91e9f
Revert "increase waitint time in test"
arhihihipov Mar 30, 2024
c82a185
Merge branch 'whisper_deploy' of github.com:OSLL/web_speech_trainer i…
HadronCollider Mar 31, 2024
0dbb056
add local import for websockets (VoskAudioRecognizer)
HadronCollider Mar 31, 2024
fbec381
Merge branch 'master' into update_image
HadronCollider Mar 31, 2024
a67df4f
Merge pull request #399 from OSLL/update_image
HadronCollider Mar 31, 2024
cf032d2
remove excessive log
arhihihipov Apr 2, 2024
9 changes: 6 additions & 3 deletions .github/workflows/main.yml
@@ -4,7 +4,7 @@ on: pull_request

jobs:
build:
runs-on: ubuntu-18.04
runs-on: ubuntu-20.04

steps:
- uses: actions/checkout@v2
@@ -14,8 +14,11 @@ jobs:
# build base image
docker build -f Dockerfile_base -t osll/wst_base .

# build vosk
docker build -f "Dockerfile.kaldi-ru" -t osll/vosk .
- name: Decreasing whisper model for tests
run: |
cp docker-compose.yml docker-compose-tmp.yml
sed -e "s/ASR_MODEL=medium/ASR_MODEL=tiny/" docker-compose-tmp.yml > docker-compose.yml
rm docker-compose-tmp.yml
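
The workflow step above swaps the Whisper model name in `docker-compose.yml` before the CI build. A rough Python equivalent of that substitution, as a sketch only: the `downgrade_model` helper and the sample text are hypothetical and not part of the PR. Note that `str.replace` rewrites every occurrence, whereas the `sed` expression without the `g` flag rewrites only the first match on each line.

```python
def downgrade_model(compose_text: str, old: str = "medium", new: str = "tiny") -> str:
    """Return the compose text with ASR_MODEL=<old> rewritten to ASR_MODEL=<new>."""
    # Plain string replacement; no regular expression is needed for a fixed pattern.
    return compose_text.replace(f"ASR_MODEL={old}", f"ASR_MODEL={new}")


# Hypothetical fragment of a compose file, used only to exercise the helper.
sample_compose = "environment:\n  - ASR_MODEL=medium\n  - ASR_ENGINE=openai_whisper\n"
patched_compose = downgrade_model(sample_compose)
```

If the pattern does not occur in the file, the text is returned unchanged, which is also what the `sed` command does.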

- name: Build docker-compose
run: |
4 changes: 3 additions & 1 deletion .gitignore
@@ -2,4 +2,6 @@ venv
.idea
ssl
__pycache__
/VERSION.json
/VERSION.json
.env
/whisper_asr_model_cache
15 changes: 0 additions & 15 deletions Dockerfile.kaldi-ru

This file was deleted.

2 changes: 1 addition & 1 deletion Dockerfile_base
@@ -5,7 +5,7 @@ RUN apt-get install -y libgconf2-4 libnss3 libxss1 python3-pip vim ffmpeg exift
WORKDIR /usr/local/bin
RUN wget https://chromedriver.storage.googleapis.com/90.0.4430.24/chromedriver_linux64.zip
RUN unzip chromedriver_linux64.zip
RUN wget http://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_90.0.4430.72-1_amd64.deb
RUN wget https://mirror.kraski.tv/soft/google_chrome/linux/90.0.4430.72/google-chrome-stable_90.0.4430.72-1_amd64.deb
RUN apt-get install -y ./google-chrome-stable_90.0.4430.72-1_amd64.deb
RUN pip3 install --upgrade pip==21.3.1
RUN pip3 install --upgrade setuptools
15 changes: 13 additions & 2 deletions app/audio_processor.py
@@ -1,9 +1,11 @@
import sys
import time
from datetime import datetime

import librosa
from bson import ObjectId

from app.audio_recognizer import AudioRecognizer, VoskAudioRecognizer
from app.audio_recognizer import AudioRecognizer, WhisperAudioRecognizer
from app.config import Config
from app.mongo_models import Trainings
from app.mongo_odm import DBManager, AudioToRecognizeDBManager, TrainingsDBManager, RecognizedAudioToProcessDBManager
@@ -52,7 +54,16 @@ def _try_extract_and_process(self):
self._hangle_error(training_id, verdict)
return
try:
audio_length = librosa.get_duration(filename=presentation_record_file)
logger.info(f'audio record length: {audio_length} s')

start_time = time.time()

recognized_audio = self._audio_recognizer.recognize(presentation_record_file)

end_time = time.time()
processing_time = end_time - start_time
logger.info(f'audio processing time: {processing_time} s')
Collaborator commented:

Add the audio duration (you already obtain it above) to this message; the logs will be more useful with both the recording length and its processing time.

except Exception as e:
verdict = 'Recognition of a presentation record file with presentation_record_file_id = {} ' \
'has failed.\n{}'.format(presentation_record_file_id, e)
@@ -118,7 +129,7 @@ def run(self):

if __name__ == "__main__":
Config.init_config(sys.argv[1])
audio_recognizer = VoskAudioRecognizer(host=Config.c.vosk.url)
audio_recognizer = WhisperAudioRecognizer(url=Config.c.whisper.url)
audio_processor = AudioProcessor(audio_recognizer)
audio_processor.run()
stuck_audio_resender = StuckAudioResender()
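
The review comment on the processing-time log asks for the audio duration and the processing time to appear in one message. A minimal sketch of such a message, assuming a hypothetical helper `format_recognition_log` that is not part of the PR; the real-time factor (processing time divided by audio length) is an illustrative extra, not something the diff logs:

```python
def format_recognition_log(audio_length_s: float, processing_time_s: float) -> str:
    """Build one log line with recording length, processing time and their ratio."""
    # Real-time factor: values below 1 mean recognition ran faster than playback.
    # A zero-length recording has no meaningful ratio, so report NaN instead of dividing.
    rtf = processing_time_s / audio_length_s if audio_length_s > 0 else float("nan")
    return (
        f"audio record length: {audio_length_s:.1f} s, "
        f"processing time: {processing_time_s:.1f} s, RTF: {rtf:.2f}"
    )


# Example values only: a two-minute recording processed in thirty seconds.
message = format_recognition_log(120.0, 30.0)
```

Keeping both numbers in a single line makes it possible to compare runs from the log alone, without matching two separate entries.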
55 changes: 55 additions & 0 deletions app/audio_recognizer.py
@@ -2,14 +2,17 @@
import json
import wave

import requests
import websockets

from app import utils
from app.recognized_audio import RecognizedAudio
from app.recognized_word import RecognizedWord
from app.word import Word
from app.root_logger import get_root_logger
from playground.noise_reduction.denoiser import Denoiser

logger = get_root_logger(service_name='audio_processor')

class AudioRecognizer:
def recognize(self, audio):
@@ -25,6 +28,58 @@ def recognize(self, audio):
return RecognizedAudio(recognized_words)


class WhisperAudioRecognizer(AudioRecognizer):
def __init__(self, url):
self._url = url

def parse_recognizer_result(self, recognizer_result):
return RecognizedWord(
word=Word(recognizer_result['word']),
begin_timestamp=recognizer_result['start'],
end_timestamp=recognizer_result['end'],
probability=recognizer_result['probability'],
)

def recognize_wav(self, audio):
recognizer_results = self.send_audio_to_recognizer(audio.name)
recognized_words = list(map(self.parse_recognizer_result, recognizer_results))
return RecognizedAudio(recognized_words)

def recognize(self, audio):
temp_wav_file = utils.convert_from_mp3_to_wav(audio)
Denoiser.process_wav_to_wav(temp_wav_file, temp_wav_file, noise_length=3)
return self.recognize_wav(temp_wav_file)

def send_audio_to_recognizer(self, file_name, language='ru'):
params = {
'task': 'transcribe',
'language': language,
'word_timestamps': 'true',
'output': 'json'
}
headers = {'accept': 'application/json'}

audio_to_recognize = open(file_name, 'rb')
audio_to_recognize_buffer = audio_to_recognize.read()
audio_to_recognize.close()

try:
files = {'audio_file': (file_name, audio_to_recognize_buffer, 'audio/mpeg')}
response = requests.post(self._url, params=params, headers=headers, files=files)
response.raise_for_status()
except requests.exceptions.RequestException as e:
logger.info(f"Recognition error occurred while processing audio file: {e}")
return []

data = response.json()

recognizer_results = []
for segment in data["segments"]:
for recognized_word in segment["words"]:
recognizer_results.append(recognized_word)
return recognizer_results
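
The nested loop at the end of `send_audio_to_recognizer` flattens the per-segment word lists into a single list. A self-contained sketch of that step, assuming the response has the `segments` / `words` shape that the diff reads; the `flatten_words` name and the sample payload are made up for illustration:

```python
from typing import Any, Dict, List


def flatten_words(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the word entries of every segment into one flat list."""
    # .get(...) with defaults tolerates a missing "segments" or "words" key,
    # where direct indexing as in the diff would raise KeyError.
    return [
        word
        for segment in data.get("segments", [])
        for word in segment.get("words", [])
    ]


# Hypothetical payload containing only the fields the diff accesses.
sample_response = {
    "segments": [
        {"words": [
            {"word": "one", "start": 0.0, "end": 0.4, "probability": 0.98},
            {"word": "two", "start": 0.5, "end": 0.9, "probability": 0.95},
        ]},
        {"words": [
            {"word": "three", "start": 1.2, "end": 1.7, "probability": 0.91},
        ]},
    ]
}
words = flatten_words(sample_response)
```

Whether a missing key should be tolerated or treated as an error is a design choice; the version in the diff fails loudly, this sketch returns an empty list.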


class VoskAudioRecognizer(AudioRecognizer):
def __init__(self, host):
self._host = host
6 changes: 4 additions & 2 deletions app/feedback_evaluator.py
@@ -142,7 +142,8 @@ def evaluate_feedback(self, criteria_results):

def get_result_as_sum_str(self, criteria_results):
if criteria_results is None or self.weights is None or \
criteria_results.get(StrictSpeechDurationCriterion.__name__, {}).get('result') == 0:
criteria_results.get("PredefenceStrictSpeechDurationCriterion", {}).get('result', 0) == 0 or \
criteria_results.get("DEFAULT_SPEECH_PACE_CRITERION", {}).get('result', 0) == 0:
return None
return super().get_result_as_sum_str(criteria_results)

@@ -171,7 +172,8 @@ def evaluate_feedback(self, criteria_results):

def get_result_as_sum_str(self, criteria_results):
if criteria_results is None or self.weights is None or \
criteria_results.get(StrictSpeechDurationCriterion.__name__, {}).get('result') == 0:
criteria_results.get("PredefenceStrictSpeechDurationCriterion", {}).get('result', 0) == 0 or \
criteria_results.get("DEFAULT_SPEECH_PACE_CRITERION", {}).get('result', 0) == 0:
return None
return super().get_result_as_sum_str(criteria_results)

3 changes: 3 additions & 0 deletions app_conf/config.ini
@@ -13,6 +13,9 @@ database_name=database
[vosk]
url=ws://vosk:2700

[whisper]
url=http://whisper:9000/asr

[user_agent_platform]
windows=True
linux=True
3 changes: 3 additions & 0 deletions app_conf/testing.ini
@@ -13,6 +13,9 @@ database_name=testing_database
[vosk]
url=ws://vosk:2700

[whisper]
url=http://whisper:9000/asr

[testing]
active=True
session_id=testing_session_id
19 changes: 12 additions & 7 deletions docker-compose.yml
@@ -12,20 +12,14 @@ services:
- training_processor
volumes:
- ../database-dump:/app/dump/database-dump/

vosk:
image: "osll/vosk:v0.1"
restart: always
ports:
- 2700:2700

audio_processor:
image: base_image
command: python3 -m audio_processor $APP_CONF
restart: always
depends_on:
- db
- vosk
- whisper
- presentation_processor

recognized_audio_processor:
@@ -76,3 +70,14 @@ services:
- '--wiredTigerCacheSizeGB=2'
volumes:
- ../mongo_data:/data/db

whisper:
image: "onerahmet/openai-whisper-asr-webservice:v1.3.0"
environment:
- ASR_MODEL=${WHISPER_ASR_MODEL:-tiny}
- ASR_ENGINE=${WHISPER_ASR_ENGINE:-openai_whisper}
restart: always
cpuset: ${WHISPER_CPU:-0,1}
mem_limit: 5g
Collaborator commented:

Move this parameter to env as well, the same way as the others.

volumes:
- ./whisper_asr_model_cache:/root/.cache/whisper
Collaborator commented:

To avoid cluttering the host and the repo folder (and to avoid touching docker/gitignore), use a docker volume ( https://docs.docker.com/compose/compose-file/07-volumes/ )

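The `whisper` service above reads its settings through Compose interpolation of the form `${NAME:-default}`, which falls back to the default when the variable is unset or empty. A small Python sketch of that rule, for illustration only; the `resolve_with_default` helper is hypothetical, and the variable names and defaults are taken from the compose fragment:

```python
from typing import Mapping


def resolve_with_default(name: str, default: str, env: Mapping[str, str]) -> str:
    """Mimic ${NAME:-default}: use the default when NAME is unset or empty."""
    value = env.get(name)
    # An empty string counts as "not set" for the ":-" form.
    return value if value else default


# Example environments; in a real deployment these values would come from .env.
model_default = resolve_with_default("WHISPER_ASR_MODEL", "tiny", {})
model_override = resolve_with_default("WHISPER_ASR_MODEL", "tiny", {"WHISPER_ASR_MODEL": "medium"})
cpu_empty = resolve_with_default("WHISPER_CPU", "0,1", {"WHISPER_CPU": ""})
```

The same pattern would apply to a memory limit moved into env, as the first review comment above requests.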
3 changes: 2 additions & 1 deletion requirements.txt
@@ -22,10 +22,11 @@ vext
vext.gi
websockets
wheel
librosa
librosa == 0.9.2
noisereduce == 1.1.0
python_speech_features
pysndfx
python-i18n
python-pptx ==0.6.19
odfpy ==1.4.1
requests ==2.27.1
3 changes: 0 additions & 3 deletions scripts/build_system_image.sh
@@ -4,8 +4,5 @@ set -e

tag=${1:-'v0.1'}

# build vosk
./scripts/build_image.sh "Dockerfile.kaldi-ru" osll/vosk:$tag

# build base image
./scripts/build_image.sh Dockerfile_base osll/wst_base:$tag
53 changes: 27 additions & 26 deletions tests/selenium/test_training.py
@@ -3,6 +3,7 @@

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.alert import Alert
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
@@ -20,10 +21,11 @@ def test_basic_training():
chrome_options.add_argument("--disable-user-media-security")
chrome_options.add_argument("--use-fake-ui-for-media-stream")
chrome_options.add_argument("--use-fake-device-for-media-stream")
chrome_options.add_argument("--use-fake-ui-for-media-stream")
chrome_options.add_argument('--use-file-for-fake-audio-capture={}/simple_phrases_russian.wav'.format(os.getcwd()))
chrome_options.add_experimental_option('detach', True)
driver = Chrome(options=chrome_options)
response = driver.request('POST', 'http://127.0.0.1:5000/lti', data={
driver.request('POST', 'http://127.0.0.1:5000/lti', data={
'lis_person_name_full': Config.c.testing.lis_person_name_full,
'ext_user_username': Config.c.testing.session_id,
'custom_task_id': Config.c.testing.custom_task_id,
@@ -37,33 +39,32 @@ def test_basic_training():
'oauth_consumer_key': Config.c.testing.oauth_consumer_key,
})
driver.get('http://127.0.0.1:5000/upload_presentation/')
driver.find_element_by_id('upload-presentation-form')
data = open('test_data/test_presentation_file_0.pdf', 'rb')
response = driver.request('POST', 'http://127.0.0.1:5000/handle_presentation_upload/',
files=dict(presentation=data))
pos = response.text.find("setupPresentationViewer(\"")
assert pos != -1
training_id = response.text[pos + 25: pos + 49]
driver.get('http://127.0.0.1:5000/trainings/{}/'.format(training_id))
driver.find_element_by_id('record').click()
step = 3
sleep(2 * step)
driver.find_element_by_id('next').click()
sleep(step)
driver.find_element_by_id('done').click()
sleep(step)
total_wait_time = 60
wait_time = 0
while wait_time < total_wait_time:
driver.get('http://127.0.0.1:5000/trainings/statistics/{}/'.format(training_id))
file_input = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input[type=file]")))
file_input.send_keys(f'{os.getcwd()}/test_data/test_presentation_file_0.pdf')
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, "button-submit"))).click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "record"))).click()
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "model-timer")))
WebDriverWait(driver, 10).until(EC.invisibility_of_element((By.ID, "model-timer")))
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "next")))
sleep(5)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "next"))).click()
sleep(5)
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, "done"))).click()
alert = Alert(driver)
alert.accept()

feedback_flag = False
step_count = 10
step = 10
for _ in range(step_count):
driver.refresh()
try:
feedback_element = WebDriverWait(driver, step).until(EC.presence_of_element_located((By.ID, 'feedback')))
if feedback_element.text.startswith('Оценка за тренировку'):
feedback_flag = True
break
else:
wait_time += step
sleep(step)
except TimeoutException:
wait_time += step
sleep(step)
except:
sleep(step)
driver.close()
assert wait_time < total_wait_time
assert feedback_flag, f"Проверка тренировки заняла более {step_count*step} секунд"
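
The reworked test polls the page a fixed number of times until the feedback text appears. The same retry pattern, stripped of Selenium, might look like the sketch below; `poll_until` and the fake check are hypothetical and only illustrate the control flow. Unlike the bare `except:` in the test, the sketch catches `Exception` so that `KeyboardInterrupt` still stops the loop.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], attempts: int, delay_s: float) -> bool:
    """Call check() up to `attempts` times; return True as soon as it succeeds."""
    for _ in range(attempts):
        try:
            if check():
                return True
        except Exception:
            # A failed check (for example a timeout) counts as "not ready yet".
            pass
        time.sleep(delay_s)
    return False


# Fake condition that becomes true on the third call, standing in for the page check.
calls = {"count": 0}


def feedback_ready() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


found = poll_until(feedback_ready, attempts=10, delay_s=0.0)
```

Returning a boolean keeps the final assertion and its failure message in the test itself, as in the diff.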
28 changes: 15 additions & 13 deletions tests/test_feedback_evaluator.py
@@ -1,27 +1,29 @@
import pytest

from app.criteria import StrictSpeechDurationCriterion, SpeechPaceCriterion, FillersNumberCriterion
from app.feedback_evaluator import PredefenceEightToTenMinutesFeedbackEvaluator
from app.feedback_evaluator import PredefenceEightToTenMinutesNoSlideCheckFeedbackEvaluator


class TestPredefenceEightToTenMinutesFeedbackEvaluator:
class TestPredefenceEightToTenMinutesNoSlideCheckFeedbackEvaluator:
@pytest.mark.parametrize(
"criteria_results, expected_string",
[
({}, ''),
({StrictSpeechDurationCriterion.__name__: {'result': 0}}, None),
({StrictSpeechDurationCriterion.__name__: {'result': 0.5}}, '0.600 * 0.50'),
({}, None),
({"PredefenceStrictSpeechDurationCriterion": {'result': 0}}, None),
({"PredefenceStrictSpeechDurationCriterion": {'result': 0.5}}, None),
({"DEFAULT_SPEECH_PACE_CRITERION": {'result': 0.5}}, None),
({
StrictSpeechDurationCriterion.__name__: {'result': 0.5},
SpeechPaceCriterion.__name__: {'result': 0.7},
FillersNumberCriterion.__name__: {'result': 0.9},
"PredefenceStrictSpeechDurationCriterion": {'result': 0.5},
"DEFAULT_FILLERS_NUMBER_CRITERION": {'result': 0.9},
}, None),
({
"PredefenceStrictSpeechDurationCriterion": {'result': 0.5},
"DEFAULT_SPEECH_PACE_CRITERION": {'result': 0.7},
"DEFAULT_FILLERS_NUMBER_CRITERION": {'result': 0.9},
}, '0.600 * 0.50 + 0.200 * 0.70 + 0.200 * 0.90'),
({
StrictSpeechDurationCriterion.__name__: {'result': 0.5},
FillersNumberCriterion.__name__: {'result': 0.9},
}, '0.600 * 0.50 + 0.200 * 0.90'),
],
)
def test_get_result_as_sum_str(self, criteria_results, expected_string):
feedback_evaluator = PredefenceEightToTenMinutesFeedbackEvaluator()
feedback_evaluator = PredefenceEightToTenMinutesNoSlideCheckFeedbackEvaluator()
assert feedback_evaluator.get_result_as_sum_str(criteria_results) == expected_string
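
The parametrized cases above encode a gating rule: the sum string is `None` unless both the duration and the pace criteria are present with a non-zero result. A standalone sketch of that rule follows. The weights (0.6, 0.2, 0.2) and the output format are inferred from the expected string in the test, and the function is a simplified stand-in, not the evaluator's actual implementation.

```python
from typing import Dict, Optional

# Assumed weights, read off the expected string '0.600 * 0.50 + 0.200 * 0.70 + 0.200 * 0.90'.
WEIGHTS = {
    "PredefenceStrictSpeechDurationCriterion": 0.6,
    "DEFAULT_SPEECH_PACE_CRITERION": 0.2,
    "DEFAULT_FILLERS_NUMBER_CRITERION": 0.2,
}
# Criteria whose missing or zero result suppresses the whole sum.
GATING = ("PredefenceStrictSpeechDurationCriterion", "DEFAULT_SPEECH_PACE_CRITERION")


def result_as_sum_str(criteria_results: Optional[Dict[str, dict]]) -> Optional[str]:
    """Return 'weight * result + ...' or None when a gating criterion is absent or zero."""
    if criteria_results is None:
        return None
    for name in GATING:
        # A missing criterion defaults to 0, so it is treated like a failed one.
        if criteria_results.get(name, {}).get("result", 0) == 0:
            return None
    return " + ".join(
        f"{weight:.3f} * {criteria_results[name]['result']:.2f}"
        for name, weight in WEIGHTS.items()
        if name in criteria_results
    )


full_results = {
    "PredefenceStrictSpeechDurationCriterion": {"result": 0.5},
    "DEFAULT_SPEECH_PACE_CRITERION": {"result": 0.7},
    "DEFAULT_FILLERS_NUMBER_CRITERION": {"result": 0.9},
}
full_sum = result_as_sum_str(full_results)
```

The default of `0` in `.get('result', 0)` is what turns the empty-dict case into `None`, matching the first test case.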
