Add asynchronous audio processing #2

Open
ManziBryan opened this issue May 8, 2023 · 1 comment

Comments

@ManziBryan

It would be really cool to process the audio in real time using something like this: https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py

At the moment, even though the /transcribe endpoint is marked as async, it blocks completely until the self.transcribe() call finishes. It also accepts only an entire file, as opposed to a stream of audio bytes. That means we can only process a whole file at once, and it probably means we need more resources to read the entire file into memory and transfer it to the model. If we could stream, say, 10s at a time, we would probably be able to transcribe longer files, and if we did something like the cache-aware streaming linked above, we might be able to do this without a significant degradation in performance?

Does this require changing the model to accept a collection of bytes instead of an entire file?
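To make the "10s at a time" idea concrete, here's a minimal sketch of slicing a raw audio buffer into fixed-duration chunks that could then be sent over a websocket one at a time. Everything here is hypothetical, not from the repo: the `chunk_pcm` helper name and the 16 kHz / 16-bit mono PCM assumptions are just for illustration.

```python
def chunk_pcm(pcm: bytes, seconds: float, sample_rate: int = 16000,
              sample_width: int = 2) -> list[bytes]:
    """Slice raw mono PCM audio into fixed-duration chunks.

    Assumes 16 kHz, 16-bit mono by default; the last chunk may be
    shorter than `seconds`.
    """
    chunk_bytes = int(seconds * sample_rate * sample_width)
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


# Example: 25 s of silence at 16 kHz, 16-bit mono -> chunks of 10 s, 10 s, 5 s.
audio = b"\x00" * (25 * 16000 * 2)
chunks = chunk_pcm(audio, seconds=10)
```

The model would then see each chunk as it arrives instead of the whole file at once, which is what keeps peak memory flat for long recordings.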

@ManziBryan
Author

ManziBryan commented May 15, 2023

I think to do this, you will need two Docker containers running on different ports: one that does just the transcription, and one that processes API requests.

The API container would have something like this:
```python
import asyncio

import aiohttp
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

api = FastAPI()


async def forward(ws_a: WebSocket, ws_b: aiohttp.ClientWebSocketResponse):
    # Relay audio bytes from the client to the transcription container.
    while True:
        try:
            data = await ws_a.receive_bytes()
        except WebSocketDisconnect:
            break
        await ws_b.send_bytes(data)


async def reverse(ws_a: WebSocket, ws_b: aiohttp.ClientWebSocketResponse):
    # Relay transcription results from the transcription container back
    # to the client.
    while True:
        try:
            response = await ws_b.receive()
            data = response.data
        except WebSocketDisconnect:
            break
        await ws_a.send_text(data)


@api.websocket("/send")
async def websocket_a(ws_a: WebSocket):
    await ws_a.accept()
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(
            "URL FOR WEBSOCKET RUNNING ON CONTAINER B", timeout=60
        ) as ws_b_client:
            fwd_task = asyncio.create_task(forward(ws_a, ws_b_client))
            rev_task = asyncio.create_task(reverse(ws_a, ws_b_client))
            await asyncio.gather(fwd_task, rev_task)
```

and the transcribe container would have something like this:

```python
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/generate")
async def websocket_b(ws_b_server: WebSocket):
    await ws_b_server.accept()
    while True:
        audio = await ws_b_server.receive_bytes()  # bytes relayed by container A
        resp = ...  # DO TRANSCRIPTION STUFF with `audio`
        await ws_b_server.send_text(json.dumps(resp))
```

I think the hard part here will be figuring out how to send the bytes directly to the model without writing a bunch of temporary files.
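On the temporary-file point: if the chunks arrive as WAV bytes, the standard library can already decode them in memory, since `wave.open` accepts any file-like object, so wrapping the received bytes in `io.BytesIO` avoids touching disk. This is just a sketch (the helper name is made up, and the format the model actually wants depends on the repo):

```python
import io
import wave


def pcm_from_wav_bytes(wav_bytes: bytes) -> tuple[bytes, int]:
    """Decode WAV bytes entirely in memory; return (raw PCM frames, sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.readframes(wf.getnframes()), wf.getframerate()


# Round-trip demo: build a 1 s, 16 kHz, 16-bit mono WAV in memory, then decode it.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00" * 16000 * 2)

pcm, rate = pcm_from_wav_bytes(buf.getvalue())
```

From there the PCM frames can go straight into whatever array the model expects, with no temp files involved.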

Lmk if this doesn't make any sense or if you run into trouble. I've done something similar before and it worked well, but that repo is not yet open source.
