Implement long audio inference for k2 #42

Open
JER-ry opened this issue Dec 6, 2024 · 1 comment

Comments


JER-ry commented Dec 6, 2024

    if duration > TOO_LONG_SECONDS:
        warnings.warn(
            f"Passing a long audio input ({duration:.1f}s) is not recommended, "
            "because K2 will require a large amount of memory. "
            "Read the upstream discussion for more details: "
            "https://github.com/k2-fsa/icefall/issues/1680"
        )

Since this is implemented for espnet, it's probably worth also copying that here.

    while pos < len(audio.waveform):
        samples = audio.waveform[pos:]

        # If the audio data is very long, find out the longest
        # non-speech region and perform decoding up to that point.
        if len(samples) > window:
            blank = find_blank(model, samples[:window])
            mid = int((blank.start + blank.end) / 2)
            samples = samples[:mid]

        asr = model(np.pad(samples, PADDING, mode="constant"))[0][0]
        fulltext += asr

        for start, end, text in split_text(model, samples, asr):
            segments.append(Segment(
                start_seconds=((pos + start) / audio.samplerate),
                end_seconds=((pos + end) / audio.samplerate),
                text=text,
            ))

        # Advance past the chunk just decoded; without this the loop never ends.
        pos += len(samples)
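To make the chunking idea in that loop concrete without any ASR model, here is a minimal sketch. It is not ReazonSpeech's `find_blank()`: `find_quietest_gap` and `split_points` are hypothetical names, and the gap detection is a plain amplitude threshold swapped in purely for illustration. It only shows how cutting at the middle of the longest quiet region inside each window yields chunk boundaries.

```python
from typing import List, Sequence, Tuple


def find_quietest_gap(samples: Sequence[float], threshold: float) -> Tuple[int, int]:
    """Return (start, end) of the longest run with |x| < threshold.

    Hypothetical, amplitude-based stand-in for a model-based blank finder.
    Returns (0, 0) when no sample is below the threshold.
    """
    best = (0, 0)
    run_start = None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    # Handle a quiet run that extends to the end of the window.
    if run_start is not None and len(samples) - run_start > best[1] - best[0]:
        best = (run_start, len(samples))
    return best


def split_points(samples: Sequence[float], window: int,
                 threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Split samples into (start, end) chunks of at most `window` samples."""
    chunks = []
    pos = 0
    while pos < len(samples):
        end = len(samples)
        if end - pos > window:
            gap_start, gap_end = find_quietest_gap(samples[pos:pos + window], threshold)
            mid = (gap_start + gap_end) // 2
            # Fall back to a hard cut if no usable quiet gap was found.
            end = pos + (mid if mid > 0 else window)
        chunks.append((pos, end))
        pos = end  # always advance, so the loop terminates
    return chunks
```

Each `(start, end)` pair would then be decoded separately, with `start` added back to any per-chunk timestamps, as the `pos + start` arithmetic in the loop above does.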

@fujimotos
Member

Unfortunately, it's not that easy.

The find_blank() function depends heavily on ESPnet's API,
so we cannot simply copy it into reazonspeech.k2.asr.

> Since this is implemented for espnet, it's probably worth also copying that here

For now, we recommend using an external VAD model (such as Silero VAD)
to process long audio with our K2 model.
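A rough sketch of how that could look, under some assumptions: the VAD is assumed to return speech regions as a list of dicts with `start` and `end` sample indices (the shape Silero VAD's `get_speech_timestamps` produces), and `transcribe_chunk` is a hypothetical callable wrapping whatever K2 call you use on a short waveform. `merge_speech_regions` and `transcribe_long` are illustrative names, not part of ReazonSpeech.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def merge_speech_regions(timestamps: Sequence[Dict[str, int]],
                         max_samples: int) -> List[Tuple[int, int]]:
    """Greedily merge consecutive speech regions into chunks.

    A region is appended to the current chunk while the chunk stays within
    max_samples. A single region longer than max_samples is kept as its own
    (over-length) chunk; this sketch does not split it further.
    """
    chunks: List[Tuple[int, int]] = []
    for ts in timestamps:
        start, end = ts["start"], ts["end"]
        if chunks and end - chunks[-1][0] <= max_samples:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks


def transcribe_long(waveform: Sequence[float],
                    samplerate: int,
                    timestamps: Sequence[Dict[str, int]],
                    transcribe_chunk: Callable[[Sequence[float]], str],
                    max_seconds: float = 30.0) -> List[Tuple[float, float, str]]:
    """Transcribe each merged chunk; return (start_sec, end_sec, text) tuples."""
    results = []
    max_samples = int(max_seconds * samplerate)
    for start, end in merge_speech_regions(timestamps, max_samples):
        # transcribe_chunk is a hypothetical wrapper around the K2 model call.
        text = transcribe_chunk(waveform[start:end])
        results.append((start / samplerate, end / samplerate, text))
    return results
```

The 30-second default for `max_seconds` is an arbitrary placeholder, not a value taken from the project; pick whatever chunk length keeps K2's memory use acceptable on your hardware.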
