Fixing timestamped-ASR with long audio recordings #246

Merged: 1 commit, merged on Feb 1, 2025
src/senselab/audio/tasks/speech_to_text/huggingface.py: 7 changes (5 additions, 2 deletions)
@@ -73,7 +73,7 @@ def transcribe_audios_with_transformers(
     return_timestamps: Optional[str] = "word",
     max_new_tokens: int = 128,
     chunk_length_s: int = 30,
-    batch_size: int = 16,
+    batch_size: int = 1,
     device: Optional[DeviceType] = None,
 ) -> List[ScriptLine]:
     """Transcribes all audio samples in the dataset.
@@ -86,7 +86,10 @@ def transcribe_audios_with_transformers(
         return_timestamps (Optional[str]): The level of timestamp details (default is "word").
         max_new_tokens (int): The maximum number of new tokens (default is 128).
         chunk_length_s (int): The length of audio chunks in seconds (default is 30).
-        batch_size (int): The batch size for processing (default is 16).
+        batch_size (int): The batch size for processing (default is 1).
+            Note: Issues have been observed with long audio recordings and timestamped transcript
+            if the batch_size is high - not exactly clear what high means
+            (https://github.com/huggingface/transformers/issues/2615#issuecomment-656923205).
         device (Optional[DeviceType]): The device to run the model on (default is None).
 
     Returns:
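The docstring note ties the problem to long recordings combined with a high batch_size, alongside chunk_length_s. To make that interaction concrete, the sketch below counts how many windows a long recording is cut into and how batch_size groups those windows into forward passes. It is a hypothetical illustration loosely modeled on the chunk-and-stride approach of the transformers ASR pipeline, not code from senselab or transformers: the helper names `chunk_spans` and `batched` are invented here, and the stride default of `chunk_length_s / 6` is an assumption. It does not reproduce the timestamp bug itself.

```python
from typing import Iterator, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chunk_spans(
    num_samples: int,
    sampling_rate: int,
    chunk_length_s: float,
    stride_length_s: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Split [0, num_samples) into overlapping windows of chunk_length_s seconds.

    Hypothetical helper: neighbouring windows overlap by `stride` samples on each
    side, so each new window starts chunk_len - 2 * stride samples later.
    """
    if stride_length_s is None:
        # Assumption: mirrors the stride default used by the transformers ASR pipeline.
        stride_length_s = chunk_length_s / 6
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("stride is too large for the chunk length")

    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        spans.append((start, end))
        if end >= num_samples:  # this window reaches the end of the recording
            break
        start += step
    return spans


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Group windows into batches; each batch corresponds to one forward pass."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


# A 5-minute recording at 16 kHz, using the chunk_length_s default of 30 s.
spans = chunk_spans(num_samples=300 * 16_000, sampling_rate=16_000, chunk_length_s=30)
print(len(spans))                     # 15 windows
print(len(list(batched(spans, 16))))  # old default batch_size=16: 1 forward pass
print(len(list(batched(spans, 1))))   # new default batch_size=1: 15 forward passes
```

Under these assumptions, the old default of 16 would place all 15 windows of a 5-minute recording in a single batch, while the new default decodes each window on its own. The cost is more forward passes per recording, which is the throughput trade-off this PR accepts.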