updating transcribe methods in doc
Jiltseb committed Nov 1, 2024
1 parent bafb279 commit b631e14
89 changes: 41 additions & 48 deletions docs/pages/model_hub/asr.md
@@ -70,54 +70,47 @@ Here are some other possible configurations for the Whisper deployment:
)
```

### Examples of Transcription from Video

Let's look at the different transcribe methods available on the transcription endpoint class.

!!! example "Transcribe methods in Aana SDK"

```python
from aana.core.models.video import VideoInput
from aana.core.models.whisper import BatchedWhisperParams, WhisperParams
from aana.deployments.whisper_deployment import WhisperOutput

async def run(
    self,
    video: VideoInput,
    whisper_params: WhisperParams,
) -> WhisperOutput:
    # Download the video and extract its audio track.
    # (run_remote, download_video, and extract_audio are Aana SDK helpers.)
    video_obj = await run_remote(download_video)(video_input=video)
    audio = extract_audio(video=video_obj)

    # 1. Method "transcribe":
    # Use it to get the full transcription output all at once at the end.
    transcription = await self.asr_handle.transcribe(
        audio=audio, params=whisper_params
    )
    # further processing...

    # 2. Method "transcribe_stream":
    # Use it to get the transcription segment by segment as segments become available.
    stream = self.asr_handle.transcribe_stream(
        audio=audio, params=whisper_params
    )
    async for chunk in stream:
        # further processing...
        ...

    # 3. Method "transcribe_in_chunks":
    # Performs batched inference and returns one batch of segments at a time.
    # Up to 4x faster than the sequential methods.
    batched_stream = self.asr_handle.transcribe_in_chunks(
        audio=audio,
        params=BatchedWhisperParams(),
    )
    async for chunk in batched_stream:
        # further processing...
        ...
```
### Available Transcription Methods in Aana SDK

Below are the different transcription methods available in the Aana SDK:

1. **`transcribe` Method**
   - **Description**: This method returns the complete transcription output at once, after the entire audio has been processed.
- **Usage Example**:
```python
transcription = await self.asr_handle.transcribe(audio=audio, params=whisper_params)
# Further processing...
```

2. **`transcribe_stream` Method**
   - **Description**: This method streams the transcription segment by segment, yielding each segment as soon as it becomes available.
- **Usage Example**:
```python
stream = handle.transcribe_stream(audio=audio, params=whisper_params)
async for chunk in stream:
# Further processing...
```

3. **`transcribe_in_chunks` Method**
- **Description**: This method performs batched inference, returning one batch of segments at a time. It is up to 4x faster than sequential methods.
- **Usage Example**:
```python
     batched_stream = handle.transcribe_in_chunks(audio=audio, params=BatchedWhisperParams())
async for chunk in batched_stream:
# Further processing...
```
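All three methods are consumed from an async context. As a minimal, self-contained sketch of the consumption pattern (using a hypothetical mock async generator in place of the actual deployment handle), streamed segments can be accumulated into a full transcript like this:

```python
import asyncio


# Hypothetical stand-in for handle.transcribe_stream: an async generator
# that yields one transcription segment at a time.
async def mock_transcribe_stream():
    for text in ["Hello", " world", "!"]:
        await asyncio.sleep(0)  # simulate waiting for the next segment
        yield {"text": text}


async def collect_transcript():
    # The same async-for pattern used with transcribe_stream and
    # transcribe_in_chunks: process each chunk as it arrives.
    transcript = ""
    async for chunk in mock_transcribe_stream():
        transcript += chunk["text"]
    return transcript


print(asyncio.run(collect_transcript()))  # prints "Hello world!"
```

The real methods additionally take `audio` and `params` arguments, as shown in the examples above.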

#### Differences Between `WhisperParams` and `BatchedWhisperParams`

Both `WhisperParams` and `BatchedWhisperParams` configure the Whisper speech-to-text model, for sequential and batched inference respectively.

- **Common Parameters**:
Both classes share common attributes such as `language`, `beam_size`, `best_of`, and `temperature`.

- **Key Differences**:
  `WhisperParams` includes additional attributes such as `word_timestamps` and `vad_filter`, which enable word-level timestamps and voice activity detection (VAD) filtering, respectively.

Refer to the respective [class documentation](../../reference/models/whisper.md) for detailed attributes and usage.

### Diarized ASR

