Improving timestamp accuracy in diarize.py by handling None #32
Dear Maintainers,
I am submitting this pull request as a proposed fix for issue #29. I found a few edge cases in the `diarize.py` script where end timestamps may be `None`, which causes errors and misalignment between the diarizer and ASR timestamps. I also noticed that the alignment was not always precise, because it did not take the total duration of the inputs into account.

Here is a concise overview of the changes:
- Handling of `None` end timestamp: Added a safety check so that if the last end timestamp from the ASR output is `None`, it is replaced by the total duration of the inputs. This acts as a safety net in case the ASR fails, for any reason, to provide an end timestamp for the last chunk.
- Alignment condition: Added a conditional so that the search for the ASR end timestamp closest to the diarizer's end timestamp runs only if the first end timestamp is not `None`. This keeps the alignment step from running on potentially faulty data.

These changes aim to make the code more robust against corner cases that may cause errors.
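For reviewers who want to reason about the two checks in isolation, here is a minimal standalone sketch. It is not the code from `diarize.py`: the function names `fill_missing_last_end` and `closest_end_index`, the chunk layout (a dict with a `timestamp` tuple), and passing `total_duration` in as a plain number of seconds are all assumptions made for illustration.

```python
def fill_missing_last_end(chunks, total_duration):
    """Return a copy of `chunks` in which a None end timestamp on the
    last chunk is replaced by `total_duration` (seconds).

    Hypothetical helper: assumes each chunk looks like
    {"text": str, "timestamp": (start, end)}.
    """
    if not chunks:
        return []
    fixed = [dict(chunk) for chunk in chunks]  # shallow copy, input untouched
    start, end = fixed[-1]["timestamp"]
    if end is None:
        fixed[-1]["timestamp"] = (start, total_duration)
    return fixed


def closest_end_index(end_timestamps, diarizer_end):
    """Return the index of the ASR end timestamp closest to `diarizer_end`,
    or None when the first end timestamp is missing (alignment is skipped).

    Hypothetical helper: assumes every entry after the first is a number.
    """
    if not end_timestamps or end_timestamps[0] is None:
        return None
    return min(
        range(len(end_timestamps)),
        key=lambda i: abs(end_timestamps[i] - diarizer_end),
    )


# Example: the ASR did not report an end time for the final chunk.
chunks = [
    {"text": "hello", "timestamp": (0.0, 1.2)},
    {"text": "world", "timestamp": (1.2, None)},
]
fixed = fill_missing_last_end(chunks, total_duration=3.0)
ends = [chunk["timestamp"][1] for chunk in fixed]
index = closest_end_index(ends, diarizer_end=2.8)
```

In this sketch the missing end becomes `3.0`, so the diarizer segment ending at `2.8` can be matched to the last chunk instead of raising on a `None` comparison.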
Please note that these modifications do not introduce breaking changes or alter existing functionality. They aim to improve the precision and reliability of the `diarize.py` script. I hope these changes prove beneficial to the project.

Nate