-
Notifications
You must be signed in to change notification settings - Fork 16k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audio file loader implemented with Azure speech service #13988
Audio file loader implemented with Azure speech service #13988
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
from langchain.document_loaders.parsers.audio import AzureSpeechServiceParser | ||
|
||
|
||
class AzureSpeechServiceLoader(GenericLoader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear eyurtsev,
I found I miss used the GenericLoader, and updated the AzureSpeechServiceLoader class to inherit the BaseLoader class. I pushed it in the commit 7b8372d. Look forward to seeing feedback from you. Thanks.
Kind Regards,
Kai Zhang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I updated the service loader to use generic loader (see Improve file system blob loader and generic loader #14004)
- I expanded the kwargs so the parameters are easy to discover and refactored some of the resolving of values
Would you be able to test that the service still works for you? I don't have credentials set up so testing help will be helpful
- Could you run the parser with 2 audio files at once using
from_filesystem
to confirm that transcribing more than one file at a time works?
class AzureSpeechServiceParser(BaseBlobParser): | ||
"""Loads an Audio with azure.cognitiveservices.speech.""" | ||
|
||
def __init__(self, **kwargs: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Could you add a link to the relevant pages on azure that explain the API?
- Could you expand the kwargs and document the variable meanings?
- Could we add a parameter to control the polling interval (i.e., sleep is hard-coded to 0.5 seconds right now, we should expose this to the user)
nit: It's common to use logger rather than print statements in production code, you could consider swapping to using logger.info / logger.error as appropriate. (not a requirement to merge since we have some other parsers that are using print right now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear eyurtsev,
I fixed all the issues in the commit 7b8372d:
I Added Azure Speech service official documents and code pages.
I expanded and documented the kwargs meanings.
I added a parameter to control the transcribe job's polling interval.
Furthermore, I did all the unit tests in the test_audio.py file and here's the evidence:
Look forward to seeing feedback from you. Thanks.
1. Fix misuse of GenericLoader in the AzureSpeechServiceLoader class 2.1. Add Azure Speech service official documents and code pages. 2.2. Expand and document the kwargs meanings. 2.3. Add a parameter to control the transcribe job's polling interval
from langchain.document_loaders.parsers.audio import AzureSpeechServiceParser | ||
|
||
|
||
class AzureSpeechServiceLoader(GenericLoader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I updated the service loader to use generic loader (see Improve file system blob loader and generic loader #14004)
- I expanded the kwargs so the parameters are easy to discover and refactored some of the resolving of values
Would you be able to test that the service still works for you? I don't have credentials set up so testing help will be helpful
- Could you run the parser with 2 audio files at once using
from_filesystem
to confirm that transcribing more than one file at a time works?
SPEECH_SERVICE_KEY = "" | ||
|
||
|
||
def _get_csv_file_path() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix the testing code here?
|
||
|
||
def test_azure_speech_load_key_region_auto_detect_languages() -> None: | ||
loader = AzureSpeechServiceLoader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loader = AzureSpeechServiceLoader.from_filesystem(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @eyurtsev ,
AzureSpeechServiceLoader doesn't inherit the GenericLoader
but the BaseLoader
just like the CSVLoader and the PDFLoader. In this case, it does not use the from_filesystem() function but load the file with class initializer like: AzureSpeechServiceLoader(file_path: str)
format.
I uploaded the unit tests success screenshot in previous conversation. I am wondering if I upload my unit test result with console result with console and the test code (without my credential), will help your code review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @eyurtsev ,
First appreciate you soooo much for your updated my parser neat and well-structured, I found it until I got home. Here's what I updated today:
…oader to BaseLoader
Dear @eyurtsev Can you help me to check if my commit is available to merge? |
from langchain_community.document_loaders import AzureAISpeechLoader | ||
|
||
SPEECH_SERVICE_REGION = "eastasia" | ||
SPEECH_SERVICE_KEY = "c77dcf2aa5c04dd6b6613f77d9d9161d" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kzmain if this is a correct service key assume that it's been compromised since it's available here in plain text
Hi @kzmain sorry for failing to follow on this PR. I'm going to close this since this is a year old and still hasn't been merged. I appreciate you contributing to the project and really apologize that it's taken so long to get a resolution. |
No description provided.