Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio file loader implemented with Azure speech service #13988

Closed

Conversation

kzmain
Copy link

@kzmain kzmain commented Nov 28, 2023

No description provided.

Copy link

vercel bot commented Nov 28, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Mar 4, 2024 6:08am

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Ɑ: doc loader Related to document loader module (not documentation) 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Nov 28, 2023
from langchain.document_loaders.parsers.audio import AzureSpeechServiceParser


class AzureSpeechServiceLoader(GenericLoader):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a PR here to help with the pattern for Generic Loader: #14004 -- we don't want any file logic to live outside of file system blob loader.

@kzmain met know if you want to complete the refactor or if you want me to do it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear eyurtsev,

I found I miss used the GenericLoader, and updated the AzureSpeechServiceLoader class to inherit the BaseLoader class. I pushed it in the commit 7b8372d. Look forward to seeing feedback from you. Thanks.

Kind Regards,
Kai Zhang

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kzmain

  1. I updated the service loader to use generic loader (see Improve file system blob loader and generic loader #14004)
  2. I expanded the kwargs so the parameters are easy to discover and refactored some of the resolving of values

Would you be able to test that the service still works for you? I don't have credentials set up so testing help will be helpful

  1. Could you run the parser with 2 audio files at once using from_filesystem to confirm that transcribing more than one file at a time works?

class AzureSpeechServiceParser(BaseBlobParser):
"""Loads an Audio with azure.cognitiveservices.speech."""

def __init__(self, **kwargs: Any) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could you add a link to the relevant pages on azure that explain the API?
  2. Could you expand the kwargs and document the variable meanings?
  3. Could we add a parameter to control the polling interval (i.e., sleep is hard-coded to 0.5 seconds right now, we should expose this to the user)

nit: It's common to use logger rather than print statements in production code, you could consider swapping to using logger.info / logger.error as appropriate. (not a requirement to merge since we have some other parsers that are using print right now)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear eyurtsev,

I fixed all the issues in the commit 7b8372d:

I Added Azure Speech service official documents and code pages.
I expanded and documented the kwargs meanings.
I added a parameter to control the transcribe job's polling interval.

Furthermore, I did all the unit tests in the test_audio.py file and here's the evidence:
Screenshot 2023-12-03 at 03 26 35

Look forward to seeing feedback from you. Thanks.

kzmain and others added 8 commits December 3, 2023 03:26
1. Fix misuse of GenericLoader in the AzureSpeechServiceLoader class
2.1. Add Azure Speech service official documents and code pages.
2.2. Expand and document the kwargs meanings.
2.3. Add a parameter to control the transcribe job's polling interval
from langchain.document_loaders.parsers.audio import AzureSpeechServiceParser


class AzureSpeechServiceLoader(GenericLoader):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kzmain

  1. I updated the service loader to use generic loader (see Improve file system blob loader and generic loader #14004)
  2. I expanded the kwargs so the parameters are easy to discover and refactored some of the resolving of values

Would you be able to test that the service still works for you? I don't have credentials set up so testing help will be helpful

  1. Could you run the parser with 2 audio files at once using from_filesystem to confirm that transcribing more than one file at a time works?

SPEECH_SERVICE_KEY = ""


def _get_csv_file_path() -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix the testing code here?



def test_azure_speech_load_key_region_auto_detect_languages() -> None:
loader = AzureSpeechServiceLoader(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loader = AzureSpeechServiceLoader.from_filesystem(...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @eyurtsev ,

AzureSpeechServiceLoader doesn't inherit the GenericLoader but the BaseLoader just like the CSVLoader and the PDFLoader. In this case, it does not use the from_filesystem() function but load the file with class initializer like: AzureSpeechServiceLoader(file_path: str) format.

I uploaded the unit tests success screenshot in previous conversation. I am wondering if I upload my unit test result with console result with console and the test code (without my credential), will help your code review?

Copy link
Author

@kzmain kzmain Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @eyurtsev ,

First appreciate you soooo much for your updated my parser neat and well-structured, I found it until I got home. Here's what I updated today:

  1. the class AzureSpeechServiceParser __init__ function kwargs' type
  2. changed the loader from a GenericLoader to BaseLoader, and added the lazy_load function
  3. updated the unit tests
    Screenshot 2023-12-06 at 20 04 05

@kzmain
Copy link
Author

kzmain commented Dec 25, 2023

Dear @eyurtsev Can you help me to check if my commit is available to merge?

@hwchase17 hwchase17 closed this Jan 30, 2024
@baskaryan baskaryan reopened this Jan 30, 2024
@ccurme ccurme added the community Related to langchain-community label Jun 18, 2024
from langchain_community.document_loaders import AzureAISpeechLoader

SPEECH_SERVICE_REGION = "eastasia"
SPEECH_SERVICE_KEY = "c77dcf2aa5c04dd6b6613f77d9d9161d"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kzmain if this is a correct service key assume that it's been compromised since it's available here in plain text

@eyurtsev
Copy link
Collaborator

Hi @kzmain sorry for failing to follow on this PR. I'm going to close this since this is a year old and still hasn't been merged. I appreciate you contributing to the project and really apologize that it's taken so long to get a resolution.

@eyurtsev eyurtsev closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features size:L This PR changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants