🐛Bug: The aload function, contrary to its name, is not an asynchronous function, so it cannot work concurrently with other asynchronous functions. #28336

Open
yeounhak opened this issue Nov 25, 2024 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@yeounhak
Contributor

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community import document_loaders as dl

async def do_something():
    await asyncio.sleep(1)

async def main():
    loader1 = dl.WebBaseLoader("https://www.fntimes.com/html/view.php?ud=202411242104045546dd55077bc2_18")

    results = await asyncio.gather(loader1.aload(), do_something())

    print(results)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Error Message and Stack Trace (if applicable)

python bug_langchain.py 
USER_AGENT environment variable not set, consider setting it to identify your requests.
Traceback (most recent call last):
  File "/home/dnsgkr23/bug_langchain.py", line 15, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dnsgkr23/bug_langchain.py", line 9, in main
    results = await asyncio.gather(loader1.aload(), do_something())
                                   ^^^^^^^^^^^^^^^
  File "/home/dnsgkr23/langchain/libs/community/langchain_community/document_loaders/web_base.py", line 337, in aload
    results = self.scrape_all(self.web_paths)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dnsgkr23/langchain/libs/community/langchain_community/document_loaders/web_base.py", line 278, in scrape_all
    results = asyncio.run(self.fetch_all(urls))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 186, in run
    raise RuntimeError(
RuntimeError: asyncio.run() cannot be called from a running event loop
sys:1: RuntimeWarning: coroutine 'WebBaseLoader.fetch_all' was never awaited

Description

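The aload function of WebBaseLoader, contrary to its name, is not an asynchronous function.
It is defined with def rather than async def and reaches asyncio.run() through scrape_all,
so it cannot be awaited concurrently with other asynchronous functions. Calling it from inside
a running event loop raises the RuntimeError shown above.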
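The failure in the traceback does not depend on LangChain itself. A plain `def` function that calls `asyncio.run()` cannot be called from code already running inside an event loop. The sketch below reproduces this with hypothetical stand-ins (`fetch_all`, `sync_aload`) that only mimic the shape of `scrape_all`/`fetch_all` in web_base.py. It makes no network requests.

```python
import asyncio


async def fetch_all(urls):
    # Hypothetical stand-in for an async fetcher: pretend the requests take 0.1 s.
    await asyncio.sleep(0.1)
    return [f"<html>{url}</html>" for url in urls]


def sync_aload(urls):
    # Mirrors the reported pattern: a synchronous function that starts its own loop.
    return asyncio.run(fetch_all(urls))


async def main():
    # We are now inside a running loop, so the nested asyncio.run() must fail.
    try:
        sync_aload(["https://example.com"])
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(main()))
```

As in the report, the coroutine created for `fetch_all` is never awaited, so Python may also emit a `RuntimeWarning` when it is garbage-collected.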

System Info

python -m langchain_core.sys_info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP PREEMPT_DYNAMIC Debian 6.1.115-1 (2024-11-01)
> Python Version:  3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0]

Package Information
-------------------
> langchain_core: 0.3.19
> langchain: 0.3.7
> langchain_community: 0.3.7
> langsmith: 0.1.142
> langchain_tests: 0.3.4
> langchain_text_splitters: 0.3.2

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp: 3.10.10
> async-timeout: Installed. No version info available.
> dataclasses-json: 0.6.7
> httpx: 0.27.2
> httpx-sse: 0.4.0
> jsonpatch: 1.33
> numpy: 1.26.4
> orjson: 3.10.11
> packaging: 24.2
> pydantic: 2.9.2
> pydantic-settings: 2.6.1
> pytest: 7.4.4
> PyYAML: 6.0.2
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> SQLAlchemy: 2.0.35
> syrupy: 4.7.2
> tenacity: 9.0.0
> typing-extensions: 4.12.2
dosubot (bot) added the 🤖:bug label (Related to a bug, vulnerability, unexpected error with an existing feature) on Nov 25, 2024
yeounhak changed the title from "bug: The aload function, contrary to its name, is not an asynchronous function, so it cannot work concurrently with other asynchronous functions." to "🐛Bug: The aload function, contrary to its name, is not an asynchronous function, so it cannot work concurrently with other asynchronous functions." on Nov 29, 2024
@yeounhak
Contributor Author

In the document_loaders folder, 9 files implement aload or alazy_load, and web_base.py is the only one whose aload is not defined with async def. This is inconvenient for users, who have to read the source to know whether to call await loader.aload() or loader.aload().
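Instead of reading the source, a caller can check at runtime. A minimal sketch, assuming only the standard library: `inspect.iscoroutinefunction` reports whether a bound method was declared with `async def`. The two loader classes and the `load_any` helper are hypothetical stand-ins, not LangChain APIs.

```python
import asyncio
import inspect


class AsyncLoader:
    # Hypothetical loader whose aload is a coroutine function.
    async def aload(self):
        await asyncio.sleep(0)
        return ["doc-a"]


class SyncLoader:
    # Hypothetical loader whose aload is (misleadingly) a plain function.
    def aload(self):
        return ["doc-b"]


async def load_any(loader):
    # Await the method only when it was declared with `async def`;
    # otherwise run the blocking call in a worker thread so the loop stays free.
    if inspect.iscoroutinefunction(loader.aload):
        return await loader.aload()
    return await asyncio.to_thread(loader.aload)


async def main():
    return await asyncio.gather(load_any(AsyncLoader()), load_any(SyncLoader()))


if __name__ == "__main__":
    print(asyncio.run(main()))  # [['doc-a'], ['doc-b']]
```

Because the running event loop is tracked per thread, a synchronous aload that internally calls `asyncio.run()` (as in the traceback above) should also work when moved to a worker thread this way. This has not been tested against WebBaseLoader itself.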

A bigger problem is that users cannot run web_base.py's aload concurrently with other async functions. The root cause is that web_base.py overrides the aload method inherited from langchain_core's base.py with a synchronous function, which reaches asyncio.run() through scrape_all and therefore fails inside a running event loop.
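The direction of the fix can be sketched with simplified, hypothetical classes: a base class that declares `aload` as a coroutine, a subclass that overrides it with a plain `def` calling `asyncio.run()` (the reported pattern), and a subclass that keeps `async def` and awaits its fetcher on the caller's loop. This illustrates the pattern only; it is not the code from the pull request.

```python
import asyncio
import time


class BaseLoader:
    # Simplified stand-in: the base class declares aload() as a coroutine.
    async def aload(self):
        raise NotImplementedError


class BrokenLoader(BaseLoader):
    # Overrides the coroutine with a synchronous method, as reported.
    # Calling this inside a running loop raises RuntimeError.
    def aload(self):
        return asyncio.run(self.fetch_all())

    async def fetch_all(self):
        await asyncio.sleep(0.2)  # pretend network I/O
        return ["page"]


class FixedLoader(BrokenLoader):
    # Keeps the coroutine signature and awaits the fetcher directly.
    async def aload(self):
        return await self.fetch_all()


async def do_something():
    await asyncio.sleep(0.2)
    return "done"


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(FixedLoader().aload(), do_something())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results)  # [['page'], 'done']
    print(f"{elapsed:.2f} s")  # close to 0.2 s, because both sleeps overlap
```

With the coroutine version, the two 0.2 s waits overlap, so the total stays near 0.2 s instead of 0.4 s.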

I have submitted a pull request to fix this issue in #28337.

The 9 files I found in the document_loaders folder that implement aload or alazy_load are:

  1. async_html.py
  2. astradb.py
  3. cassandra.py
  4. chromium.py
  5. merge.py
  6. mongodb.py
  7. surrealdb.py
  8. url_playwright.py
  9. web_base.py
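A list like this can be reproduced with a small static check. The sketch below assumes only the standard library and a local checkout directory you pass on the command line. It parses each `.py` file and reports every `aload`/`alazy_load` definition together with whether it uses `async def`. It is not how the list above was produced, and it only matches method names. It does not follow inheritance.

```python
import ast
from pathlib import Path

TARGET_NAMES = {"aload", "alazy_load"}


def scan_source(source):
    """Return (method name, is_async) for each aload/alazy_load definition."""
    tree = ast.parse(source)
    return [
        # FunctionDef is a plain `def`; AsyncFunctionDef is `async def`.
        (node.name, isinstance(node, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name in TARGET_NAMES
    ]


def scan_directory(directory):
    """Map file name -> scan_source() results, skipping files with no matches."""
    found = {}
    for path in sorted(Path(directory).glob("*.py")):
        methods = scan_source(path.read_text(encoding="utf-8"))
        if methods:
            found[path.name] = methods
    return found


if __name__ == "__main__":
    import sys

    # Example: python audit.py libs/community/langchain_community/document_loaders
    for name, methods in scan_directory(sys.argv[1]).items():
        for method, is_async in methods:
            kind = "async def" if is_async else "def (not a coroutine)"
            print(f"{name}: {method} -> {kind}")
```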

@baskaryan @efriis @eyurtsev @ccurme @vbarda @hwchase17

ccurme added a commit that referenced this issue Dec 20, 2024
…#28337)

- **Description:** The aload function, contrary to its name, is not an
asynchronous function, so it cannot work concurrently with other
asynchronous functions.

- **Issue:** #28336 

- **Test:** Done

- **Docs:**
[here](https://github.com/yeounhak/langchain/blame/e0a95e5646f086c696e43de2a3dac0f230063341/docs/docs/integrations/document_loaders/web_base.ipynb#L201)

- **Lint:** All checks passed


---------

Co-authored-by: Chester Curme