Batch entity summarization #1709

Open: wants to merge 4 commits into base: main

joerattazzi-microsoft commented Feb 13, 2025

Description

When a graph is updated with a large number of inputs, the asyncio.gather call in the "Updating Final Entities" step awaits every task at once, which can block completely and cause the calls to time out. Link to code:

results = await asyncio.gather(*tasks)

This PR proposes batching those calls, with the batch size set through a configurable value.
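For readers unfamiliar with the pattern, here is a minimal sketch of batching around asyncio.gather. It is not the code in this PR: `gather_in_batches`, `worker`, and `fake_summarize` are hypothetical names used only for illustration, and the stand-in worker replaces a real LLM call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def gather_in_batches(
    items: Sequence[Any],
    worker: Callable[[Any], Awaitable[Any]],
    batch_size: int,
) -> list[Any]:
    """Await worker(item) for every item, at most batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: list[Any] = []
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        # Only this batch's coroutines are created and awaited together;
        # the next batch starts after every call in this one has finished.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results


async def fake_summarize(name: str) -> str:
    # Hypothetical stand-in for an LLM summarization call.
    await asyncio.sleep(0)
    return f"summary of {name}"
```

Results come back in input order, because asyncio.gather preserves the order of its arguments and the batches are processed sequentially. One trade-off of this approach is that a slow call holds up the rest of its batch; a semaphore-based limit would avoid that, at the cost of a slightly different structure.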

Related Issues

#1707

Proposed Changes

  • Add a configurable batch_size to summarize_descriptions_config
  • Batch the LLM calls in run_entity_summarization
  • Add logging to clarify what is happening during summarization
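As a rough illustration of how a configurable batch size and per-batch logging might fit together, the sketch below plans batch boundaries and logs each one. The class `SummarizeSketchConfig`, its default of 50, and the function `plan_batches` are assumptions made for this example; they are not the project's actual config class or the code in this PR.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass
class SummarizeSketchConfig:
    # Hypothetical config holder; the default value is an assumption.
    batch_size: int = 50


def plan_batches(total: int, config: SummarizeSketchConfig) -> list[tuple[int, int]]:
    """Return half-open (start, end) index ranges covering `total` items."""
    if config.batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ranges = [
        (start, min(start + config.batch_size, total))
        for start in range(0, total, config.batch_size)
    ]
    for number, (start, end) in enumerate(ranges, start=1):
        # One log line per batch makes progress visible on long runs.
        log.info("batch %d/%d: items %d to %d", number, len(ranges), start, end - 1)
    return ranges
```

Each range can then be used to slice the task list before awaiting that slice, so the log shows which batch is in flight if a run stalls.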

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

Tested with a fairly large data set via our Synapse pipeline.

joerattazzi-microsoft changed the title from "Batch entry summarization" to "Batch entity summarization" Feb 13, 2025
joerattazzi-microsoft force-pushed the joerattazzi-microsoft/batch-entry-summarization branch from 3a329f4 to 974cf07 February 14, 2025 04:35