Version fix for DocSum requirements (unstructured, langchain) #1469

Open
wants to merge 2 commits into
base: main

okhleif-IL (Contributor) commented Jan 24, 2025

Description

This is a bugfix for DocSum URL summarization. It pins the versions of langchain and unstructured to bypass an nltk-related bug that occurs during URL summarization.

In newer versions of langchain_community and unstructured, the following is required in the user's environment:
import nltk
nltk.download("punkt_tab")
nltk.download("averaged_perceptron_tagger_eng")

Older versions of these libraries handle these requirements automatically.
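For reference, a minimal sketch of how the two downloads above could be made idempotent, so they only run when a resource is missing. The resource names come from this description; the lookup paths passed to `nltk.data.find` and the helper names `ensure_resources` and `ensure_nltk_resources` are assumptions for illustration, not code from this PR.

```python
from typing import Callable, Dict, List

# Resource name -> lookup path. The names are the ones listed in this PR;
# the paths are an assumption about where NLTK stores each package.
REQUIRED_RESOURCES: Dict[str, str] = {
    "punkt_tab": "tokenizers/punkt_tab",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
}


def ensure_resources(
    find: Callable[[str], object],
    download: Callable[[str], object],
    resources: Dict[str, str] = REQUIRED_RESOURCES,
) -> List[str]:
    """Download every resource that `find` cannot locate.

    `find` must raise LookupError for a missing resource, which is how
    nltk.data.find behaves. Returns the names that were downloaded.
    """
    fetched = []
    for name, path in resources.items():
        try:
            find(path)
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched


def ensure_nltk_resources() -> List[str]:
    """Hypothetical entry point: wires the helper to the real nltk calls."""
    import nltk  # imported lazily so the helper above has no hard dependency

    return ensure_resources(nltk.data.find, nltk.download)
```

Calling `ensure_nltk_resources()` once at service startup would cover the newer library versions without re-downloading on every run.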

Issues

Although the originally suspected cause was different, this fixes #1464.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Pinned langchain and unstructured to older, fixed versions.
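A rough sketch of how pins like these could be checked against what is actually installed in an image. The helper names `parse_pins` and `find_mismatches` are hypothetical, the parser only understands plain `name==version` lines (no extras or environment markers), and the sample pin in the usage note is simply the version shown in the dependency review for this PR.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Collect exact pins (name==version) from requirements-style text.

    Simplification: comments are stripped and unpinned lines are skipped;
    extras and environment markers are not handled.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins("langchain_community==0.3.9"))` returns an empty dict when the environment matches the pin.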

Tests

Manually rebuilt, composed, and tested via UI

github-actions bot commented Jan 24, 2025

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

DocSum/ui/gradio/requirements.txt

Package              Version  License  Issue Type
langchain_community  0.3.9    Null     Unknown License

Scanned Files

  • DocSum/ui/gradio/requirements.txt

@okhleif-IL
Contributor Author

If there is some reluctance to pin versions, an alternative would be to add this to the Dockerfile:
RUN python3 -c "import nltk;nltk.download('punkt_tab');nltk.download('averaged_perceptron_tagger_eng')"
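One caveat with the one-liner, as far as I can tell: `nltk.download` reports a failed download by returning `False` rather than raising, so the `RUN` step could succeed even if a resource was not fetched. A small script along these lines would make the build fail instead. This is only a sketch; the function names `download_all` and `main` are hypothetical.

```python
import sys
from typing import Callable, Iterable, List

# The two resources named in this PR.
RESOURCES = ("punkt_tab", "averaged_perceptron_tagger_eng")


def download_all(
    download: Callable[[str], bool],
    resources: Iterable[str] = RESOURCES,
) -> List[str]:
    """Try to download every resource; return the names that failed.

    `download` is expected to return a truthy value on success,
    matching the return value of nltk.download.
    """
    return [name for name in resources if not download(name)]


def main() -> int:
    """Return a non-zero exit code if any resource failed to download."""
    import nltk  # imported here so download_all can be tested without nltk

    failed = download_all(nltk.download)
    if failed:
        print("Failed to download: " + ", ".join(failed), file=sys.stderr)
        return 1
    return 0
```

The Dockerfile step would then run the script and exit with `sys.exit(main())`, so a failed download stops the build.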
