-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
- model testing issue: added nltk package averaged_perceptron_tagger_eng #1658
Conversation
WalkthroughThe pull request introduces modifications to the Dockerfile for a Python-based project, focusing on environment setup and package management. The changes include updating the base Amazon Linux image, installing Python 3.10.13 with specific optimizations, and configuring development tools. New environment variables for cache locations are added, and the NLTK setup is enhanced with an additional resource downloader for English language processing. Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docker/Dockerfile (2)
Line range hint
43-43
: Consider setting explicit permissions for the cache directoryWhile setting cache directories is good practice, ensure proper permissions are set for the
/home/cache
directory to prevent potential permission issues during runtime.Add explicit permissions after creating the cache directory:
mkdir data_generator training_data testing_data models /home/cache .rasa + chmod 755 /home/cache
Line range hint
7-9
: Consider using multi-stage builds to reduce attack surfaceThe current Dockerfile includes development tools and build dependencies that aren't needed at runtime. Consider using multi-stage builds to create a smaller, more secure production image.
Example structure:
# Build stage FROM amazonlinux:latest AS builder # ... all build steps ... # Production stage FROM amazonlinux:latest COPY --from=builder /usr/local/lib/python3.10 /usr/local/lib/python3.10 COPY --from=builder /usr/local/bin/python3.10 /usr/local/bin/python3.10 # ... copy only necessary files and directories ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docker/Dockerfile
(1 hunks)
🔇 Additional comments (1)
docker/Dockerfile (1)
29-29
:
Remove incorrect and redundant NLTK package download
The package averaged_perceptron_tagger_eng
is incorrect and doesn't exist in NLTK. Moreover, the correct package averaged_perceptron_tagger
is already being downloaded in the line above.
Remove this line as it's redundant and will cause the Docker build to fail:
- python -m nltk.downloader averaged_perceptron_tagger_eng && \
Let's verify the available NLTK packages:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
Summary by CodeRabbit
New Features
HF_HOME
andSENTENCE_TRANSFORMERS_HOME
.Bug Fixes
Chores