Stanza 1/refactor nltk stanza #88

Merged
merged 11 commits into from
Apr 4, 2024

AaronWChen
Owner

No description provided.

Feeding in multiple recipes (tried 5 from the source) makes the ingredients step fail with "list has no split".
Need to handle possible empty-list cases in the pipeline.
Working in the notebook, but adding the notebook to the commit would take 5 hours of compute, so I will run it in the morning.
Joblib and pickle cannot handle nested functions, so I added the dill library to the virtual environment. Caution: dill adds features but is more exposed to security issues.
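
For anyone reproducing the serialization problem noted above, here is a minimal sketch. It is not the code from this PR: `make_splitter` and `split_text` are hypothetical stand-ins for a custom tokenizer, and the dill step only runs if dill is installed. `recurse=True` asks dill to serialize only the globals the function actually uses.

```python
import pickle


def make_splitter(sep):
    # Hypothetical nested function standing in for a custom tokenizer.
    def split_text(text):
        return text.split(sep)
    return split_text


def pickles_with_stdlib(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


splitter = make_splitter(",")

# pickle stores functions by qualified name, so a function defined
# inside another function cannot be looked up again and is rejected.
stdlib_ok = pickles_with_stdlib(splitter)

try:
    import dill  # third-party; assumed to be installed as in this PR
except ImportError:
    dill = None

dill_ok = None
if dill is not None:
    # dill serializes the function body and its closure by value.
    restored = dill.loads(dill.dumps(splitter, recurse=True))
    dill_ok = restored("a,b") == ["a", "b"]
```

Loading pickle or dill data can execute arbitrary code, so it is safest to load only files produced by this project.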

CountVectorization and TF-IDF now work and can be saved and logged as MLflow artifacts. Need to rerun on a separate machine to test without inhibiting streaming: the current method hits GPU performance hard enough to affect watching YouTube while the model runs.
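
For the earlier "list has no split" error and the empty-list cases, one possible guard is sketched below. It assumes the ingredients field can arrive as a string, a list of strings, an empty list, or `None`; the actual pipeline may differ. Both function names are hypothetical and not taken from this repository.

```python
def to_ingredient_text(value, sep=" "):
    """Coerce one recipe's ingredients field to a single string."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    # Lists or tuples: join the items, skipping None and empty entries.
    return sep.join(str(item) for item in value if item)


def tokenize_ingredients(value):
    """Split on whitespace; empty or missing input yields an empty list."""
    return to_ingredient_text(value).split()


tokens = tokenize_ingredients(["2 cups flour", "1 egg"])
empty = tokenize_ingredients([])
```

Normalizing to a string before calling `.split()` keeps the downstream vectorizer input uniform whether one recipe or several are fed in.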

dagshub bot commented Nov 28, 2023

@AaronWChen AaronWChen merged commit 71b9dd6 into dev Apr 4, 2024
0 of 2 checks passed
@AaronWChen AaronWChen deleted the STANZA-1/refactor-nltk-stanza branch April 4, 2024 00:29