Ngram 1/try llm code speedup #90

Merged: 8 commits merged on Apr 4, 2024
Conversation

AaronWChen (Owner)

No description provided.

Several approaches did not work (which is OK), but to do further testing I needed to install more-itertools in the venv, and can now rerun some code.
The forced restart cleared the Jupyter kernel, so might as well add the FlagEmbedding library to create embeddings for retrieval.

Read a few discussions on how slow Stanza's pipeline is. One takeaway: I think there is a strong benefit to using lemmas as tokens, but lemmatization requires many of the processors in the pipeline. Unfortunately, this looks like the slow part of processing.

Saw this discussion on [using multiprocessing combined with Stanza](stanfordnlp/stanza#552), and it seems that multiprocessing with a GPU doesn't yield much of a performance increase but does make the GPU work drastically harder. However, the discussion reminded me that I had turned off the GPU in Stanza while trying to use BERTopic with Stanza. Will re-enable it and re-time.
Moving lemmatization out of the sklearn pipeline to speed up count vectorization/TF-IDF
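The shape of that refactor can be sketched with a stdlib-only stand-in for the lemmatizer (the lookup table and sample documents below are hypothetical; in the project the lemmas would come from the Stanza pipeline):

```python
from collections import Counter

# Hypothetical toy lemma table standing in for a real NLP pipeline;
# the point is the restructuring, not the lemmatizer itself.
LEMMAS = {"running": "run", "ran": "run", "recipes": "recipe"}

def lemmatize(tokens):
    return [LEMMAS.get(t, t) for t in tokens]

docs = [["running", "recipes"], ["ran", "recipes", "fast"]]

# Before: lemmatizing inside the vectorizer's analyzer re-runs the (slow)
# NLP step on every fit/transform. After: lemmatize once, up front, and
# feed the cached lemmas to the counting step.
lemmatized_docs = [lemmatize(d) for d in docs]

# Counting (the cheap part) now operates on plain token lists.
counts = [Counter(d) for d in lemmatized_docs]
print(counts[0])  # Counter({'run': 1, 'recipe': 1})
```

With sklearn, the same idea is to pass the pre-lemmatized tokens straight to the vectorizer (e.g. via an identity `analyzer`) so the expensive step runs exactly once.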

Will have to add a similar refactor to the query code.
Running into an error:

```
TypeError: CustomSKLearnAnalyzer.ngrams_maker() got multiple values for argument 'min_ngram_length'
```
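This `TypeError` generally means the same argument arrives both positionally and by keyword. A minimal repro with a hypothetical `ngrams_maker` signature (only the names mirror the traceback; the real method's body and parameters may differ):

```python
class CustomSKLearnAnalyzer:
    # Hypothetical reconstruction for illustration only.
    def ngrams_maker(self, tokens, min_ngram_length=1, max_ngram_length=3):
        return [" ".join(tokens[i:i + n])
                for n in range(min_ngram_length, max_ngram_length + 1)
                for i in range(len(tokens) - n + 1)]

analyzer = CustomSKLearnAnalyzer()

# Reproduces the error: 2 binds to min_ngram_length positionally,
# and the keyword then supplies it a second time.
try:
    analyzer.ngrams_maker(["a", "b"], 2, min_ngram_length=2)
except TypeError as e:
    print(e)
```

Another common trigger is passing the unbound method (or a `partial` with an extra leading argument) as a callback, e.g. as an sklearn `analyzer=`, so every positional slot shifts by one and collides with a keyword.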
Experiment with BGE-M3 and smaller datasets while the laptop builds the model(s).
dagshub bot commented Apr 4, 2024

@AaronWChen AaronWChen merged commit 9c0983d into dev Apr 4, 2024
0 of 2 checks passed
@AaronWChen AaronWChen deleted the NGRAM-1/try-llm-code-speedup branch April 4, 2024 00:29