Installing takes too much memory #64
Comments
Hi! Have you solved that problem? I have the same one :( |
Sort of. At the cost of including some duplicates in the SimString database, I was able to reduce the RAM footprint significantly. It now runs for the whole UMLS on my machine with 16 GB of RAM. Take a look at my fork of the repository for the fixes. |
Hi Ferdinand,
Thank you so much for following up on this! Would you be willing to make a pull request for this? I would be happy to review it and merge it into the core package.
Best,
Luca
|
Hey Luca,
Sure thing. I've also added that the preferred term is returned and applied black formatting to the repo, so there are a couple of additional changes. I'll create a pull request with my entire fork and we can discuss there which parts are necessary and which are superfluous.
Best,
Ferdinand |
Great, I'll try to review over the weekend!
Best,
Luca
|
I ran into this as well. I have 16 GB of memory. Is the recommended approach implementing the changes from the comment above? |
I got it to work by being more selective about what I extracted from UMLS. |
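For anyone wanting to try the same route, here is a minimal sketch of one way to be selective, assuming the selection is done by language and source vocabulary before installation. It is not part of QuickUMLS; the function name `filter_mrconso` and its parameters are made up for illustration. The column positions follow the documented pipe-delimited MRCONSO.RRF layout (CUI first, LAT second, SAB twelfth, STR fifteenth).

```python
from typing import Iterable, Iterator, Optional, Set

# Zero-based column positions in MRCONSO.RRF (pipe-delimited).
CUI, LAT, SAB, STR = 0, 1, 11, 14


def filter_mrconso(
    lines: Iterable[str],
    languages: Set[str],
    sources: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield only the MRCONSO rows in the wanted languages and sources.

    Hypothetical helper: rows are streamed one at a time, so memory use
    does not depend on the size of the input file.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR:
            continue  # skip malformed or truncated rows
        if fields[LAT] not in languages:
            continue
        if sources is not None and fields[SAB] not in sources:
            continue
        yield line


if __name__ == "__main__":
    # Made-up rows in MRCONSO layout, for demonstration only.
    sample = [
        "C1|ENG|P|L1|PF|S1|Y|A1||||MSH|PT|D1|aspirin|0|N||\n",
        "C2|FRE|P|L2|PF|S2|Y|A2||||MSHFRE|PT|D2|aspirine|0|N||\n",
    ]
    for row in filter_mrconso(sample, languages={"ENG"}):
        print(row.split("|")[STR])
```

Writing the yielded rows to a smaller file and installing from that keeps the number of unique terms, and with it the size of any in-memory bookkeeping, proportional to the subset rather than to the whole UMLS.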
When running
python -m quickumls.install
on an MRCONSO.RRF file with about 7M rows, the memory footprint grows continuously and at some point the process is killed for using too much memory. The two main culprits I could find are the processed set at
QuickUMLS/quickumls/install.py
Line 66 in c0b5db0
and a second set at
QuickUMLS/quickumls/install.py
Line 113 in c0b5db0
I assume they are there to prevent duplicate entries in the SimString and CuiSemType DBs. When using the unqlite database, a check for duplicate entries is implemented on the insert call, so duplicate entries are a non-issue there. However, I am not sure whether the same is true for the SimString database. Is it safe to add duplicate terms/n-grams to the SimString database, or will that break anything? If it is safe, the large sets could be removed, which would eliminate their memory overhead for large UMLS subsets.
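To illustrate the trade-off in question, here is a small sketch contrasting the two deduplication strategies. It is not the QuickUMLS code: the function names are hypothetical, and sqlite3 from the standard library stands in for a store that rejects duplicates on insert (it is neither unqlite nor SimString).

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (cui, term)


def dedup_in_memory(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Drop repeats using a set; memory grows with the number of unique pairs."""
    seen = set()
    for pair in pairs:
        if pair in seen:
            continue
        seen.add(pair)
        yield pair


def dedup_on_insert(pairs: Iterable[Pair], db_path: str = ":memory:") -> int:
    """Let the store reject repeats; no Python-side set is kept.

    Pass a file path as db_path to keep the index on disk instead of in RAM.
    Returns the number of distinct pairs stored.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms "
        "(cui TEXT, term TEXT, PRIMARY KEY (cui, term))"
    )
    with conn:  # commit once at the end
        conn.executemany("INSERT OR IGNORE INTO terms VALUES (?, ?)", pairs)
    count = conn.execute("SELECT COUNT(*) FROM terms").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    # Made-up pairs, for demonstration only.
    data = [("C1", "aspirin"), ("C1", "aspirin"), ("C2", "ibuprofen")]
    print(list(dedup_in_memory(data)))
    print(dedup_on_insert(data))
```

The first approach is what makes memory scale with the size of the UMLS subset. The second only works when the target store either checks for duplicates itself or tolerates them, which is exactly the open question for SimString.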