Multicore support? #22
I am trying to run it on a 1.5 GB text file. The model uses only a single core, so it is taking too long. I couldn't find a flag to specify the number of threads to use. Is there a way to run the model on multiple cores?
Comments
I couldn't either. An easy workaround is to split the text file and launch two processes, one for each half of the file.
Yeah, running several processes on file splits is exactly what I am doing right now. But it would be nice to have some flag/method that allows using all cores, like StanfordNLP does.
If you find a more efficient solution or update the code to support multiple cores, please post it here!
@hthuwal I've been using this project to process a lot of text. I was running it on a single powerful machine and it was very slow. I then went to DigitalOcean, got one of the high-tier droplets, and started running OpenIE-5 in parallel (with https://www.gnu.org/software/parallel/) on small batches of sentences (1000 at a time, mostly for debugging, but this could be increased). The processing time was about 3 minutes per 1000 sentences (roughly 1 minute of that is just loading OpenIE-5). The droplet I was using has 64 GB of RAM and 32 vCPUs ("CPU Optimized" droplet type).
Update: I tested their Standard Droplet with 192 GB of memory and 32 vCPUs and was able to run 8 processes at the same time. That consumed roughly 92% of the memory, and the average CPU use was 1200%. So the bottleneck is definitely memory. Anyway, maybe this can help you speed things up. By the way, DigitalOcean (referral link) is giving $100 of credit for use during October, which buys about 60 hours of the most expensive droplet.
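For concreteness, here is a minimal sketch of that split-and-parallelize workflow with GNU parallel. Note that `run-openie.sh` is a placeholder for however you launch OpenIE-5 on a single file (not the project's documented invocation), and the chunk size and job count are just the numbers discussed above:

```sh
# Split the corpus into 1000-line chunks named chunk-aa, chunk-ab, ...
split -l 1000 corpus.txt chunk-

# Run at most 3 extractor processes at once (each needs ~10 GB of RAM).
# run-openie.sh is a placeholder for however you launch OpenIE-5 on one file.
parallel -j 3 ./run-openie.sh {} {}.oie ::: chunk-*
```

The `-j` value should be chosen from available RAM divided by per-process footprint, not from the core count, given the memory numbers reported above.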
Thanks @salompas. Yes, memory is the bottleneck, because each process requires ~10 GB of memory just to run. I have access to a machine with ~80 GB of RAM and 32 cores; I was able to run 3 processes simultaneously, and any further increase in the number of processes chokes the machine. Thanks for the reminder about the parallel command. I had forgotten about it and had written a script that splits the data and spawns processes in multiple tmux windows.
OpenIE 4.2+ had multicore support in a multithreaded environment (with approximately constant RAM usage). Swarna may be able to confirm: is OpenIE 5.x thread-safe?
One more performance-related suggestion: reading the files is costly, so smaller chunks should help. Choosing a good chunk size is another smart thing to do.
@vaibhavad @swarnaHub @harrysethi @schmmd @bhadramani I tried naively using concurrent Futures in Scala, dividing the sentences among them (in OpenIECli.scala). (I found the OpenNLP Chunker to be non-thread-safe, so I wrapped it in blocking{}.) But this is not giving me any improvement: with 8 concurrent futures (and 80 sentences), the run time is slightly slower than serial. The extractions are getting serialized at some point, even though they run in different threads. PS: I also see nThreads set to 1 in some targets.
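For reference, a minimal sketch of the divide-among-Futures approach described above. The `Extractor` trait here is a hypothetical stand-in for the real extractor call in OpenIECli.scala, not the project's actual API:

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stand-in for the extractor call inside OpenIECli.scala.
trait Extractor { def extract(sentence: String): Seq[String] }

def extractAll(sentences: Seq[String], extractor: Extractor, nWorkers: Int): Seq[String] = {
  // One batch of sentences per worker.
  val batchSize = math.max(1, sentences.size / nWorkers)
  val futures = sentences.grouped(batchSize).toSeq.map { batch =>
    Future {
      batch.flatMap { s =>
        // blocking {} only tells the thread pool that this call may block,
        // so the pool can grow beyond its default size; it does NOT make a
        // non-thread-safe component (like the OpenNLP chunker) safe to share.
        blocking { extractor.extract(s) }
      }
    }
  }
  Await.result(Future.sequence(futures), Duration.Inf).flatten
}
```

If the underlying extractor takes a global lock (or a shared non-thread-safe component forces serialization), the futures will run concurrently but the extractions themselves will still execute one at a time, which would match the behavior reported above.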
The multithreaded implementation is working now, giving a 4x improvement with 6 threads (tried on a 20-core machine; increasing the thread count further showed no improvement). The reason it wasn't showing any improvement earlier was that I was using too little heap memory: 10 GB. Increasing the heap from 10 GB to 12 GB already gave a substantial improvement in runtime (around 10x in extractions).
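Side note for anyone reproducing this: JVM heap size is set with the standard -Xmx flag. A hedged sketch (the jar name and arguments below are placeholders, not the project's documented invocation):

```sh
# 12 GB heap instead of 10 GB, per the numbers above; jar name is a placeholder.
java -Xmx12g -jar openie-assembly.jar input.txt output.txt
```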
@ambujpd
With a higher number of threads (8+), I sporadically see one or two sentences (out of 80) throwing a NullPointerException from the OpenNLP Chunker, even though I've wrapped that call in blocking{}. I'm looking into it currently.
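For what it's worth, blocking{} only signals the pool that a call may block; it does not synchronize access, so a shared non-thread-safe chunker can still fail under contention. A minimal sketch of one common fix, one chunker instance per thread (`Chunker` and `newChunker()` are hypothetical stand-ins for the real OpenNLP classes):

```scala
// Hypothetical stand-ins for the OpenNLP chunker API; the real classes differ.
trait Chunker { def chunk(tokens: Array[String], tags: Array[String]): Array[String] }
def newChunker(): Chunker = ???  // load the chunker model here

// One chunker instance per thread: no shared mutable state, no lock contention.
val localChunker: ThreadLocal[Chunker] =
  ThreadLocal.withInitial(() => newChunker())

def chunkSafely(tokens: Array[String], tags: Array[String]): Array[String] =
  localChunker.get().chunk(tokens, tags)
```

The trade-off is one loaded model per thread, which raises memory use, consistent with memory being the bottleneck throughout this thread.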
@ambujpd Hey! Are you able to share your multithreaded implementation? It would be super useful for me personally, and cut down my development time by quite a bit. Happy to spend time on the code to help if necessary.
@moinnadeem Unfortunately I don't have the code with me (I remember I was able to use a thread-safe NLP chunker, along with Scala concurrency, and had gotten rid of the sporadic NullPointerException issue). But in conclusion, I found it was not worth the effort, as scalability was quite limited. A much better alternative is multiprocessing (at the cost of extra memory), which is what I eventually ended up using.
Hi @ambujpd @moinnadeem @bhadramani @hthuwal @salompas, we have just released a neural OpenIE system, OpenIE6, which performs better and is at least 10x faster than OpenIE-5 (if you run it on a GPU). You can check it out here: https://github.com/dair-iitd/openie6