makeIdentifiers: Iterating through blocks in parallel #12

hayesall · 2018-07-23T00:15:40Z

This is a possible solution for #11

`__main__.py`

Now has a `-n`/`--n_jobs` parameter (the same name used in the joblib and sklearn packages). The default is 1; a higher number uses that many cores, and -1 uses all available cores.

Example command using two cores:

$ cd rnlp
$ python setup.py develop
$ python -m rnlp -n 2 -d example_files/
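
For reviewers who want to see the flag in isolation, here is a minimal sketch of how such an option could be declared with `argparse`. It is not the code from this PR: `build_parser`, `effective_cores`, and the `directory` destination for `-d` are hypothetical names chosen for illustration, and joblib interprets `n_jobs` on its own, so the helper only spells out the two cases described above.

```python
import argparse
import os


def build_parser():
    """Build a parser exposing -n/--n_jobs with a default of 1."""
    parser = argparse.ArgumentParser(prog="rnlp")
    parser.add_argument(
        "-n", "--n_jobs",
        type=int,
        default=1,
        help="Number of cores to use; -1 means all available cores.",
    )
    # Hypothetical stand-in for the -d option in the example command.
    parser.add_argument("-d", dest="directory", default=None)
    return parser


def effective_cores(n_jobs):
    """Translate n_jobs into a concrete core count.

    Assumption: only a positive count or -1 ("all cores") is handled,
    matching the two cases in the PR description.
    """
    if n_jobs == -1:
        return os.cpu_count() or 1
    if n_jobs >= 1:
        return n_jobs
    raise ValueError(f"unsupported n_jobs value: {n_jobs}")
```

Parsing `["-n", "2", "-d", "example_files/"]` with this parser yields `n_jobs == 2`, mirroring the command above.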

`rnlp.parse.makeIdentifiers()`

`rnlp.parse.makeIdentifiers` still takes `blocks` as its only required parameter, but the optional parameter `n_jobs=1` can be overridden to use more cores.
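
The wrapper pattern described here can be sketched roughly as follows. This is an illustration under assumptions, not the rnlp implementation: `make_identifiers` and `_identify_block` are hypothetical names, each block is assumed to be a list of sentence strings, and the worker returns plain tuples instead of writing files. Only `joblib.Parallel` and `joblib.delayed` are real library calls; the sketch falls back to a plain loop when joblib is not installed.

```python
try:
    from joblib import Parallel, delayed
except ImportError:
    # joblib is optional in this sketch; without it the loop runs serially.
    Parallel = delayed = None


def _identify_block(block_id, block):
    """Hypothetical worker: tag each word with (block, sentence, word) IDs."""
    rows = []
    for sentence_id, sentence in enumerate(block, start=1):
        for word_id, word in enumerate(sentence.split(), start=1):
            rows.append((block_id, sentence_id, word_id, word))
    return rows


def make_identifiers(blocks, n_jobs=1):
    """Process each block independently, optionally across several cores."""
    numbered = list(enumerate(blocks, start=1))
    if Parallel is None or n_jobs == 1:
        per_block = [_identify_block(i, b) for i, b in numbered]
    else:
        # The outer loop over blocks is the part that runs in parallel.
        per_block = Parallel(n_jobs=n_jobs)(
            delayed(_identify_block)(i, b) for i, b in numbered
        )
    # Results come back in input order, so flattening is deterministic.
    return [row for rows in per_block for row in rows]
```

Because the workers return their results rather than writing to shared files, the caller decides where and in what order output is written.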

Potential problems with this request:

These commits currently remove the calls to `_writeBlock(block, blockID)` and `_writeSentenceInBlock(sentence, blockID, sentenceID)`. This may not be a problem, given that rethinking how to handle positive and negative examples is already on the to-do list.

codecov-io · 2018-07-23T00:19:12Z

Codecov Report

Merging #12 into development will decrease coverage by 7.16%.
The diff coverage is 92.98%.

@@              Coverage Diff               @@
##           development     #12      +/-   ##
==============================================
- Coverage        98.26%   91.1%   -7.17%     
==============================================
  Files                7       7              
  Lines              231     236       +5     
==============================================
- Hits               227     215      -12     
- Misses               4      21      +17
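
As a quick sanity check on the table, the percentages can be recomputed from the hit and line counts it lists. The helper name `coverage_percent` is just for illustration; the small last-digit differences from the report (98.26% and 7.16%) are presumably down to how the tool rounds, which is an assumption rather than something the report states.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage of tracked lines that were hit."""
    return 100.0 * hits / lines


before = coverage_percent(227, 231)  # development branch
after = coverage_percent(215, 236)   # after merging #12
drop = before - after

print(f"{before:.2f}% -> {after:.2f}% (drop of {drop:.2f} points)")
# prints: 98.27% -> 91.10% (drop of 7.17 points)
```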

| Impacted Files | Coverage Δ |
|---|---|
| rnlp/parse.py | `80.9% <92.98%> (-15.29%)` ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f4b965e...6b03210.

hayesall · 2018-08-24T21:43:12Z

This definitely makes for a significant speedup (see Issue #11). There are still some quirks with the way some of the output files are created though, so we shouldn't merge this yet.

hayesall · 2022-05-04T15:25:08Z

stale

hayesall added 5 commits July 20, 2018 13:11

Adding a flag to __main__ for setting the number of cores to use. -c/--cores

c80f63a

Merge branch 'master' into parallel

bd89943

Adding rough functions that use joblib to spread the makeIdentifiers over a set number of cores.

c32fd7b

Adding -n/--n_jobs flag to set the number of joblib jobs to run parse.makeIdentifiers() in parallel.

82d4123

Replaced makeIdentifiers with a wrapper function which performs the same task as before, but now has an n_jobs parameter to execute the outer for loop in parallel.

860b0bd

hayesall requested a review from skinn009 July 23, 2018 00:15

hayesall added the enhancement label Jul 23, 2018

hayesall added this to the Quality of Life Improvements milestone Jul 23, 2018

hayesall added 2 commits August 24, 2018 08:12

Merge branch 'master' into parallel

c524d8d

Adding joblib to the requirements list for running tests.

6b03210

hayesall closed this May 4, 2022
