Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makeIdentifiers: Iterating through blocks in parallel #12

Closed
wants to merge 7 commits into from
Closed

makeIdentifiers: Iterating through blocks in parallel #12

wants to merge 7 commits into from

Conversation

hayesall
Copy link
Member

This is a possible solution for #11

__main__.py

Now has a -n/--n_jobs parameter (same name used in joblib and sklearn packages). By default this is 1; setting a higher number will use that many cores, setting -1 will use all cores available.

Example command using two cores:

$ cd rnlp
$ python setup.py develop
$ python -m rnlp -n 2 -d example_files/

rnlp.parse.makeIdentifiers()

rnlp.parse.makeIdentifiers still takes blocks as its only required parameter, but an optional parameter n_jobs=1 can be overwritten to use more cores.


Potential problems with this request:

  • Calls to _writeBlock(block, blockID) and _writeSentenceInBlock(sentence, blockID, sentenceID) are currently removed in these commits. This may not be problematic, given that re-thinking how to deal with positive and negative examples is on the list of to-do's.

@codecov-io
Copy link

codecov-io commented Jul 23, 2018

Codecov Report

Merging #12 into development will decrease coverage by 7.16%.
The diff coverage is 92.98%.

Impacted file tree graph

@@              Coverage Diff               @@
##           development     #12      +/-   ##
==============================================
- Coverage        98.26%   91.1%   -7.17%     
==============================================
  Files                7       7              
  Lines              231     236       +5     
==============================================
- Hits               227     215      -12     
- Misses               4      21      +17
Impacted Files Coverage Δ
rnlp/parse.py 80.9% <92.98%> (-15.29%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4b965e...6b03210. Read the comment docs.

@hayesall
Copy link
Member Author

This definitely makes for a significant speedup (see Issue #11). There are still some quirks with the way some of the output files are created though, so we shouldn't merge this yet.

@hayesall
Copy link
Member Author

hayesall commented May 4, 2022

stale

@hayesall hayesall closed this May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants