New workflow needed for incremental updates #6

Open
eseiver opened this issue Sep 25, 2017 · 0 comments

eseiver commented Sep 25, 2017

[From old repo]

Right now, the download_check_and_move function has three methods of looking for updated article XML after entirely new articles have been downloaded to the temporary download directory:

  1. Check for new corrections articles in the temp directory and see whether the accompanying corrected articles have updated XML, downloading any corrected articles with new versions.
  2. Check Solr for version-of-record (VOR) updates to uncorrected proofs (status: Not working).
  3. Check all XML directly for updated uncorrected proofs in uncorrected_proofs_list.txt.

If an article's XML is updated for any reason other than corrections or VOR, it currently cannot be detected by searching Solr. The only way to be sure is to check every article's XML manually, as in the revisiondate_sanity_check function in corpus_analysis.py, which is time-consuming and hits journals.plos.org pretty inefficiently. JIRA no-NOR ticket labels can help with this to some degree (see #20), but that doesn't work outside of PLOS.

One solution would be to use a hash table, as in https://github.com/PLOS/allofplos_upload/issues/6. Is there any other way, @sbassi?
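
To make the hash table idea concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes article XML sits in a flat directory of .xml files and that the table is kept as a JSON file; the function names (file_digest, build_hash_table, find_changed, save_hash_table, load_hash_table) are hypothetical and do not exist in the repo.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_hash_table(directory):
    """Map each XML filename in `directory` to its content digest."""
    return {p.name: file_digest(p) for p in sorted(Path(directory).glob("*.xml"))}


def find_changed(old_table, new_table):
    """Return filenames that are new or whose digest differs from the stored one."""
    return sorted(
        name for name, digest in new_table.items()
        if old_table.get(name) != digest
    )


def save_hash_table(table, path):
    """Write the table to disk as JSON (hypothetical storage format)."""
    Path(path).write_text(json.dumps(table, indent=2, sort_keys=True))


def load_hash_table(path):
    """Read a stored table, or return an empty one if none exists yet."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}
```

Comparing a stored table against a freshly built one would flag any article whose XML bytes changed, whatever the reason. The catch is that a digest can only be computed once the XML is in hand, so this by itself does not reduce requests to journals.plos.org unless the digests are published alongside the corpus.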
