post-enhancement: complete CC-license information #172

Open
fschwenn opened this issue Sep 1, 2017 · 7 comments

Comments

@fschwenn
Contributor

fschwenn commented Sep 1, 2017

Create a workflow task to complete Creative Commons license information, which could be added to POSTENHANCE_RECORD for HEP records.

Expected Behavior

In the HEP schema, a license entry contains both the license name and the URL. For CC licenses this is redundant, and often only one of the two is present in the original metadata. Instead of completing it in each individual crawler, this could be done in one central place.
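
A minimal sketch of what such a central completion step could look like for the URL to name direction. The function names, the `CC BY 4.0` naming convention, and the URL pattern are assumptions for illustration, not the inspire-schemas implementation; the keys `license` and `url` follow the description above.

```python
import re
from urllib.parse import urlparse

# Assumed URL shape: creativecommons.org/licenses/<type>/<version>/
# A trailing jurisdiction segment (e.g. /de/) is ignored by this sketch.
_CC_PATH = re.compile(
    r"^/licenses/(?P<kind>[a-z\-]+)/(?P<version>\d+(?:\.\d+)?)(?:/|$)"
)


def cc_name_from_url(url):
    """Return a name like 'CC BY 4.0' for a Creative Commons URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host != "creativecommons.org":
        return None
    match = _CC_PATH.match(parsed.path.lower())
    if match is None:
        return None
    return "CC {} {}".format(match.group("kind").upper(), match.group("version"))


def complete_license(license_entry):
    """Fill in a missing 'license' name from 'url'; never overwrite existing data."""
    completed = dict(license_entry)
    if "license" not in completed and "url" in completed:
        name = cc_name_from_url(completed["url"])
        if name is not None:
            completed["license"] = name
    return completed
```

Entries that already carry a name, or whose URL is not recognized, are returned unchanged, so the step would be safe to run on every record regardless of which crawler produced it.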

@michamos
Contributor

michamos commented Sep 1, 2017

I think the literature builder https://github.com/inspirehep/inspire-schemas/blob/36bb1791b4df5890e5445f850c59ed9c5ee9b7c9/inspire_schemas/builders/literature.py#L493-L519 is a better place for this, but I agree in principle with centralizing this normalization.

@fschwenn
Contributor Author

fschwenn commented Sep 1, 2017

Obviously you know the system better than me. Does every record 'pass' literature.py? Also user suggestions and new records by BibEdit?

@michamos
Contributor

all new records go through the builder. Migrations from legacy and manual edits using the record editor don't. Do you think it's needed there too?

@kaplun
Contributor

kaplun commented Oct 12, 2017

all new records go through the builder

@michamos except for the records we will gather from the DESY interim harvester and CDS.

@kaplun
Contributor

kaplun commented Oct 12, 2017

Maybe at some point we should ditch inspire-dojson and write a driver that transforms bibrec from MARCXML using the Builder.

@jacquerie
Contributor

Maybe at some point we should ditch inspire-dojson and write a driver that transforms bibrec from MARCXML using the Builder.

Probably not, because then you will have to reimplement all the normalization that handles anomalies in Legacy's data (you have to ensure that you don't regress on the ~500 test cases in inspire-dojson).

@michamos
Contributor

The direction URL -> license name has been added to the builder in inspirehep/inspire-schemas#244 and inspirehep/inspire-schemas#245. The other direction has not been implemented yet.
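
For the direction that is still missing (license name to URL), a rough sketch could look like the following. The function name, the accepted spellings (`CC BY 4.0`, `CC-BY-4.0`), and the generated URL form are assumptions for illustration and are not taken from the builder; the pattern also does not check that the combination of license elements is a real CC license.

```python
import re

# Assumed name shape: "CC", the license elements starting with BY,
# then a version number; spaces or hyphens are accepted as separators.
_CC_NAME = re.compile(
    r"^CC[ -](?P<kind>BY(?:-(?:NC|ND|SA))*)[ -]?(?P<version>\d+(?:\.\d+)?)$",
    re.IGNORECASE,
)


def cc_url_from_name(name):
    """Return the creativecommons.org URL for a name like 'CC BY 4.0', else None."""
    match = _CC_NAME.match(name.strip())
    if match is None:
        # Names without a version (e.g. 'CC BY') are rejected on purpose:
        # the version cannot be guessed from the name alone.
        return None
    return "https://creativecommons.org/licenses/{}/{}/".format(
        match.group("kind").lower(), match.group("version")
    )
```

Returning `None` for anything that does not match keeps non-CC licenses and incomplete names untouched instead of producing a wrong URL.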
