dcp1 projects cannot be exported - missing dcpVersion #741

Closed
yusra-haider opened this issue Jun 14, 2021 · 7 comments
Assignees
Labels
operations This issue is an operational task

Comments

@yusra-haider
Contributor

yusra-haider commented Jun 14, 2021

The metadata documents for some of the dcp1 datasets don't have the dcpVersion field set in the database, which causes errors when these datasets are re-exported.

The corresponding error log:
ingest-exporter-59cdf8bb59-b97lv :   2021-06-11 13:10:41,267  - exporter.terra.terra_listener - ERROR in terra_listener.py:88 _experiment_message_handler(): Failed to export experiment message with body: {"exportJobId":"60bdfbe520647123c5489704","documentId":"5d1e015e88fa640008b0010b","documentUuid":"8f4d6182-43c5-4dbc-9e71-2b99f1929c2d","callbackLink":"/processes/5d1e015e88fa640008b0010b","documentType":"process","envelopeId":"5d1e015c88fa640008b0009d","envelopeUuid":"7b4cd093-bfa5-477e-9c95-69bafc1cb6bf","projectId":"5d1e015d88fa640008b0009f","projectUuid":"f86f1ab4-1fbb-4510-ae35-3ffd752d4dfc","index":1,"total":19}
ingest-exporter-59cdf8bb59-b97lv :   2021-06-11 13:10:41,267  - exporter.terra.terra_listener - ERROR in terra_listener.py:89 _experiment_message_handler(): strptime() argument 1 must be str, not None
ingest-exporter-59cdf8bb59-b97lv :  Traceback (most recent call last):
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/metadata.py", line 59, in from_dict
ingest-exporter-59cdf8bb59-b97lv :      return MetadataResource(metadata_type, metadata_json, uuid, dcp_version, provenance, full_resource=data)
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/metadata.py", line 38, in __init__
ingest-exporter-59cdf8bb59-b97lv :      self.dcp_version = utils.to_dcp_version(dcp_version)
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/utils.py", line 11, in to_dcp_version
ingest-exporter-59cdf8bb59-b97lv :      date = parse_date_string(date_str)
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/utils.py", line 18, in parse_date_string
ingest-exporter-59cdf8bb59-b97lv :      return datetime.strptime(date_str, date_format)
ingest-exporter-59cdf8bb59-b97lv :  TypeError: strptime() argument 1 must be str, not None
ingest-exporter-59cdf8bb59-b97lv :  
ingest-exporter-59cdf8bb59-b97lv :  During handling of the above exception, another exception occurred:
ingest-exporter-59cdf8bb59-b97lv :  
ingest-exporter-59cdf8bb59-b97lv :  Traceback (most recent call last):
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/terra/terra_listener.py", line 76, in _experiment_message_handler
ingest-exporter-59cdf8bb59-b97lv :      self.terra_exporter.export(exp.process_uuid, exp.submission_uuid, exp.job_id)
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/terra/terra_exporter.py", line 28, in export
ingest-exporter-59cdf8bb59-b97lv :      process = self.get_process(process_uuid)
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/terra/terra_exporter.py", line 68, in get_process
ingest-exporter-59cdf8bb59-b97lv :      return MetadataResource.from_dict(self.ingest_client.get_entity_by_uuid('processes', process_uuid))
ingest-exporter-59cdf8bb59-b97lv :    File "/app/exporter/metadata.py", line 61, in from_dict
ingest-exporter-59cdf8bb59-b97lv :      raise MetadataParseException(e)
ingest-exporter-59cdf8bb59-b97lv :  exporter.metadata.MetadataParseException: strptime() argument 1 must be str, not None
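
To illustrate the failure: `datetime.strptime` raises the `TypeError` above whenever it receives `None` instead of a string. A minimal sketch of a guard that reports the actual cause follows. It is not the exporter's code; `DATE_FORMAT`, `MissingDcpVersion` and `parse_dcp_version` are hypothetical names, and the real `date_format` is not shown in the traceback.

```python
from datetime import datetime
from typing import Optional

# Assumption: the exporter's actual date_format is not visible in the log.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


class MissingDcpVersion(ValueError):
    """Hypothetical error for documents whose dcpVersion is unset."""


def parse_dcp_version(date_str: Optional[str]) -> datetime:
    # datetime.strptime(None, ...) raises the TypeError seen in the log,
    # so check for None first and name the missing field explicitly.
    if date_str is None:
        raise MissingDcpVersion("dcpVersion is not set on this document")
    return datetime.strptime(date_str, DATE_FORMAT)
```

A guard like this would only improve the error message. The documents would still need dcpVersion set in the database before they can be exported.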

i. Find out which dcp1 datasets don't have this field populated.

ii. Set this field for those datasets in the biomaterial, process, file, and protocol collections in the db.
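
Step i could be sketched roughly as below, assuming the documents have already been fetched into plain Python dicts. In MongoDB itself, the filter `{"dcpVersion": null}` matches documents where the field is either null or absent. The function name and the simplified string `project` key are assumptions for illustration; in the real database `project` is a DBRef.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

# Equivalent MongoDB filter: matches documents where dcpVersion is null or absent.
MISSING_DCP_VERSION_FILTER = {"dcpVersion": None}


def count_missing_dcp_version(documents: Iterable[Mapping[str, Any]]) -> Counter:
    """Count documents lacking dcpVersion, keyed by their (simplified) project id."""
    missing: Counter = Counter()
    for doc in documents:
        # .get() returns None both when the key is absent and when it is null.
        if doc.get("dcpVersion") is None:
            missing[doc.get("project")] += 1
    return missing
```

Running this once per collection (biomaterial, process, file, protocol) would give the projects that need the update in step ii.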

For reference, these mongo queries can be used to update the dcpVersion field:

db.protocol.updateMany({"submissionEnvelope" : DBRef("submissionEnvelope", ObjectId("<submission envelope id>")), "project": DBRef("project", ObjectId("<project id>"))}, {$set: {"dcpVersion": new Date("<date>")}})
db.biomaterial.updateMany({"submissionEnvelope" : DBRef("submissionEnvelope", ObjectId("<submission envelope id>")), "project": DBRef("project", ObjectId("<project id>"))}, {$set: {"dcpVersion": new Date("<date>")}})
db.file.updateMany({"submissionEnvelope" : DBRef("submissionEnvelope", ObjectId("<submission envelope id>")), "project": DBRef("project", ObjectId("<project id>"))}, {$set: {"dcpVersion": new Date("<date>")}})
db.process.updateMany({"submissionEnvelope" : DBRef("submissionEnvelope", ObjectId("<submission envelope id>")), "project": DBRef("project", ObjectId("<project id>"))}, {$set: {"dcpVersion": new Date("<date>")}})
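
Since the four commands differ only in the collection name, they could be generated per project rather than edited by hand. A minimal sketch, where `build_update_commands` is a hypothetical helper and the ids and date are supplied by the caller:

```python
from typing import List

# The four collections named in step ii, in the order used above.
COLLECTIONS = ("protocol", "biomaterial", "file", "process")


def build_update_commands(envelope_id: str, project_id: str, date: str) -> List[str]:
    """Return one mongo shell updateMany command per collection."""
    selector = (
        '{"submissionEnvelope" : DBRef("submissionEnvelope", ObjectId("%s")), '
        '"project": DBRef("project", ObjectId("%s"))}' % (envelope_id, project_id)
    )
    update = '{$set: {"dcpVersion": new Date("%s")}}' % date
    return [
        "db.%s.updateMany(%s, %s)" % (name, selector, update)
        for name in COLLECTIONS
    ]
```

For example, `build_update_commands("<submission envelope id>", "<project id>", "<date>")` reproduces the four template commands above; the output is plain text to paste into the mongo shell, so nothing is executed against the database by this helper.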

for reference, this is the list of dcp1 datasets: https://data.humancellatlas.org/explore/projects?catalog=dcp1&filter=%5B%7B%22facetName%22:%22biologicalSex%22,%22terms%22:%5B%22female%22%5D%7D%5D

@aaclan-ebi

aaclan-ebi commented Jun 23, 2021

The fix for #334 could be needed to fix this issue.

@amnonkhen amnonkhen assigned yusra-haider and unassigned aaclan-ebi Jun 28, 2021
@Wkt8
Collaborator

Wkt8 commented Jul 5, 2021

@jacobwindsor could you pick this up next as an operations task? Check with @yusra-haider how much she has done on it, and whether it's straightforward for you to continue.

We want to be able to make updates to dcp1 datasets when the dev work for bulk updates is completed.

@clairerye
Contributor

clairerye commented Jul 5, 2021

This needs to be done in conjunction with https://app.zenhub.com/workspaces/dcp-ingest-product-development-5f71ca62a3cb47326bdc1b5c/issues/ebi-ait/dcp-ingest-central/376 and #334. I think this is product dev so will move it to the other board.

@clairerye clairerye transferred this issue from ebi-ait/hca-ebi-wrangler-central Jul 5, 2021
@amnonkhen
Contributor

@yusra-haider is this still relevant?

@amnonkhen amnonkhen added the operations This issue is an operational task label Apr 8, 2022
@amnonkhen amnonkhen transferred this issue from ebi-ait/dcp-ingest-central Apr 8, 2022
@amnonkhen amnonkhen assigned amnonkhen and unassigned yusra-haider May 19, 2022
@amnonkhen
Contributor

Although the dcpVersion field in the database documents of the dcp1 projects has been updated, this is not enough to re-export all the projects. Some of them would not pass schema validation because the database is missing fields required by the schema. See #121 for an example, where file size and MIME type are missing.

@amnonkhen
Contributor

dcpVersion field was added by Alegria.
See ticket 481

@amnonkhen amnonkhen changed the title from "Investigate which dcp 1 projects are re-export-able" to "dcp1 projects cannot be exported - missing dcpVersion" Jul 4, 2022
@ofanobilbao
Contributor

The scope of this ticket is done, so I am closing it.
