Hi Lampros,

We're deploying the PublicationsRetriever in an ETL pipeline that is not as stable as we would like, so jobs are sometimes retried. Would it be possible to implement a flag or argument that skips a file download when the file already exists in the destination? This would make it easier to rerun the tool in a containerized setting with shared storage, skipping files that were already retrieved, and would also prevent some downstream issues caused by (1) and (2) suffixes being added to filenames.
Again, many thanks for this package, it's saving us heaps of time!
Best,
Tijmen
What you request is definitely possible, although my time for tasks outside the scope of the OpenAIRE Graph is very limited.
I will try to make time for your request; however, please note that it may take a while.
Please feel free to submit a PR if you are familiar with the code base. Otherwise, as a temporary workaround, please consider removing the records associated with previously retrieved files from the input you give in subsequent runs.
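
The workaround above could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of PublicationsRetriever: it assumes the input has one JSON object per line with an `id` field, and that each retrieved file is stored in the destination directory as `<id>.pdf`. The function name `pending_records`, the `ext` parameter, and the naming scheme are all hypothetical and need to be adapted to the actual input format and file-naming settings in use.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def pending_records(lines: Iterable[str], dest_dir: Path, ext: str = ".pdf") -> Iterator[str]:
    """Yield only the input lines whose target file is not yet in dest_dir.

    Assumption: each line is a JSON object with an "id" field, and the
    downloaded file for that record is named "<id><ext>" (hypothetical scheme).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        target = dest_dir / f"{record['id']}{ext}"
        if not target.exists():
            yield line  # not retrieved yet, keep it for the next run


def write_remaining_input(src_path: Path, dst_path: Path, dest_dir: Path) -> int:
    """Write the records that still need retrieval to dst_path; return their count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in pending_records(src, dest_dir):
            dst.write(line + "\n")
            count += 1
    return count
```

Running `write_remaining_input` before each retry and passing the resulting file as input means a rerun only sees records without a file in the shared storage, which should also avoid the duplicate-name suffixes.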