Option to skip existing files? #16

Open
tsaltena opened this issue Oct 25, 2024 · 1 comment

Comments

@tsaltena

Hi Lampros,

We're deploying the PublicationsRetriever in an ETL pipeline that is not as stable as we would like, so jobs are sometimes retried. Would it be possible to add a flag or argument that skips a file download when the file already exists in the destination? This would make it easier to rerun jobs in a containerized setting with shared storage without re-downloading already retrieved files, and it would also prevent some downstream issues caused by (1) and (2) suffixes being added to filenames.

Again, many thanks for this package, it's saving us heaps of time!

Best,
Tijmen

@LSmyrnaios
Owner

Hi Tijmen,

I am glad you find this software useful!

It is definitely possible to implement your request, although my time for tasks outside the scope of the OpenAIRE Graph is very limited.

I will try to make time for it, but please note that it may take a while.

Please feel free to submit a PR if you are familiar with the code base. Otherwise, as a temporary workaround, please consider removing the records associated with previously retrieved files from the input you provide in subsequent runs.
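
The workaround of pruning the input before a rerun could be scripted roughly as follows. This is only a minimal sketch and not part of PublicationsRetriever: it assumes that each input line is a JSON object with an `id` field and that a retrieved file is stored in the destination directory with that id as its file name (for example `<id>.pdf`). Neither assumption is confirmed in this thread, and the helper name `filter_unretrieved` is hypothetical, so adjust the matching rule to your actual input format and file-naming setup.

```python
import json
from pathlib import Path


def filter_unretrieved(input_lines, dest_dir):
    """Return only the input lines whose file is not yet in dest_dir.

    Assumptions (hypothetical, not confirmed by the project): every input
    line is a JSON object with an "id" field, and a retrieved file is
    named "<id>.<extension>" inside dest_dir.
    """
    dest = Path(dest_dir)
    # Stems (file names without the last extension) of files already present.
    if dest.is_dir():
        existing = {p.stem for p in dest.iterdir() if p.is_file()}
    else:
        existing = set()

    kept = []
    for line in input_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if str(record["id"]) not in existing:
            kept.append(line)  # not retrieved yet, keep for the next run
    return kept
```

The returned lines can be written to a new input file and passed to the retried job, so that only records without a matching file in the shared storage are processed again.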
