
Add parameter to include task name in all output files (including maps and extradata) #774

Open
banerjek opened this issue Jun 30, 2024 · 2 comments

Comments

@banerjek
Member

Processing files in parallel is currently awkward: it must be done either by creating map and extradata files from object files or by creating parallel iterations.

Suggest a true/false taskNameBasedFiles parameter that defaults to the current behavior. Setting it to true would include the task name in map and extradata file names, as is already done with object files. The expectation would be that IC would manage these files manually/separately. This is most useful for bibs and holdings (MARC and CSV based) tasks.

Aside from simplifying parallel processes, the feature would simplify combining task output from different sources requiring different maps.
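
To make the request concrete, here is a minimal sketch of the naming behavior being proposed. It is not the tool's actual code: the function name, the `_id_map.json` suffix, and the snake_case `task_name_based_files` argument are all hypothetical stand-ins for the suggested taskNameBasedFiles parameter.

```python
def map_file_name(task_type: str, task_name: str,
                  task_name_based_files: bool = False) -> str:
    """Build a hypothetical map file name for a migration task.

    All names and the file suffix are assumptions made for illustration.
    """
    # Default (current behavior as described in this issue): the name depends
    # only on the task type, so two tasks of the same type share one map file.
    stem = task_type
    if task_name_based_files:
        # Proposed behavior: add the task name so each task gets its own file.
        stem = f"{task_type}_{task_name}"
    return f"{stem}_id_map.json"


# Two tasks of the same type, e.g. bibs from two source systems.
shared_a = map_file_name("BibsTransformer", "bibs_system_a")
shared_b = map_file_name("BibsTransformer", "bibs_system_b")
separate_a = map_file_name("BibsTransformer", "bibs_system_a", True)
separate_b = map_file_name("BibsTransformer", "bibs_system_b", True)
```

With the flag off, both tasks resolve to the same file name, which is the collision that makes parallel runs awkward. With it on, the names differ, and merging the resulting maps would be left to whoever runs the tasks.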

@bltravis
Collaborator

bltravis commented Jul 9, 2024

@banerjek Could you describe the workflow you envision being supported by this enhancement a bit more?

@banerjek
Member Author

The goal is to simplify parallel processing.

Currently, object and extradata files (I mistitled this) all contain the task name, but maps do not; map file names are based on migrationTaskType instead.

This makes it awkward to run overlapping processes for the same task type (e.g. different source systems, or vertically sharding Instance/Holdings/Items processes for especially large systems). You basically have to run parallel iterationIdentifiers or directories, which is doable but clunky.
