Adding files to an included bundle that are already in the target. #129

Open
karlnyr opened this issue Jun 26, 2023 · 7 comments · May be fixed by #215

Comments

@karlnyr

karlnyr commented Jun 26, 2023

I attempted to add a file to a bundle, but the file already existed in the bundle path. A user should be able to add a file even if it already exists within the bundle path.

For example, take file_1 in bundle_1, which has the root /home/housekeeper-bundles and a version from June 2nd, 2023:

ls -l /home/housekeeper-bundles/bundle_1/2023-06-02/
file_1

When trying to add the file to the already included bundle, should it not just add the file link to the database?

@henrikstranneheim
Contributor

What behavior do we want?

  • Replace the file on disk and include in db?
  • Keep original file and just include in db?
  • Should we add the use of a force flag?

@karlnyr
Author

karlnyr commented Jun 28, 2023

Keep the original and include it in the database. I don't mind it being a force flag really - I believe that this situation only happens for manual stuff - so a force might be useful :)

@ChrOertlin
Contributor

ChrOertlin commented Jul 24, 2023

Intuitively I would think that there should not be any files present in the housekeeper directories if they have not been added through the API.

Can we clarify the manual stuff this happens with? @karlnyr

@ChrOertlin
Contributor

Moving the description over from a duplicate issue:
Description
housekeeper add file fails, stating that the file already exists. However, when the specific bundle is retrieved with housekeeper get bundle it is shown to be empty. When looking at the bundle directory, the file is present in a version - so it should be listed for the bundle. It cannot be retrieved with housekeeper get file either.

The command below was run in the /home/proj/production/housekeeper-bundles/ADM1091A3/2018-06-05 directory:

for f in *; do housekeeper add file -t fastq -t H9GA6ADXX -b ADM1091A3 ./${f}; done
2023-06-13 09:47:51 hasta.scilifelab.se housekeeper.cli.core[37109] INFO Use root path /home/proj/production/housekeeper-bundles
2023-06-13 09:47:51 hasta.scilifelab.se housekeeper.cli.add[37109] INFO Running add file
2023-06-13 09:47:51 hasta.scilifelab.se housekeeper.store.api.handlers.read[37109] INFO Fetching bundle with name: ADM1091A3
Traceback (most recent call last):
  File "/home/proj/production/bin/miniconda3/envs/P_main/bin/housekeeper", line 8, in <module>
    sys.exit(base())
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/housekeeper/cli/add.py", line 124, in file_cmd
    link_to_relative_path(version=version, file_path=file_path, root_path=context.obj[ROOT])
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/housekeeper/include.py", line 63, in link_to_relative_path
    link_file(file_path=file_path, new_path=housekeeper_path, hardlink=True)
  File "/home/proj/production/bin/miniconda3/envs/P_main/lib/python3.7/site-packages/housekeeper/include.py", line 19, in link_file
    os.link(file_path.resolve(), new_path)
FileExistsError: [Errno 17] File exists: '/home/proj/production/housekeeper-bundles/ADM1091A3/2018-06-05/ADM1091A3_L001_R1_001.fastq.gz' -> '/home/proj/production/housekeeper-bundles/ADM1091A3/2018-06-05/ADM1091A3_L001_R1_001.fastq.gz'
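
The last line of the traceback shows os.link being called with the same path as both source and destination, since the command was run from inside the version directory. A minimal sketch that should reproduce the same error outside housekeeper, assuming a Linux or macOS filesystem; try_hardlink_in_place and the file name are hypothetical and not part of housekeeper:

```python
import os
import tempfile
from pathlib import Path


def try_hardlink_in_place(directory: Path, name: str) -> str:
    """Create a file, then try to hard link it onto its own path.

    Hypothetical helper for illustration only. Returns the name of the
    exception raised, or "no error" if the link call succeeded.
    """
    target = directory / name
    target.write_text("reads")
    try:
        # Source and destination point at the same file, as in the traceback.
        os.link(target.resolve(), target)
    except FileExistsError as error:
        return type(error).__name__
    return "no error"


with tempfile.TemporaryDirectory() as tmp:
    print(try_hardlink_in_place(Path(tmp), "sample.fastq.gz"))
```

The file is never registered in the database because the exception is raised before the add command finishes, which would explain why housekeeper get bundle shows an empty bundle.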

@beatrizsavinhas

beatrizsavinhas commented Jul 24, 2023

Though I agree that ideally we should avoid manually modifying the database, I have also run into this issue and wondered if a --force or --skip-hard-linking flag would be useful when doing manual work.
The problem arose in two situations: when manually processing old flow cells that are stored on disk but have files missing from the housekeeper bundle, and when filtering VCF files from balsamic cases that fail for having too many variants. In both, it is often more straightforward to use the input files already present in the housekeeper bundle. I found a workaround by moving or generating the files in a different directory and then adding them to housekeeper.

@Vince-janv
Contributor

Suggested solution: Before hard linking the file, check whether a file is already present at the destination. If so, use Path.samefile() to compare the two. If it returns True, skip the hard link and only add the file to the database. This might be a bit cumbersome, but it avoids the problem of overwriting anything already in the bundle directory.
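
A minimal sketch of that check, assuming the file_path and new_path arguments seen in the traceback; link_or_reuse is a hypothetical name and this is not the current housekeeper implementation:

```python
import os
from pathlib import Path


def link_or_reuse(file_path: Path, new_path: Path) -> bool:
    """Hard link file_path to new_path unless new_path already is that file.

    Hypothetical helper for illustration only.
    Returns True if a new hard link was created, and False if the
    destination already referred to the same file, in which case the
    caller would only need to add the database entry.
    Raises FileExistsError if the destination holds a different file.
    """
    source = file_path.resolve()
    if new_path.exists():
        if new_path.samefile(source):
            # Same file on disk: nothing to link, nothing gets overwritten.
            return False
        raise FileExistsError(f"{new_path} exists and is not the same file as {source}")
    new_path.parent.mkdir(parents=True, exist_ok=True)
    os.link(source, new_path)
    return True
```

Raising when the destination holds a different file keeps the original on disk untouched; whether a force flag should override that is the open question from earlier in the thread.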

@beatrizsavinhas

Note: With the current implementation, one can already achieve this by using the flag --keep-input-path.

5 participants