How to set up the preprocessing part of the chromosome #86

Open
qiqi277 opened this issue Nov 7, 2024 · 4 comments

qiqi277 commented Nov 7, 2024

Hello, thank you very much for developing the Monopogen software.
I ran into a problem while running it. I am processing a pig sample, and the preprocessing step fails because chromosome 19 cannot be found. How can I make it run on only the first 18 chromosomes? Here is the error message:

```
multiprocessing.pool.RemoteTraceback: 
Traceback (most recent call last):
  File "/public/home/zhangqiqi02/miniconda3/envs/momopogen2_env/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/public/home/zhangqiqi02/miniconda3/envs/momopogen2_env/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/public/home/zhangqiqi02/software/Monopogen/src/germline.py", line 200, in BamFilter
    for s in infile.fetch(search_chr):
  File "pysam/libcalignmentfile.pyx", line 1092, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 683, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `19`

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public/home/zhangqiqi02/software/Monopogen/src/Monopogen.py", line 340, in <module>
    main()
  File "/public/home/zhangqiqi02/software/Monopogen/src/Monopogen.py", line 333, in main
    args.func(args)
  File "/public/home/zhangqiqi02/software/Monopogen/src/Monopogen.py", line 221, in preProcess
    result = pool.map(BamFilter, para_lst)
  File "/public/home/zhangqiqi02/miniconda3/envs/momopogen2_env/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/public/home/zhangqiqi02/miniconda3/envs/momopogen2_env/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: invalid contig `19`
```
Looking forward to your reply!

@ZiyiWang7

Hi @qiqi277,

Sorry for the late reply, and thank you for your interest in our package!

Currently, Monopogen can only process human samples, because it uses a human reference panel for imputation. However, if you'd like to adapt the framework for pig samples with 18 autosomes, you can modify Monopogen.py by changing the `for` loops on lines 213 and 225 to:
`for chr in range(1, 19):`
This makes the tool process chromosomes 1 to 18.
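
As a rough illustration of the same idea, the chromosome list could be derived from a count and checked against the BAM header before any worker starts, so that a missing contig is reported up front rather than inside the pool. This is only a sketch and not Monopogen code: `chromosomes_to_process` and `pig_contigs` are hypothetical names, and it assumes the contigs are named with bare numbers ("1", "2", ...), as the error message above suggests. With pysam, the header names would come from `AlignmentFile.references`.

```python
def chromosomes_to_process(header_contigs, n_autosomes):
    """Return the autosome names "1".."n_autosomes", checking each against the BAM header."""
    available = set(header_contigs)
    wanted = [str(i) for i in range(1, n_autosomes + 1)]
    missing = [name for name in wanted if name not in available]
    if missing:
        # Fail early with a clear message instead of deep inside a worker process.
        raise ValueError(f"contigs not found in BAM header: {missing}")
    return wanted


# Hypothetical pig BAM header: 18 autosomes plus X, Y and MT.
pig_contigs = [str(i) for i in range(1, 19)] + ["X", "Y", "MT"]
pig_chromosomes = chromosomes_to_process(pig_contigs, 18)
print(pig_chromosomes[0], pig_chromosomes[-1], len(pig_chromosomes))
```

Asking for 22 autosomes against this header would raise a `ValueError` that names the missing contigs 19 to 22, which is the same situation as in the traceback but caught before the multiprocessing pool is created.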

For the subsequent imputation steps, you will also need to update the code to use a pig reference panel. The relevant code is in Monopogen.py at line 96, where the reference file name is currently built as:
`imputation_vcf = args.imputation_panel + "CCDG_14151_B01_GRM_WGS_2020-08-05_" + record[0] + ".filtered.shapeit2-duohmm-phased.vcf.gz"`
You will need to modify this line to match the naming convention of your own reference files.
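
One possible way to make that line easier to adapt is to build the name from a template instead of hard-coded pieces. The sketch below is not part of Monopogen: `panel_vcf_path`, `HUMAN_TEMPLATE`, `PIG_TEMPLATE`, and the pig file name are hypothetical, with `chrom` standing in for `record[0]`. It also uses `os.path.join`, whereas the original line concatenates strings directly.

```python
import os

# Human panel naming, taken from the line quoted above.
HUMAN_TEMPLATE = (
    "CCDG_14151_B01_GRM_WGS_2020-08-05_{chrom}"
    ".filtered.shapeit2-duohmm-phased.vcf.gz"
)

# Hypothetical naming for a pig panel; replace with your own convention.
PIG_TEMPLATE = "pig_panel_{chrom}.phased.vcf.gz"


def panel_vcf_path(panel_dir, chrom, template=HUMAN_TEMPLATE):
    """Build the per-chromosome imputation VCF path from a file name template."""
    return os.path.join(panel_dir, template.format(chrom=chrom))


print(panel_vcf_path("/ref/human", "chr1"))
print(panel_vcf_path("/ref/pig", "1", template=PIG_TEMPLATE))
```

Only the template changes between species, so the rest of the imputation code would not need to know how the panel files are named.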

qiqi277 commented Dec 20, 2024

Hello, thank you very much for your reply.
Again, is Monopogen only suitable for human data? I have tried your method on pig samples, but the results are not nearly as good as the results obtained from human data.

@ZiyiWang7

Yes, currently Monopogen can only process human data. That is because we use the 1KG3 dataset, a catalog of human genetic variation, as the imputation reference. As I mentioned in the second part of my comment, you would need to use a pig-specific panel for your samples.

qiqi277 commented Dec 21, 2024

May I ask whether there are any special requirements for this imputation reference? I used a panel of 2337 pig samples that contains both SNPs and SVs.
