Question about rCRSd and human_repeats #203

Open
hiroyukikato911 opened this issue Oct 8, 2021 · 1 comment

Comments

@hiroyukikato911

Hi,

Previous versions of the PEPATAC pipeline listed both rCRSd and human_repeats as prealignments:
prealignments: ['rCRSd', 'human_repeats']

as can be seen in the following example log:
http://pepatac.databio.org/en/latest/files/examples/tutorial/results_pipeline/tutorial2/PEPATAC_log.txt

However, the tutorial for the latest version only mentions rCRSd, as shown here:
http://pepatac.databio.org/en/latest/run-container/

refgenie pull rCRSd/bowtie2_index

--prealignment-index rCRSd=default/94e0d21feb576e6af61cd2a798ad30682ef2428bb7eabbb4 \

What should I do with human repeats?
I would appreciate it if you could let me know:

  1. Example code to pull from refgenie.

  2. How to designate human repeats when running pepatac.py.

  3. Also, I would appreciate it if you could clarify what the mm10 counterparts of rCRSd and human_repeats are.
    (Is mouse_chrM2x sufficient?)

Finally, the pipeline version shown in the example log is 0.9.6.
I think it would be helpful to everyone if you could update it, or also upload an example log for Pipeline version: 0.10.0.
http://pepatac.databio.org/en/latest/files/examples/tutorial/results_pipeline/tutorial2/PEPATAC_log.txt

I'm a fan of this pipeline and I really appreciate your help.

Best regards,
Hiroyuki

@jpsmith5
Contributor

Hey @hiroyukikato911,

Yeah, I had just simplified the example. Internally I always include human_repeats, and it's simple to keep including it on your end too.

  1. To grab the asset with refgenie:
refgenie pull human_repeats/fasta human_repeats/bowtie2_index
  2. An example using the test sample:
name: test_project

pep_version: 2.0.0
sample_table: test_annotation.csv

looper: 
  output_dir: pepatac_test/issue_203
  pipeline_interfaces: ../../project_pipeline_interface.yaml 

sample_modifiers:
  append:
    pipeline_interfaces: ../../sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      test_data_R1: "examples/data/{sample_name}_r1.fastq.gz"
      test_data_R2: "examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if: 
        organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
      then: 
        genome: hg38
        prealignment_names: ["rCRSd", "human_repeats"]

You could save that in the pepatac/examples/test_project/ folder as, for example, test_config_refgenie2.yaml. Then, from the parent pepatac/ directory, run it as: looper run examples/test_project/test_config_refgenie2.yaml. Be aware that in the current released version the prealignments option is now called prealignment_names, because the pipeline no longer explicitly requires refgenie, although refgenie is still our recommended usage.
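For reference, here is a rough sketch that puts the pull step and the looper step in one script. The asset names and the config path are the ones from this thread; the `run` helper and the `DRY_RUN` variable are hypothetical names introduced only for illustration, so that by default the script prints the commands instead of executing them. It pulls only the bowtie2 indexes; add `human_repeats/fasta` as in the command above if you also want the FASTA asset.

```shell
#!/usr/bin/env bash
# Sketch only: pull both prealignment indexes, then run the example project.
# DRY_RUN and run() are illustrative helpers, not part of refgenie or looper.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command; execute it only when DRY_RUN is set to 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Prealignment genomes, in the same order as prealignment_names in the config.
PREALIGNMENTS="rCRSd human_repeats"

# Build the asset list: one bowtie2 index per prealignment genome.
ASSETS=""
for name in $PREALIGNMENTS; do
  ASSETS="$ASSETS $name/bowtie2_index"
done
ASSETS="${ASSETS# }"   # trim the leading space

# $ASSETS is intentionally unquoted so each asset becomes its own argument.
run refgenie pull $ASSETS

# Run from the parent pepatac/ directory, as described above.
run looper run examples/test_project/test_config_refgenie2.yaml
```

Setting `DRY_RUN=0` in the environment makes the script actually invoke refgenie and looper, which assumes both are installed and that refgenie is already configured.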

  3. For mouse data, yes, the mouse_chrM2x genome is the mitochondrial equivalent. We don't currently host a standard mouse repeat or satellite genome on refgenie. If you have something local you wish to use in this context, we can walk you through building a refgenie asset locally to represent a "mouse_repeats" type genome that you could then include.
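As a rough idea of what building such a local asset might look like, here is a minimal sketch using `refgenie build`. The FASTA path is a placeholder for your own file, the genome name `mouse_repeats` is just the label used above, and `run` and `DRY_RUN` are hypothetical helpers so that the script only prints commands by default. It assumes refgenie is already initialized, with the `REFGENIE` environment variable pointing at your genome configuration file; treat it as a starting point rather than the exact procedure.

```shell
#!/usr/bin/env bash
# Sketch only: register a local FASTA as a "mouse_repeats" genome in refgenie.
# MOUSE_REPEATS_FASTA is a placeholder path; DRY_RUN and run() are illustrative.
DRY_RUN="${DRY_RUN:-1}"
MOUSE_REPEATS_FASTA="${MOUSE_REPEATS_FASTA:-/path/to/mouse_repeats.fa.gz}"
GENOME="mouse_repeats"

run() {
  # Print the command; execute it only when DRY_RUN is set to 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. Build the fasta asset from the local file.
run refgenie build "$GENOME/fasta" --files "fasta=$MOUSE_REPEATS_FASTA"

# 2. Build the bowtie2 index from that fasta asset.
run refgenie build "$GENOME/bowtie2_index"

# Names to list for mm10 samples, mirroring the human order
# (mitochondrial genome, then repeats).
MOUSE_PREALIGNMENTS="mouse_chrM2x $GENOME"
echo "prealignment names for mm10: $MOUSE_PREALIGNMENTS"
```

Once both assets exist, the two names could go into an `imply` block for mouse samples in the same way `["rCRSd", "human_repeats"]` is used for hg38 in the example config.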
