Skip to content

Commit

Permalink
Update README, demo, setup.py and adjust program output fields
Browse files Browse the repository at this point in the history
  • Loading branch information
xiao-chen-xc committed Jan 6, 2022
1 parent 69fb52b commit 371a6d7
Show file tree
Hide file tree
Showing 4 changed files with 57 additions and 22 deletions.
42 changes: 25 additions & 17 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,34 +2,42 @@

Gauchian is a targeted variant caller for the GBA gene based on a whole-genome sequencing (WGS) BAM file. Gauchian uses a novel method to solve the problems caused by the high sequence similarity with the pseudogene paralog GBAP1 and is able to detect variants accurately in the Exons 9-11 homology region, such as large deletions or duplications between GBA and GBAP1, and GBAP1-like variants in GBA, including p.A495P, p.L483P, p.D448H, c.1263del, RecNciI, RecTL and c.1263del+RecTL. In addition to these challenging variants, Gauchian also calls known pathogenic or likely pathogenic GBA variants classified in ClinVar. Gauchian has been tested on Illumina WGS data with standard sequencing depth (>=30X). Gauchian does not work on targeted sequencing data. Please refer to our [preprint](https://www.medrxiv.org/content/10.1101/2021.11.12.21266253v1) for more details about the method.

## Running the program
## Installation

```bash
git clone https://github.com/Illumina/Gauchian
cd Gauchian
python3 setup.py install
```

This Python3 program can be run as follows:
## Running the program

```bash
python -m gauchian --manifest MANIFEST_FILE \
--genome [19/37/38] \
--prefix OUTPUT_FILE_PREFIX \
--outDir OUTPUT_DIRECTORY \
--threads NUMBER_THREADS
gauchian --manifest MANIFEST_FILE \
--genome [19/37/38] \
--prefix OUTPUT_FILE_PREFIX \
--outDir OUTPUT_DIRECTORY \
--threads NUMBER_THREADS
```

The manifest is a text file in which each line should list the absolute path to an input WGS BAM/CRAM file. Full WGS BAM/CRAM files are recommended. If you would like to use a subsetted bamlet, please subset using region files in gauchian/data/GBA_region_*.bed. For CRAM input, it’s suggested to provide the path to the reference fasta file with `--reference` in the command.
The manifest is a text file in which each line should list the absolute path to an input WGS BAM/CRAM file. Full WGS BAM/CRAM files are recommended. If you would like to use a subsetted bamlet, please subset using region files in gauchian/data/GBA_region_*.bed.

For CRAM input, it’s suggested to provide the path to the reference fasta file with `--reference` in the command.

## Interpreting the output

The program produces a .tsv file in the directory specified by --outDir.
The fields are explained below:

| Fields in tsv | Explanation |
|:----------------------------------------|:-------------------------------------------------------------------------------|
| Sample | Sample name |
| is_biallelic_GBAP1-like_variant_exon9-11| Whether the sample is called as biallelic for GBAP1-like variants in exon9-11 |
| is_carrier_GBAP1-like_variant_exon9-11 | Whether the sample is called as a carrier for GBAP1-like variants in exon9-11 |
| total_CN | Total copy number of GBA+GBAP1 |
| deletion_breakpoint_in_GBA_gene | Whether the deletion breakpoint is in GBA gene if a deletion exists |
| GBAP1-like_variant_exon9-11 | GBAP1-like variants called in exon9-11, two alleles separated by / |
| other_variants | Other variants called (non-GBAP1-like variants or variants outside of exon9-11)|
| Fields in tsv | Explanation |
|:-----------------------------------------|:-------------------------------------------------------------------------------|
| Sample | Sample name |
| is_biallelic(GBAP1-like_variant_exon9-11)| Whether the sample is called as biallelic for GBAP1-like variants in exon9-11 |
| is_carrier(GBAP1-like_variant_exon9-11) | Whether the sample is called as a carrier for GBAP1-like variants in exon9-11 |
| CN(GBA+GBAP1) | Total copy number of GBA+GBAP1 |
| deletion_breakpoint_in_GBA | Whether the deletion breakpoint is in GBA gene if a deletion exists |
| GBAP1-like_variant_exon9-11 | GBAP1-like variants called in exon9-11, two alleles separated by / |
| other_unphased_variants | Other variants called (non-GBAP1-like variants or variants outside of exon9-11)|

A .json file is also produced that contains more information about each sample.

Expand Down
24 changes: 24 additions & 0 deletions docs/demo.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
============
Program Demo
============

1. Download a WGS bam file::

$ wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/1000G_2504_high_coverage/data/ERR3239883/NA20815.final.cram
$ wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/1000G_2504_high_coverage/data/ERR3239883/NA20815.final.cram.crai

2. Make a manifest file::

$ echo "$PWD/NA20815.final.cram" > manifest.txt

3. Run Gauchian, which takes about one to two minutes with single thread::

$ gauchian -m manifest.txt -g 38 -o out -p testgba

4. Check the output file out/testgba.tsv

============= ========================================= ======================================= ============= ========================== =========================== =======================
Sample is_biallelic(GBAP1-like_variant_exon9-11) is_carrier(GBAP1-like_variant_exon9-11) CN(GBA+GBAP1) deletion_breakpoint_in_GBA GBAP1-like_variant_exon9-11 other_unphased_variants
============= ========================================= ======================================= ============= ========================== =========================== =======================
NA20815.final False True 4 N/A L483P/ None
============= ========================================= ======================================= ============= ========================== =========================== =======================
12 changes: 7 additions & 5 deletions gauchian/gauchian.py
Original file line number Diff line number Diff line change
Expand Up @@ -159,12 +159,12 @@ def write_to_tsv(final_output, out_tsv):
"""Prepare tsv output"""
header = [
"Sample",
"is_biallelic_GBAP1-like_variant_exon9-11",
"is_carrier_GBAP1-like_variant_exon9-11",
"total_CN",
"deletion_breakpoint_in_GBA_gene",
"is_biallelic(GBAP1-like_variant_exon9-11)",
"is_carrier(GBAP1-like_variant_exon9-11)",
"CN(GBA+GBAP1)",
"deletion_breakpoint_in_GBA",
"GBAP1-like_variant_exon9-11",
"other_variants"
"other_unphased_variants"
]
with open(out_tsv, "w") as tsv_output:
tsv_output.write("\t".join(header) + "\n")
Expand All @@ -186,6 +186,8 @@ def write_to_tsv(final_output, out_tsv):
p1like_variants,
other_variants
]
if output_per_sample[4] is None:
output_per_sample[4] = "N/A"
tsv_output.write("\t".join([str(a) for a in output_per_sample]) + "\n")
else:
tsv_output.write("\t".join([str(a) for a in [sample_id]+[None]*6]) + "\n")
Expand Down
1 change: 1 addition & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ def readme():
author_email='[email protected]',
license='GPLv3',
packages=['gauchian', 'gauchian.caller', 'gauchian.depth_calling'],
package_data={'gauchian': ['data/*']},
install_requires=['pysam', 'numpy', 'scipy', 'statsmodels'],
setup_requires=['pytest-runner'],
tests_require=['pytest'],
Expand Down

0 comments on commit 371a6d7

Please sign in to comment.