Merge pull request #8 from nygenome/web_docs

chore: bringing website back into repo
nygenome · Sep 27, 2023 · 4c36d7a · 4c36d7a
2 parents f15caca + 2182302
commit 4c36d7a
Showing 35 changed files with 22,883 additions and 0 deletions.
diff --git a/.github/workflows/github-action-deploy.yml b/.github/workflows/github-action-deploy.yml
@@ -0,0 +1,34 @@
+defaults:
+  run:
+    working-directory: website
+
+name: Update Website
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - website/**
+
+jobs:
+  Update-Website:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - run: git pull
+      - uses: actions/setup-node@v3
+        with:
+          node-version: 18
+          cache: npm
+          cache-dependency-path: website/package-lock.json
+      - name: Install dependencies
+        run: npm ci
+      - name: Build website
+        run: npm run build
+      - name: Deploy to GitHub Pages
+        uses: peaceiris/actions-gh-pages@v3
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: ./website/build
+          user_name: zhubry
+          user_email: [email protected]
diff --git a/website/.gitignore b/website/.gitignore
@@ -0,0 +1 @@
+/node_modules
diff --git a/website/README.md b/website/README.md
@@ -0,0 +1,41 @@
+# Website
+
+This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
+
+### Installation
+
+```
+$ yarn
+```
+
+### Local Development
+
+```
+$ yarn start
+```
+
+This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
+
+### Build
+
+```
+$ yarn build
+```
+
+This command generates static content into the `build` directory and can be served using any static contents hosting service.
+
+### Deployment
+
+Using SSH:
+
+```
+$ USE_SSH=true yarn deploy
+```
+
+Not using SSH:
+
+```
+$ GIT_USER=<Your GitHub username> yarn deploy
+```
+
+If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
diff --git a/website/babel.config.js b/website/babel.config.js
@@ -0,0 +1,3 @@
+module.exports = {
+  presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
+};
diff --git a/website/blog/2021-08-26-welcome/docusaurus-plushie-banner.jpeg b/website/blog/2021-08-26-welcome/docusaurus-plushie-banner.jpeg
diff --git a/website/blog/2021-08-26-welcome/index.md b/website/blog/2021-08-26-welcome/index.md
@@ -0,0 +1,7 @@
+---
+slug: welcome
+title: Welcome
+authors: [gnarzisi, rmusunuri, bzhu]
+---
+
+This is where blog posts will be made
diff --git a/website/blog/authors.yml b/website/blog/authors.yml
@@ -0,0 +1,11 @@
+rmusunuri:
+  name: Rajeeva Lochan Musunuri
+  title: Data Scientist
+
+gnarzisi:
+  name: Giuseppe Narzisi
+  title: Project Lead
+
+bzhu:
+  name: Bryan Zhu
+  title: Bioinformatics Programmer
diff --git a/website/docs/Guides/Exome.md b/website/docs/Guides/Exome.md
@@ -0,0 +1,23 @@
+# Targeted Scan
+
+Lancet can be run on exome/panel data by providing a preselected list of genomic regions in a BED file. The BED file is a tab-delimited text file that must contain at least three columns in the following order: the chromosome of the region to analyze, the start position, and the end position.
+
+sample.bed:
+
+```bash
+1   56091000    56092000
+5   37281200    37291200
+8   11200000    11300000
+```
+
+In the example above, the ```sample.bed``` file is used to call variants on chromosomes 1, 5, and 8 at positions 56091000-56092000, 37281200-37291200, and 11200000-11300000, respectively, with the following command:
+
+```bash
+lancet2 pipeline -t tumor.bam -n normal.bam -r ref.fasta -o out.vcf -b sample.bed
+```
+
+:::note
+
+Chromosome labels in the BED file must match the chromosome labels present in the genome reference and BAM files.
+
+:::
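The three-column layout described above can be checked before a run. The following is a minimal sketch and not part of Lancet: the function name `check_bed` is hypothetical, and it assumes tab-delimited input with integer coordinates where the start is less than the end. It does not check chromosome labels against the reference or BAM files.

```bash
# Hypothetical helper (not part of Lancet): sanity-check a BED file.
# Assumes tab-delimited columns in the order: chromosome, start, end.
# Prints one message per bad line and returns non-zero if any line fails.
check_bed() {
  awk -F '\t' '
    NF < 3 {
      printf "line %d: fewer than 3 columns\n", NR; bad = 1; next
    }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      printf "line %d: non-integer position\n", NR; bad = 1; next
    }
    ($2 + 0) >= ($3 + 0) {
      printf "line %d: start is not less than end\n", NR; bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_bed sample.bed && echo "BED looks well formed"` would only print the message when every line passes.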
diff --git a/website/docs/Guides/GWS.md b/website/docs/Guides/GWS.md
@@ -0,0 +1,21 @@
+# Genome-Wide Scan
+
+For whole-genome sequencing studies it is highly recommended to split the analysis by chromosome and then merge the results. Splitting the work by chromosome will reduce overall runtime and memory requirements to analyze whole-genome data.
+
+```bash
+NUMBER_OF_AUTOSOMES=22
+for chrom in `seq 1 $NUMBER_OF_AUTOSOMES` X Y; do
+ qsub \
+ -N lancet_chr${chrom} \
+ -cwd \
+ -pe smp 8 \
+ -q dev.q \
+ -j y \
+ -b y \
+ "lancet2 pipeline -t T.bam -n N.bam -r ref.fa --region $chrom --num-threads 8 -o ${chrom}_out"
+done
+
+# merge VCF files
+```
+
+The script above submits multiple parallel Lancet jobs, one for each human chromosome, to the Sun Grid Engine queuing system.
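The merge step is left as a placeholder in the script above. One possible approach, sketched here under assumptions, is to list the per-chromosome VCFs in chromosome order and hand that list to `bcftools concat`. The `.vcf.gz` suffix appended to each `${chrom}_out` prefix is an assumption, and `list_chrom_vcfs`, `merge_chrom_vcfs`, `vcf_list.txt`, and `merged.vcf.gz` are hypothetical names.

```bash
# Hypothetical helper: print the expected per-chromosome VCF paths in order.
# Assumes each job wrote <chrom>_out.vcf.gz (the suffix is an assumption).
list_chrom_vcfs() {
  for chrom in $(seq 1 22) X Y; do
    printf '%s_out.vcf.gz\n' "$chrom"
  done
}

# Hypothetical helper: concatenate the per-chromosome VCFs into one file.
# Requires bcftools on PATH; call it only after every job has finished.
merge_chrom_vcfs() {
  list_chrom_vcfs > vcf_list.txt
  bcftools concat --file-list vcf_list.txt --output-type z --output merged.vcf.gz
}
```

Listing the files explicitly, rather than relying on a shell glob, keeps the chromosomes in numeric order (1, 2, ..., 22, X, Y) instead of lexical order.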
diff --git a/website/docs/Guides/Visualization.md b/website/docs/Guides/Visualization.md
@@ -0,0 +1,38 @@
+# Graph Visualization
+
+By passing the ```--graphs-dir``` parameter with a target directory, Lancet writes the de Bruijn graph for each inspected window to that directory as a dot file. NOTE: the directory given as ```--graphs-dir``` will be cleared, so be mindful of which directory you provide.
+
+```bash
+lancet2 pipeline -t tumor.bam -n normal.bam -r ref.fasta --graphs-dir ./dot_graphs_dir
+```
+
+The above command will export the de Bruijn graph at various stages of the assembly (low coverage removal, graph compression and traversal) to the following set of files:
+
+1. chr:start-end_c0_raw_graph.dot
+2. chr:start-end_cX_before_compression.dot
+3. chr:start-end_cX_after_compression.dot
+4. chr:start-end_cX_path_flow.dot
+
+Here X refers to the connected component within the graph (in most cases there is only one).
+
+These files can be rendered using the dot utility available in the [Graphviz](http://www.graphviz.org/) visualization software package.
+
+```bash
+dot -Tpdf -o example_file.pdf example_file.dot
+```
+
+The above command will create an example_file.pdf file that shows the graph. For large graphs, Adobe Acrobat Reader may have trouble rendering the graph, in which case we recommend opening the PDF file with the "Preview" image viewer available in macOS.
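Because a run writes several dot files per window, it can be convenient to render them in one pass. This is a minimal sketch that reuses the `dot -Tpdf` invocation shown above; the function names `pdf_path_for` and `render_all_graphs` are hypothetical, and it assumes the Graphviz `dot` executable is on `PATH`.

```bash
# Hypothetical helper: derive the PDF path for a given dot file.
pdf_path_for() {
  printf '%s\n' "${1%.dot}.pdf"
}

# Hypothetical helper: render every dot file in a directory to PDF.
# Requires the Graphviz `dot` executable on PATH.
render_all_graphs() {
  for dot_file in "$1"/*.dot; do
    [ -e "$dot_file" ] || continue   # skip when the glob matches nothing
    dot -Tpdf -o "$(pdf_path_for "$dot_file")" "$dot_file"
  done
}
```

For example, `render_all_graphs ./dot_graphs_dir` would place each PDF next to its dot file.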
+
+Below is an example of what the generated graphs may look like. The blue nodes are k-mers shared by both tumor and normal; the white nodes are k-mers with low support (likely sequencing errors); the green nodes are k-mers only present in the normal; the red nodes are k-mers only present in the tumor.
+
+The first image below shows the raw graph before removing the low coverage nodes.
+![raw_graph](../../static/img/chr14_72547800-72548098_c0_raw_graph.png)
+
+The second image below shows the graph before compression and tip removal.
+![before_compression](../../static/img/chr14_72547800-72548098_c1_before_compression.png)
+
+The third image below shows the graph after compression and tip removal.
+![after_compression](../../static/img/chr14_72547800-72548098_c1_after_compression.png)
+
+The fourth image below highlights all the assembly path flows taken through the graph.
+![path_flow](../../static/img/chr14_72547800-72548098_c1_path_flow.png)
diff --git a/website/docs/acknowledgment.md b/website/docs/acknowledgment.md
@@ -0,0 +1,8 @@
+---
+id: acknowledgment
+title: Acknowledgments
+---
+
+## Funding
+
+Informatics Technology for Cancer Research ([ITCR](https://itcr.cancer.gov)) under the NCI [U01 award 1U01CA253405-01A1](https://reporter.nih.gov/project-details/10304730).
diff --git a/website/docs/cli.md b/website/docs/cli.md
@@ -0,0 +1,202 @@
+---
+id: cli
+title: Command Line Reference
+---
+## pipeline
+This is the subcommand that starts the tool. It always follows directly after the call to the Lancet executable.
+
+```bash
+lancet2 pipeline --help
+lancet2 pipeline -t /path/to/tumor.bam -n /path/to/normal.bam -r /path/to/ref.fasta -o /path/to/out_prefix
+```
+
+## General Options:
+### `--help`
+Bring up the standard help output. This will give examples of options the user can change to customize a run.
+
+### `--version`
+Print the version information for the build
+
+## Required Arguments:
+A standard run of Lancet provides a tumor BAM file, a normal BAM file, the reference FASTA, and an output path for the resulting VCF file.
+```shell
+lancet2 pipeline -t /path/to/tumor.bam -n /path/to/normal.bam -r /path/to/ref.fasta -o /path/to/out.vcf
+```
+
+### `-t`, `--tumor`
+Provide the path to the tumor BAM file. The index for this file should be in the same directory.
+
+### `-n`, `--normal`
+Provide the path to the normal BAM file. The index for this file should be in the same directory.
+
+### `-r`, `--reference`
+Provide the path to the reference FASTA file. The index for this file should be in the same directory.
+
+### `-o`, `--out-prefix`
+Prefix to use for output VCF (will be bgzipped and indexed)
+
+## Optional Arguments:
+These arguments allow for more fine-tuned control of the tool. If not provided, default values will be assigned
+### `--graphs-dir`
+This option defines the output directory for serialized graphs from a run. If this option is not used, no graphs are written.
+
+### Regions
+These options control which parts of the genome the tool analyzes.
+
+### `--region`
+Allows the user to define which region the tool should run on. The region should be given in the format `chr:start_pos-end_pos`.
+For example, the following runs the tool on chromosome 2 between positions 33091000 and 33092000:
+```shell
+... --region 2:33091000-33092000 ...
+```
+If no regions are specified, the tool runs on the whole genome provided.
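The `chr:start_pos-end_pos` format can be split into its parts with plain shell parameter expansion, which is handy when generating region strings or checking them in a wrapper script. This is a minimal sketch, not part of Lancet: `parse_region` is a hypothetical name, and it assumes the chromosome label itself contains no colon.

```bash
# Hypothetical helper: split "chr:start-end" into "chr start end".
# Assumes the chromosome label contains no ':' character.
parse_region() {
  region=$1
  chrom=${region%%:*}    # text before the first colon
  range=${region#*:}     # text after the first colon
  start=${range%%-*}     # text before the first dash
  end=${range#*-}        # text after the first dash
  printf '%s %s %s\n' "$chrom" "$start" "$end"
}

parse_region 2:33091000-33092000   # prints "2 33091000 33092000"
```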
+
+### `-b`, `--bed-file`
+Path to a BED file that defines which regions the tool will run on. The BED file should be a tab-delimited file in which the first column
+is the chromosome, the second column is the start position, and the third column is the end position. For example, a BED file like this specifies positions 4 to 239410
+on chromosome 1 and positions 892 to 1029348 on chromosome 2:
+```shell
+1	4	239410
+2	892	1029348
+```
+If no regions are specified, the tool runs on the whole genome provided.
+
+### `-P`, `--padding`
+Padding to be applied to all input regions. By default, there will be a padding of 250 bp.
+
+### `-w`, `--window-size`
+This option defines the window size used for each microassembly task. By default, the window is 600 bp.
+
+### `-T`, `--num-threads`
+Allows the user to specify the number of cores to be used for the tool.
+
+### `--pct-overlap`
+Allows the user to define how much overlap there should be between consecutive windows. If not specified, the tool defaults to 50% overlap.
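To make the interaction between `--window-size` and `--pct-overlap` concrete, the arithmetic below assumes that the overlap is a percentage of the window size, so that consecutive windows start `window_size * (100 - pct_overlap) / 100` bases apart. This is an illustrative assumption rather than a description of Lancet's internals, and `window_step` is a hypothetical name.

```bash
# Hypothetical helper: distance in bp between consecutive window starts.
# Assumes overlap is expressed as a percentage of the window size.
#   $1 = window size in bp, $2 = percent overlap (integer)
window_step() {
  echo $(( $1 * (100 - $2) / 100 ))
}

window_step 600 50   # documented defaults: prints 300
```

Under this assumption, the default settings would place a new 600 bp window every 300 bp.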
+
+### Parameters
+These options allow you to define certain parameters for how the tool performs its variant calling
+
+### `-T`, `--num-threads`
+This allows you to define how many threads are used by the tool. If not specified, the tool will default to running with 1 thread.
+
+### `-k`, `--min-kmer-length`
+This allows you to define the minimum k-mer length for graph nodes. If no length is specified, the minimum k-mer length defaults to 11 bp.
+
+### `-K`, `--max-kmer-length`
+This allows you to define the maximum k-mer length for graph nodes. If no length is specified, the maximum k-mer length defaults to 101 bp.
+
+### `--min-trim-qual`
+This allows you to define the minimum base quality for trimming 5' and 3' read bases. If this option is not used, the minimum base quality defaults to 10.
+
+### `-q`, `--min-base-qual`
+This allows you to define the minimum base quality for SNV calling. If this option is not used, the minimum base quality defaults to 17.
+
+### `-Q`, `--min-mapping-qual`
+This allows you to define the minimum mapping quality required to use a read. By default, this value will be set to 15.
+
+### `--max-rpt-mismatch`
+This allows you to define the maximum number of mismatches used to detect approximate repeats. By default, the maximum number of mismatches is set to 2.
+
+### `--max-tip-length`
+This defines the maximum tip length allowed in the generated graphs. By default, the maximum tip length is set to 11.
+
+### `--graph-traversal-limit`
+Maximum allowed tip length in the graph. By default, this value is set to 11.
+TODO: check `cli.cpp` to see why this is `params->minGraphTipLength`.
+
+### `--graph-traversal-limit`
+Max BFS/DFS graph traversal limit. By default, this value is set to 100000.
+
+### `--max-indel-length`
+Maximum limit on the indel length to detect. Default is set to 500.
+
+### `--min-anchor-cov`
+Minimum coverage for anchor nodes (source & sink). Default is 5
+
+### `--min-node-cov`
+Minimum coverage for all nodes in the graph. Default is 1
+
+### `--min-cov-ratio`
+Minimum node-by-window coverage for all nodes. Default is 0.01.
+TODO: confirm whether this is the node-to-window coverage ratio.
+
+### `--max-window-cov`
+Maximum average window coverage before downsampling. Default is 1000
+
+### `--min-as-xs-diff`
+Minimum difference between AS and XS scores (BWA-mem). Default is 5
+
+### STR Parameters
+Use these flags to deal with Short Tandem Repeats
+
+### `--max-str-unit-len`
+Maximum unit length for an STR motif. Default is 4
+
+### `--min-str-units`
+Minimum number of STR units required to report. Default is 7
+
+### `--maxSTRDist`
+Maximum distance (in bp) of variant from the STR motif. Default is 1
+
+### Filters
+These options let you apply different filters to the variant calls.
+
+### `-c`, `--max-nml-alt-cnt`
+Maximum ALT allele count in normal sample. Default is set at 0
+
+### `-C`, `--min-tmr-alt-cnt`
+Minimum ALT allele count in tumor sample. Default is 3
+
+### `-v`, `--max-nml-vaf`
+Maximum variant allele frequency in normal sample. Default is 0
+
+### `-V`, `--min-tmr-vaf`
+Minimum variant allele frequency in tumor sample. Default is 0.01
+
+### `--min-nml-cov`
+Minimum variant coverage in the normal sample. Default is 10
+
+### `--min-tmr-cov`
+Minimum variant coverage in the tumor sample. Default is 4
+
+### `--max-nml-cov`
+Maximum variant coverage in the normal sample. Default is 1000
+
+### `--max-tmr-cov`
+Maximum variant coverage in the tumor sample. Default is 1000
+
+### `--min-fisher`
+Minimum phred scaled score for somatic variants. Default is 5
+
+### `--min-str-fisher`
+Minimum phred scaled score for STR variants. Default is 25
+
+### `--min-strand-cnt`
+Minimum per strand contribution for a variant. Default is 1
+
+### Feature Flags
+Use these flags to toggle specific features. By default, these features are off; passing a flag (with no argument following) turns the feature on.
+
+### `--verbose`
+Turn on verbose logging for more detailed messages
+
+### `--tenx-mode`
+Run Lancet in 10X Linked Reads mode
+
+### `--active-region-off`
+Turn off active region detection
+
+### `--kmer-recovery-on`
+Turn on experimental kmer recovery
+
+### `--xa-filter`
+Skip reads with XA tag (BWA-mem only)
+
+### `--skip-secondary`
+Skip secondary read alignments
+
+### `--extract-pairs`
+Extract read pairs for each window
+
+### `--no-contig-check`
+Skip checking for same contigs in BAM/CRAMs and reference