-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgetting_started_cli.qmd
486 lines (356 loc) · 18.1 KB
/
getting_started_cli.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
---
title: "Getting Started with OpenCRAVAT on the Command Line"
format:
html: default
rst: default
toc: true
anchor-sections: true
---
## Learning Objectives
Working through this document, you will learn to:
- **Prepare** your system for installing OpenCRAVAT
- **Install** the OpenCRAVAT software locally using `pip`
- **Search** for and **Install** available annotators
- **Explain** compatible variant formats
- **Annotate** variant files using the CLI tools
- **Visualize** and **Summarize** Results in OpenCRAVAT
## The Basic OpenCRAVAT Workflow
In the diagram below, we'll see the basic OpenCRAVAT workflow on the command-line. Click the boxes below in the diagram to jump to that particular section.
```{mermaid}
flowchart TD
L[Prepare for Installing] --> A
click L "#preparing-and-installing-opencravat"
A[Installation] --> G[Install Annotators]
click A "#installing-opencravat"
click G "#installing-annotators"
G --> B[Convert to Input File Format]
click B "#understanding-the-input-file-format"
B -->|Variant File Input| C[Map Variants and Annotate]
click C "#annotating-our-example"
C -->|Results|D[Start Results Viewer]
click D "#starting-the-results-viewer"
D -->|Results| E[Filter Results]
click E "#examining-our-results-file-and-filtering"
E -->|Results| F[Visualizing Filtered Results]
click F "#visualizing-our-filtered-results"
```
## Preparing and Installing OpenCRAVAT
{{< video https://youtu.be/24awSa0ry9M >}}
Make sure you know where your Python is installed using the `which` command. In my example, I have python 3.11 installed via Homebrew, which is the python I want to use. In my case, since I'm on MacOS, I know that I run python programs using `python3` rather than `python`.
```bash
which python3
```
```
tedladeras@teds-MacBook-Pro ~ % which python3
/opt/homebrew/bin/python3
```
Also, check whether you have `pip` or `pip3` installed, and whether it has a similar location to your `python3`.
```bash
which pip3
```
```
tedladeras@teds-MacBook-Pro ~ % which pip3
/opt/homebrew/bin/pip3
```
### Using a Virtual Environment
You should create a virtual environment so that your OpenCRAVAT installation is isolated from other installations. We'll use `venv`, which installed in the default Python distribution.
We'll make a separate virtual environment using the `venv` command. Here we're creating a virtual environment called `oc`:
```bash
python3 -m venv oc
```
This creates a folder called `oc` in our current directory. This is where all of our separate Python packages will live. Then we'll activate the virtual environment using `source`:
```bash
source oc/bin/activate
```
You can double check whether the environment is activated by using `which python3` again. It should poiint to the `bin` folder within your virtual environment folder.. Note that our prompt also has an `(oc)` in front.
```bash
which python3
```
```
(oc) tedladeras@teds-MacBook-Pro ~ % which python3
/Users/tedladeras/oc/bin/python3
```
### Installing OpenCRAVAT
Now we can install OpenCRAVAT using `pip`/`pip3`.
```bash
pip3 install open-cravat
```
```
tedladeras@teds-MacBook-Pro local % pip3 install open-cravat
Collecting open-cravat
Downloading open-cravat-2.4.2.tar.gz (3.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 40.4 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pyyaml (from open-cravat)
Downloading PyYAML-6.0.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (2.1 kB)
Collecting requests (from open-cravat)
Downloading requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting requests-toolbelt (from open-cravat)
Downloading requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
```
Confirm that OpenCRAVAT is installed:
```bash
oc --help
```
```
tedladeras@teds-MacBook-Pro ~ % oc --help
usage: oc [-h] {run,report,gui,module,config,new,store,util,version,feedback} ...
Open-CRAVAT genomic variant interpreter. https://github.com/KarchinLab/open-cravat
options:
-h, --help show this help message and exit
Commands:
{run,report,gui,module,config,new,store,util,version,feedback}
run Run a job
report Generate a report from a job
gui Start the GUI
module Change installed modules
config View and change configuration settings
new Create new modules
store Publish modules to the store
util Utilities
version Show version
feedback Send feedback to the developers
```
All of our interactions with OpenCRAVAT will be prefaced by `oc`. For example, we can launch the OpenCRAVAT GUI as a webserver on our machine using
```bash
oc gui
```
Now you're ready to start installing annotators.
## Installing Via Bioconda
If you prefer to install via `conda`/`mamba`, here are directions for you. You will want to install `mamba` via `miniforge`: download the installation scripts here.
When `mamba` has been installed, you'll need to create a conda environment and install openCRAVAT via a single command:
```bash
mamba create --name oc open-cravat
```
This will create an environment called `oc`
When you're ready to use OpenCRAVAT, you can activate this `oc` environment:
```bash
mamba activate oc
```
Confirm that you can see the `oc` executable with `which`. It should be where you installed miniforge:
```bash
which oc
```
```
/Users/tedladeras/miniforge3/envs/oc/bin/oc
```
Now you can use OpenCRAVAT as below and install annotators.
## Installing Annotators
{{< video https://youtu.be/N6cPmt1kNaU >}}
The first thing we'll need to install are some core bits of OpenCRAVAT, called `install-base`. We'll do this with the command `oc module`:
```bash
oc module install-base
```
```
tedladeras@teds-MacBook-Pro local % oc module install-base
Installing: casecontrol:1.2.0, cravat-converter:1.1.2, excelreporter:2.1.1, go:2022.11.01, hg38:1.11.0, hg38wgs:1.0.0, oldcravat-converter:1.1.2, tagsampler:1.1.6, textreporter:2.1.0, varmeta:1.0.0, vcf-converter:2.2.1, vcfinfo:2.0.0, wgbase:1.1.3, wgcasecontrols:1.0.1, wgcasecontrolsummary:1.0.1, wgcircossummary:2.2.0, wgcodingvsnoncodingsummary:2.0.0, wggo:1.2.0, wggosummary:2.4.0, wghg19:1.0.3, wglollipop:2.2.1, wgncrna:1.1.0, wgndex:1.1.0, wgnote:3.0.0, wgrankscore:1.1.0, wgsosamplesummary:2.2.0, wgsosummary:1.5.0, wgvcfinfo:1.0.3
[2024:01:31 14:17:48] Starting to install casecontrol:1.2.0...
[2024:01:31 14:17:48] Downloading code archive of casecontrol:1.2.0...
[**************************************************] 21.4 kB / 21.4 kB (100%)
[2024:01:31 14:17:49] Extracting code archive of casecontrol:1.2.0...
[2024:01:31 14:17:49] Verifying code integrity of casecontrol:1.2.0...
[2024:01:31 14:17:49] Finished installation of casecontrol:1.2.0
[2024:01:31 14:17:49] Starting to install cravat-converter:1.1.2...
[2024:01:31 14:17:49] Downloading code archive of cravat-converter:1.1.2...
....[intermediate output skipped]
[**************************************************] 670 B / 670 B (100%)
[2024:01:31 14:19:57] Extracting code archive of wgvcfinfo:1.0.3...
[2024:01:31 14:19:57] Verifying code integrity of wgvcfinfo:1.0.3...
[2024:01:31 14:19:57] Finished installation of wgvcfinfo:1.0.3
```
Let's list the available annotators. This is a very large list of annotators. This is just the first few entries.
```bash
tedladeras@teds-MacBook-Pro local % oc module ls -a -t annotator
```
```
Name Title Type Installed Store ver Store data ver Local ver Local data ver Size
abraom ABRaOM annotator 1.0.0 113.6 MB
alfa ALFA: Allele Frequency Aggregator annotator 1.0.0 2020.02.29 19.8 GB
alfa_african ALFA: Allele Frequency Aggregator African annotator 1.0.0 2020.02.29 23.2 GB
alfa_asian ALFA: Allele Frequency Aggregator Asian annotator 1.0.0 2020.02.29 24.1 GB
alfa_european ALFA: Allele Frequency Aggregator European annotator 1.0.0 2020.02.29 19.8 GB
alfa_latin_american ALFA: Allele Frequency Aggregator Latin American annotator 1.0.0 2020.02.29 20.3 GB
alfa_other ALFA: Allele Frequency Aggregator Others
....
```
We're actually looking for `ClinVar`, which is a list of clinically relevant annotations. Note that all of the annotators are in lower snake case (such as `alfa_asian`). So we can add `clinvar` in with our query.
```bash
oc module ls -a clinvar -t annotator
```
```
tedladeras@teds-MacBook-Pro local % oc module ls -a clinvar -t annotator
Name Title Type Installed Store ver Store data ver Local ver Local data ver Size
clinvar ClinVar annotator 2023.02.01 2023.02.01.1 381.8 MB
```
Ok, now we know our annotator exists, and we can install it with the `oc module install` command:
```bash
oc module install clinvar
```
We'll need to confirm `y` to proceed:
```
tedladeras@teds-MacBook-Pro local % oc module install clinvar
Installing: clinvar:2023.02.01, wgclinvar:1.1.1
Proceed? ([y]/n) > y
```
Then the installation will proceed:
```
[2024:01:31 14:25:08] Starting to install clinvar:2023.02.01...
[2024:01:31 14:25:08] Downloading code archive of clinvar:2023.02.01...
[**************************************************] 290.9 kB / 290.9 kB (100%)
[2024:01:31 14:25:09] Extracting code archive of clinvar:2023.02.01...
[2024:01:31 14:25:09] Verifying code integrity of clinvar:2023.02.01...
[2024:01:31 14:25:09] Downloading data of clinvar:2023.02.01...
[**************************************************] 49.0 MB / 49.0 MB (100%)
[2024:01:31 14:25:15] Extracting data of clinvar:2023.02.01...
[2024:01:31 14:25:15] Verifying data integrity of clinvar:2023.02.01...
[2024:01:31 14:25:16] Finished installation of clinvar:2023.02.01
[2024:01:31 14:25:16] Starting to install wgclinvar:1.1.1...
[2024:01:31 14:25:16] Downloading code archive of wgclinvar:1.1.1...
[**************************************************] 36.8 kB / 36.8 kB (100%)
[2024:01:31 14:25:17] Extracting code archive of wgclinvar:1.1.1...
[2024:01:31 14:25:17] Verifying code integrity of wgclinvar:1.1.1...
[2024:01:31 14:25:17] Finished installation of wgclinvar:1.1.1
```
## Understanding the Input File Format
We can generate an example file using `oc new example-input`. Note the period at the end, which means that we will generate the file in the current directory:
```bash
oc new example-input .
```
Let's confirm that we created this example:
```bash
ls -l example*
```
```
tedladeras@teds-MacBook-Pro ~ % ls -l example*
-rw-r--r-- 1 tedladeras staff 9036 Jan 31 14:27 example_input
```
Note the created file has an underscore (`_`) rather than a dash (`-`). Let's take a look at the `example_input` file that we created:
```bash
cat example_input | head
```
```
chr1 69091 + A C s0
chr1 69091 + ATG C s0
chr6 31039077 + C G s0
chr1 27612918 + G a s1
chr1 27612918 + G A s0
chrM 235 + A G clinvar
chrM 3308 + T C omim
chr8 54626835 + A T s0
chr4 1804372 + A G s1
chr4 1804372 + AT GC s1
chr4 1804372 + A T s1
```
## Annotating Our Example
{{< video https://youtu.be/gSeeDM9GUgQ >}}
Now we have our example, we can run OpenCRAVAT. This will annotate our `example_input` file with all available annotators.
```bash
oc run ./example_input -l hg38
```
```
tedladeras@teds-MacBook-Pro ~ % oc run ./example_input -l hg38
Input file(s): /Users/tedladeras/example_input
Genome assembly: hg38
Running converter...
Converter (converter) finished in 0.124s
Running gene mapper... finished in 2.668s
Running annotators...
annotator(s) finished in 1.466s
Running aggregator...
Variants finished in 0.010s
Genes finished in 0.003s
Samples finished in 0.022s
Tags finished in 0.025s
Indexing
variant base__coding finished in 0.000s
variant base__chrom finished in 0.000s
variant base__so finished in 0.000s
Running postaggregators...
Tag Sampler (tagsampler) finished in 0.008s
Finished normally. Runtime: 4.539s
```
## Starting the Results Viewer
{{< video https://youtu.be/cNDrAPhPffg >}}
We saw that one of the files generated was an `.sqlite` file. These are our results, which we can visualize using `oc gui`, which will launch the a web server so we can examine our results using the GUI:
```bash
oc gui example_input.sqlite
```
```
tedladeras@teds-MacBook-Pro ~ % oc gui example_input.sqlite
____ __________ ___ _ _____ ______
/ __ \____ ___ ____ / ____/ __ \/ | | / / |/_ __/
/ / / / __ \/ _ \/ __ \/ / / /_/ / /| | | / / /| | / /
/ /_/ / /_/ / __/ / / / /___/ _, _/ ___ | |/ / ___ |/ /
\____/ .___/\___/_/ /_/\____/_/ |_/_/ |_|___/_/ |_/_/
/_/
OpenCRAVAT is served at localhost:8080
(To quit: Press Ctrl-C or Ctrl-Break if run on a Terminal or Windows, or click "Cancel" and then "Quit" if run through OpenCRAVAT app on Mac OS)
(Getting result of [example_input.sqlite]:[variant]...)
Done getting result of [example_input.sqlite][variant] in 0.029s
(Getting result of [example_input.sqlite]:[gene]...)
Done getting result of [example_input.sqlite][gene] in 0.021s
```
A window should open in your web browser. If not, enter https://localhost:8080 to view the file.

## Examining Our Results File and Filtering
{{< video https://youtu.be/TYs3dGDFzQQ >}}
Now we take a look at our results in the web interface. Under the list of jobs, we can see our job. Let's select `Open Result Viewer` under the **Status** tab:

Keep in mind that the web interface is limited to visualizing 100,000 variants, so if you have a larger result file, you'll need to filter the results down. So let's take a look at how to filter our variants down.
We can filter variants by selecting the Filter tab in the Results viewer:

Under "Variant Properties" we can limit our list of variants to those that have ClinVar annotations. Let's build a filter using the Query Builder, which will allow us to impose multiple criteria as a filter.

We'll add a rule (a logical condition) to our filter using the `+` button:

Now we'll add a rule and select those that have `ClinVar` annotations. To do this, we'll first select a) `ClinVar` on the left, the b) `Clinical Significance` column, and c) `has data`:

Now we can apply this rule we've built by clicking on the **Apply Filter** button on the bottom right of the Query Builder:

How many variants are left after the filtering?
:::{.callout-note}
## Calculating the Effect of Filters
If you have multiple filters, you can actually precalculate the numbers of variants after filtering by using the icon below.

This can be helpful to check if your filters are too strict (that is, they won't return anything).
Just note that the filter is not actually applied to the data until you hit the **Apply Filter** Button.
:::
## Visualizing Our Filtered Results
{{< video https://youtu.be/q75mk2SxqTA >}}
Now that we've filtered, let's go back to the Summary Tab:

In the Summary tab, we can see information about the annotated variants, such as from the sequence ontology. We can get the counts within a sequence ontology category by mousing over that category in our plot:

These visualizations can be moved around and pinned. Using the camera icon, you can also save these visualizations.
Let's move over to the **Variant** tab and look for pathogenic variants. First, we'll click over to the **Variant** tab:

Scrolling to the right, we can see there is a column for the ClinVar annotations. Notice the **+** on the top right. We'll click that to expand the ClinVar annotations:

In the **Clinical Significance** column, we can see that we can filter. Let's select those variants that have **pathogenic** significance. Clicking into the search box underneath this column, we can select **pathogenic**:

How many variants are pathogenic?
The last thing we might want to do is to export our results. We can use the export button at the bottom of the table:

When you click that, you will have the option to export the variant level results as a tab seperated value (TSV) file. Note that this result table will have filters applied to it as well.
:::{.callout-note}
## Multiple Rules
Note that we could have limited our search to pathogenic variants by adding another filter rule like we did above in the filtering step. We're showing this way in case you didn't know the available categories within the `Clinical Significance` column.
:::
## Deactivating Your Environment
When you're done using OpenCRAVAT and it's installed by an environment, make sure to deactivate:
For the `python/venv` installation:
```bash
deactivate
```
For the `mamba/conda` installation:
```bash
mamba deactivate
```
## What You Learned
We learned the following in this section:
- **Prepare** your system for installing OpenCRAVAT
- **Install** the OpenCRAVAT software locally using `pip`
- **Search** for and **Install** available annotators
- **Explain** compatible variant formats
- **Annotate** variant files using the CLI tools
- **Visualize** and **Summarize** Results in OpenCRAVAT