docs: re-word and simplify, add quick starts where applicable #279

Merged
merged 3 commits into from
Nov 29, 2024
Changes from 1 commit
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -17,9 +17,9 @@ Have a look at existing [issues] to see if an issue has already been discussed.

## Pull Requests

We welcome you to open up a pull request
to suggest a change, even if it's a small one-line change. If the change is large, it is a good idea to first open an
issue to discuss the change in order to gain feedback and guidance.
We welcome you to open up a pull request to suggest a change, even if it's a small one-line change.
If the change is large, it is a good idea to first open an issue to discuss the change in order to gain feedback and
guidance.

### Tests and formatting

76 changes: 33 additions & 43 deletions README.md
@@ -12,18 +12,35 @@
A **server** implementation of the [htsget protocol][htsget-protocol] for bioinformatics in Rust. It is:
* **Fully-featured**: supports BAM and CRAM for reads, and VCF and BCF for variants, as well as other aspects of the protocol such as TLS, and CORS.
* **Serverless**: supports local server instances using [Axum][axum] and [Actix Web][actix-web], and serverless instances using [AWS Lambda Rust Runtime][aws-lambda-rust-runtime].
* **Storage interchangeable**: supports local filesystem storage as well as objects via [Minio][minio] and AWS S3.
* **Storage interchangeable**: supports local filesystem storage as well as objects via [Minio][minio] and [AWS S3][aws-s3].
* **Thoroughly tested and benchmarked**: tested using a purpose-built [test suite][htsget-test] and benchmarked using [criterion-rs].

To get started, see [Usage].

**Note**: htsget-rs is still experimental, and subject to change.

[actix-web]: https://github.com/actix/actix-web
[criterion-rs]: https://github.com/bheisler/criterion.rs
[Usage]: #usage

## Overview
## Quick start

To run a local instance of htsget-rs, run [htsget-axum]:

```sh
cargo run -p htsget-axum
```

And fetch tickets from `127.0.0.1:8080`, which serves data from [data]:

```sh
curl 'http://127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
```

### Configuration

Htsget-rs is configured using environment variables or config files. See [htsget-config] for details.

### Cloud

Cloud-based htsget-rs uses [htsget-lambda]. For an example deployment of this crate see [deploy].

## Protocol

Htsget-rs implements the [htsget protocol][htsget-protocol], which is an HTTP-based protocol for querying bioinformatics files.
The htsget protocol outlines how a htsget server should behave, and it is an effective way to fetch regions of large bioinformatics files.
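
As a rough sketch of the client side of this flow: a ticket is a JSON document whose `htsget.urls` array lists the URLs to fetch in order, and concatenating the responses yields the requested data. The snippet below pulls the URLs out of a ticket using only `grep` and `sed`. The `ticket_urls` helper and the ticket contents (URLs and byte ranges) are illustrative assumptions, not output captured from htsget-rs, and a real client would likely use a proper JSON parser instead.

```sh
# Hypothetical helper (not part of htsget-rs): print every "url" value in a ticket read from stdin.
# Matching JSON with grep/sed is fragile and only meant to illustrate the ticket shape.
ticket_urls() {
  grep -o '"url" *: *"[^"]*"' | sed -e 's/^"url" *: *"//' -e 's/"$//'
}

# Illustrative ticket: the URLs and byte ranges are made up, the layout follows the htsget specification.
ticket='{"htsget": {"format": "VCF", "urls": [
  {"url": "http://127.0.0.1:8081/data/vcf/sample1-bcbio-cancer.vcf.gz", "headers": {"Range": "bytes=0-3465"}, "class": "header"},
  {"url": "http://127.0.0.1:8081/data/vcf/sample1-bcbio-cancer.vcf.gz", "headers": {"Range": "bytes=3466-4000"}, "class": "body"}
]}}'

# Print the URLs in the order a client would fetch them.
printf '%s\n' "$ticket" | ticket_urls
```

Note that each entry may also carry `headers` (such as `Range`) that the client has to send when fetching that URL, so the URL list alone is not enough to reassemble the file.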
@@ -51,22 +68,6 @@ htsget-rs implements the following components of the protocol:
[htsget-diagram-png]: https://samtools.github.io/hts-specs/pub/htsget-ticket.png
[tokio]: https://github.com/tokio-rs/tokio

## Usage

Htsget-rs is configured using environment variables, for details on how to set them, see [htsget-config].

### Local
To run a local instance of htsget-rs, run [htsget-axum] by executing the following:
```sh
cargo run -p htsget-axum
```
Using the default configuration, this will start a ticket server on `127.0.0.1:8080` and a data block server on `127.0.0.1:8081`
with data accessible from the [data] directory. See [htsget-axum] for more information.

### Cloud
Cloud based htsget-rs uses [htsget-lambda]. For more information and an example deployment of this crate see
[deploy].

### Tests

Tests can be run by executing:
@@ -77,53 +78,41 @@ cargo test --all-features

To run benchmarks, see the benchmark sections of [htsget-actix][htsget-actix-benches] and [htsget-search][htsget-search-benches].

[htsget-actix-benches]: htsget-actix/README.md#Benchmarks
[htsget-search-benches]: htsget-search/README.md#Benchmarks
[htsget-actix-benches]: htsget-actix/README.md#benchmarks
[htsget-search-benches]: htsget-search/README.md#benchmarks

## Project Layout

This repository consists of a workspace composed of the following crates:
This repository is a workspace of crates:

- [htsget-config]: Configuration of the server.
- [htsget-actix]: Local instance of the htsget server. Contains framework dependent code using [Actix Web][actix-web].
- [htsget-axum]: Local instance of the htsget server. Contains framework dependent code using [Axum][axum].
- [htsget-http]: Handling of htsget HTTP requests. Framework independent code.
- [htsget-lambda]: Cloud based instance of the htsget server. Contains framework dependent
- [htsget-lambda]: Cloud-based instance of the htsget server. Contains framework dependent
code using the [Rust Runtime for AWS Lambda][aws-lambda-rust-runtime].
- [htsget-search]: Core logic needed to search bioinformatics files based on htsget queries.
- [htsget-storage]: Storage interfaces for local and cloud-based files.
- [htsget-test]: Test suite used by other crates in the project.

Other directories contain further applications or data:
- [data]: Contains example data files which can be used by htsget-rs, in folders denoting the file type.
This directory also contains example events used by a cloud instance of htsget-rs in the [`events`][data-events] subdirectory.
- [deploy]: An example deployment of [htsget-lambda].

In htsget-rs the ticket server is handled by [htsget-axum], [htsget-actix] or [htsget-lambda], and the data
block server is handled by the [storage backend][storage-backend], either [locally][local-storage], or using [AWS S3][s3-storage].
This project layout is structured to allow for extensibility and modularity. For example, a new ticket server and data server could
be implemented using Cloudflare Workers in a `htsget-http-workers` crate and Cloudflare R2 in [htsget-search].

See the [htsget-search overview][htsget-search-overview] for more information on the storage backend.
- [data]: Contains example data files used by htsget-rs and in tests.
- [deploy]: Deployments for htsget-rs.

[axum]: https://github.com/tokio-rs/axum
[htsget-config]: htsget-config
[htsget-actix]: htsget-actix
[htsget-http]: htsget-http
[htsget-lambda]: htsget-lambda
[htsget-search]: htsget-search
[htsget-search-overview]: htsget-search/README.md#Overview
[htsget-storage]: htsget-storage
[htsget-test]: htsget-test

[storage-backend]: htsget-search/src/storage
[local-storage]: htsget-search/src/storage/local.rs
[s3-storage]: htsget-search/src/storage/s3.rs

[data]: data
[deploy]: deploy

[actix-web]: https://actix.rs/
[aws-lambda-rust-runtime]: https://github.com/awslabs/aws-lambda-rust-runtime
[data-events]: data/events

## Contributing

@@ -140,4 +129,5 @@ This project is licensed under the [MIT license][license].
[htsget-lambda]: htsget-lambda
[license]: LICENSE
[aws-lambda-rust-runtime]: https://github.com/awslabs/aws-lambda-rust-runtime
[aws-s3]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
[minio]: https://min.io/
2 changes: 1 addition & 1 deletion SECURITY.md
@@ -2,4 +2,4 @@

## Reporting a Vulnerability

Please report vulnerabilities by opening an issue or sending an email to [email protected]
Please report vulnerabilities by opening an issue or sending an email to [email protected].
31 changes: 19 additions & 12 deletions htsget-actix/README.md
@@ -29,16 +29,21 @@ This crate is used for running a local instance of htsget-rs. It is based on:

[htsget-http]: ../htsget-http

## Usage
## Quick start

This application has the same functionality as [htsget-axum]. To use it, follow the [htsget-axum][htsget-axum] instructions, and
replace any calls to `htsget-axum` with `htsget-actix`.
Launch a server instance:

It is recommended to use [htsget-axum] because it better fits with the rest of [htsget-rs]. For example [htsget-actix]
uses the actix-web framework for the ticket server, however it depends on [htsget-axum] for the data server. Also, components
in [htsget-lambda] use Axum dependencies.
```sh
cargo run -p htsget-actix
```

And fetch tickets from `localhost:8080`:

[htsget-lambda]: ../htsget-lambda
```sh
curl 'http://localhost:8080/variants/data/vcf/sample1-bcbio-cancer'
```

This crate uses [htsget-config] for configuration. All options supported in [htsget-axum] are also supported here.

### As a library

@@ -53,12 +58,13 @@ This crate has the following features:
* `experimental`: used to enable experimental features that aren't necessarily part of the htsget spec, such as Crypt4GH support through `C4GHStorage`.

## Benchmarks

Benchmarks for this crate are written using [Criterion.rs][criterion-rs], and aim to compare the performance of this crate with the
[htsget Reference Server][htsget-refserver].
There is a set of light benchmarks, and one heavy benchmark. Light benchmarks can be performed by executing:
There is a set of light benchmarks, and one heavy benchmark. For light benchmarks run:

```
cargo bench -p htsget-axum -- LIGHT
cargo bench -p htsget-actix -- LIGHT
```

To run the heavy benchmark, an additional VCF file needs to be downloaded and placed in the [`data/vcf`][data-vcf] directory:
@@ -67,16 +73,17 @@ To run the heavy benchmark, an additional vcf file needs to be downloaded, and p
curl ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr14.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz > data/vcf/internationalgenomesample.vcf.gz
```

Then to run the heavy benchmark:
Then run the heavy benchmark:

```
cargo bench -p htsget-axum -- HEAVY
cargo bench -p htsget-actix -- HEAVY
```
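
Because the download step redirects `curl` output straight into the target file, a failed transfer can leave an empty `internationalgenomesample.vcf.gz` behind. The guard below is a minimal sketch of a check to run before the heavy benchmark. The `heavy_bench_ready` function is a hypothetical helper, not something provided by this repository, and it assumes the command is run from the repository root.

```sh
# Hypothetical guard (not part of htsget-rs): succeed only if the heavy benchmark
# input exists and is non-empty. The default path matches the download command above.
heavy_bench_ready() {
  [ -s "${1:-data/vcf/internationalgenomesample.vcf.gz}" ]
}

if heavy_bench_ready; then
  echo "heavy benchmark input found"
else
  echo "heavy benchmark input missing or empty: download it before running the HEAVY benchmark" >&2
fi
```

The check only confirms that the file is present and non-empty. It does not validate that the download completed or that the file is a well-formed compressed VCF.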

[criterion-rs]: https://github.com/bheisler/criterion.rs
[htsget-refserver]: https://github.com/ga4gh/htsget-refserver
[data-vcf]: ../data/vcf
[htsget-axum]: ../htsget-axum/README.md#usage
[htsget-axum]: ../htsget-axum/README.md
[htsget-config]: ../htsget-config/README.md

## License

49 changes: 29 additions & 20 deletions htsget-axum/README.md
@@ -21,23 +21,32 @@ This crate is used for running a server instance of htsget-rs. It is based on:

[htsget-http]: ../htsget-http

## Usage
## Quick start

### For running htsget-rs as an application
Launch a server instance:

This crate uses [htsget-config] for configuration. See [htsget-config] for details on how to configure this crate.

To run an instance of this crate, execute the following command:
```sh
cargo run -p htsget-axum
```

And fetch tickets from `localhost:8080`:

```sh
curl 'http://localhost:8080/variants/data/vcf/sample1-bcbio-cancer'
```

This crate uses [htsget-config] for configuration.

### Storage backends

Using the default configuration, this will start a ticket server on `127.0.0.1:8080` and a data block server on `127.0.0.1:8081`
with data accessible from the [`data`][data] directory. This application supports storage backends defined in [htsget-storage].

To use `S3Storage`, compile with the `s3-storage` feature:
```sh
cargo run -p htsget-axum --features s3-storage
```

This will start a ticket server with `S3Storage` using a bucket called `"data"`.

To use `UrlStorage`, compile with the `url-storage` feature.
@@ -51,19 +60,18 @@ See [htsget-search] for details on how to structure files.

#### Using TLS

There are two server instances that are launched when running this crate. The ticket server returns a list of ticket URLs that a client must fetch,
and the data block server responds to the URLs in the tickets. By default, the data block server runs without TLS.
To run the data block server with TLS, PEM-formatted X.509 certificates are required.
By default, htsget-rs runs without TLS. To use TLS, PEM-formatted X.509 certificates are required.

For development and testing purposes, self-signed certificates can be used.
For example, to generate self-signed certificates run:
For development and testing purposes, self-signed certificates can be used. For example, to generate self-signed certificates run:

```sh
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj '/CN=localhost'
```

It is not recommended to use self-signed certificates in a production environment
as this is considered insecure.
It is not recommended to use self-signed certificates in a production environment as this is considered insecure.

There are two server instances launched when running this crate: the ticket server and the data block server. TLS
is specified separately for each server.

#### Example requests

@@ -73,39 +81,39 @@ Some example requests using `curl` are shown below:
* GET

```sh
curl '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
curl 'http://localhost:8080/variants/data/vcf/sample1-bcbio-cancer'
```

* POST

```sh
curl --header "Content-Type: application/json" -d '{}' '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
curl --header "Content-Type: application/json" -d '{}' 'http://localhost:8080/variants/data/vcf/sample1-bcbio-cancer'
```

* Parametrised GET

```sh
curl '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer?format=VCF&class=header'
curl 'http://localhost:8080/variants/data/vcf/sample1-bcbio-cancer?format=VCF&class=header'
```

* Parametrised POST

```sh
curl --header "Content-Type: application/json" -d '{"format": "VCF", "regions": [{"referenceName": "chrM"}]}' '127.0.0.1:8080/variants/data/vcf/sample1-bcbio-cancer'
curl --header "Content-Type: application/json" -d '{"format": "VCF", "regions": [{"referenceName": "chrM"}]}' 'http://localhost:8080/variants/data/vcf/sample1-bcbio-cancer'
```

* Service info

```sh
curl '127.0.0.1:8080/variants/service-info'
curl 'http://localhost:8080/variants/service-info'
```
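
The requests above all follow the same pattern: an endpoint (`reads` or `variants`), an id, and either query parameters (GET) or a JSON body (POST). As a sketch under that assumption, the helpers below assemble both pieces. `htsget_url` and `htsget_post_body` are hypothetical names introduced here for illustration and are not part of htsget-rs.

```sh
# Hypothetical helper: build a ticket URL from a base address, an endpoint
# ("reads" or "variants"), an id, and an optional query string.
# Note: plain sh has no local variables, so these names are global.
htsget_url() {
  base=$1
  endpoint=$2
  id=$3
  query=${4:-}
  url="${base}/${endpoint}/${id}"
  if [ -n "$query" ]; then
    url="${url}?${query}"
  fi
  printf '%s\n' "$url"
}

# Hypothetical helper: build a POST body for one region, given a format and a reference name.
htsget_post_body() {
  printf '{"format": "%s", "regions": [{"referenceName": "%s"}]}\n' "$1" "$2"
}

# Reproduce the parametrised GET URL and the parametrised POST body from the examples above.
htsget_url 'http://localhost:8080' variants data/vcf/sample1-bcbio-cancer 'format=VCF&class=header'
htsget_post_body VCF chrM
```

With a server running, these could be combined as `curl --header "Content-Type: application/json" -d "$(htsget_post_body VCF chrM)" "$(htsget_url 'http://localhost:8080' variants data/vcf/sample1-bcbio-cancer)"`. The helpers do no escaping, so they are only suitable for simple values like the ones shown.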

### Crypt4GH

The htsget-rs server experimentally supports serving [Crypt4GH][c4gh] encrypted files to clients. See the [Crypt4GH section][config-c4gh] in the configuration
for more details on how to configure this.
The htsget-rs server experimentally supports serving [Crypt4GH][c4gh] encrypted files to clients. See the [Crypt4GH section][config-c4gh]
in the configuration for more details on how to configure this.

Run the server with the following to enable Crypt4GH support using the [example config][example-config]:
To use Crypt4GH, run the server using the [example config][example-config] and the `experimental` feature:

```sh
cargo run -p htsget-axum --features experimental -- --config htsget-config/examples/config-files/c4gh.toml
@@ -119,6 +127,7 @@ curl 'http://localhost:8080/reads/data/c4gh/htsnexus_test_NA12878?referenceName=

The output consists of the Crypt4GH header, which includes the original header, the edit lists, and the re-encrypted header that
the recipient can use to decrypt bytes:

```json
{
"htsget": {