This project creates a SQL database to store variant data from a VCF-like file.
The database is then accessed through an interface with a single input bar that must handle both RSID and CHROMOSOME POSITION queries.
Using PostgreSQL and FastAPI, 95% of queries complete in under roughly 500 ms.
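As a rough illustration of how a single input bar can tell the two query types apart, the sketch below classifies the raw text before any database lookup. The names `parse_query` and `Query` and both regular expressions are assumptions made for this example, not the project's actual code.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical patterns: the project's real parsing rules may differ.
RSID_RE = re.compile(r"^rs(\d+)$", re.IGNORECASE)
POSITION_RE = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT?)[\s:]+(\d+)$", re.IGNORECASE
)


class Query(NamedTuple):
    kind: str                     # "rsid" or "position"
    rsid: Optional[str] = None
    chrom: Optional[str] = None
    pos: Optional[int] = None


def parse_query(text: str) -> Query:
    """Classify the search-bar text as an RSID or a chromosome position."""
    text = text.strip()

    match = RSID_RE.match(text)
    if match:
        # Normalise the prefix to lowercase "rs".
        return Query(kind="rsid", rsid=f"rs{match.group(1)}")

    match = POSITION_RE.match(text)
    if match:
        return Query(
            kind="position",
            chrom=match.group(1).upper(),
            pos=int(match.group(2)),
        )

    raise ValueError(f"Unrecognised query: {text!r}")


if __name__ == "__main__":
    print(parse_query("1 10177"))
    print(parse_query("rs540431307"))
```

Classifying the input up front means each query type can be routed to its own SQL statement, and malformed input is rejected before it reaches the database.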
You will need Docker and docker-compose to run the app.
You will also need Python and Make installed on your machine for the "database population" script.
1. Put the desired `{}.vcf.gz` inside `database/data`
2. Run `make run file=database/data/sample.vcf.gz` (or use the full dataset: `database/data/hg37variants1000g.vcf.gz`)
`make run` installs the requirements, starts the database, frontend and backend containers, and then populates the database.
Access the frontend at http://127.0.0.1:4243 and the Swagger UI at http://127.0.0.1:4242/docs
If you used the default .vcf.gz provided, you can test the frontend with `1 10177` or `rs540431307` (a different variant).
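To create a smaller sample file from the full dataset, replace `{NEW_SIZE}` with the number of lines to keep and run: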
zcat database/data/hg37variants1000g.vcf.gz | head -n {NEW_SIZE} | bgzip > database/data/sample.vcf.gz
At first, the database of choice was tiledb-vcf. However, after several incompatibilities between the VCF file and TileDB came up, PostgreSQL partitioned by chromosome was chosen instead because it seemed scalable and easy to deploy.
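To make "partitioned by chromosome" concrete, here is a minimal sketch that generates PostgreSQL list-partitioning DDL, one partition per chromosome. The table name `variants`, its columns, and the chromosome list are assumptions for illustration only; the project's real schema may differ.

```python
from typing import List

# Hypothetical chromosome set and schema; adjust to the actual VCF contents.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]

PARENT_DDL = """
CREATE TABLE IF NOT EXISTS variants (
    chrom TEXT    NOT NULL,
    pos   INTEGER NOT NULL,
    rsid  TEXT,
    ref   TEXT    NOT NULL,
    alt   TEXT    NOT NULL
) PARTITION BY LIST (chrom);
""".strip()


def partition_ddl(chrom: str) -> str:
    """Return the DDL for the partition holding a single chromosome."""
    return (
        f"CREATE TABLE IF NOT EXISTS variants_chr{chrom.lower()} "
        f"PARTITION OF variants FOR VALUES IN ('{chrom}');"
    )


def build_schema() -> List[str]:
    """Parent table first, then one partition per chromosome."""
    return [PARENT_DDL] + [partition_ddl(c) for c in CHROMOSOMES]


if __name__ == "__main__":
    for statement in build_schema():
        print(statement)
```

With a layout like this, a query that filters on `chrom` lets PostgreSQL skip every other partition. An RSID lookup carries no chromosome, so it would likely need an index on the RSID column to stay fast.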
FastAPI was chosen because it is faster than the Flask and web2py frameworks. As a plus, FastAPI allows for easy in-code documentation with pydantic models and schemas. It also integrates with PostgreSQL through SQLAlchemy and Pydantic.
Adding OAuth2 with FastAPI is straightforward, but it was left out of this project for simplicity.