The EVE_annotation scripts identify Endogenous Viral Elements (EVEs) within genomic sequences. They combine several established bioinformatics tools with custom scripts to annotate and analyze potential EVEs.
Clone this repository to your local machine and provide necessary permissions to the scripts using the following commands:
git clone https://github.com/carolebelliardo/EVE_annotation.git
chmod +x EVE_annotation/*
Then download:
- The viral proteins from NCBI, using the web page download service (<path_to_db1>)
- The complete NR database (<path_to_db2>), using the following commands:
wget -Nc -o wgetNRfasta.log 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz'
gzip -d nr.gz
chmod 775 nr
The EVE Detection Tool relies on the following external tools and libraries:
- Diamond: A sequence aligner for protein and translated DNA searches [1].
- Bedtools: A powerful toolset for genome arithmetic [2].
- R: A programming language for statistical computing and graphics [3].
- Python 3: A programming language required for executing the provided Python scripts [4].
The following R libraries are required:
library(data.table)
library(taxonomizr)
Ensure that these tools are installed and accessible in your system's PATH before running the EVE annotation scripts.
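As a quick pre-flight check, the sketch below reports which external tools cannot be found on PATH. The executable names (`diamond`, `bedtools`, `Rscript`, `python3`) are assumptions based on how these tools are commonly installed, and the helper `missing_tools` is a hypothetical name, not part of this repository. It does not check the R libraries.

```python
import shutil

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = ["diamond", "bedtools", "Rscript", "python3"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result only means the executables were found; it says nothing about their versions.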
cd EVE_annotation
./EVE_annotation.sh <path_to_db1> <path_to_db2> <path_to_host_fasta>
Replace <path_to_db1>, <path_to_db2>, and <path_to_host_fasta> with the paths to the required database and input files.
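Because the NR database is large and a run can take a long time, it may help to confirm that the three input paths exist before launching. The following is a minimal sketch, not part of the repository: `check_inputs` and `run_pipeline` are hypothetical helper names, and the only thing taken from this README is the argument order of `EVE_annotation.sh`.

```python
import subprocess
from pathlib import Path


def check_inputs(viral_db, nr_db, host_fasta):
    """Return a list of problems; an empty list means all inputs look usable."""
    errors = []
    labelled = (
        ("viral protein database", viral_db),
        ("NR database", nr_db),
        ("host FASTA", host_fasta),
    )
    for label, value in labelled:
        path = Path(value)
        if not path.is_file():
            errors.append(f"{label} not found: {path}")
        elif path.stat().st_size == 0:
            errors.append(f"{label} is empty: {path}")
    return errors


def run_pipeline(viral_db, nr_db, host_fasta):
    """Validate the inputs, then call the main script from the repository directory."""
    errors = check_inputs(viral_db, nr_db, host_fasta)
    if errors:
        raise FileNotFoundError("; ".join(errors))
    # Same argument order as the command line shown above.
    subprocess.run(
        ["./EVE_annotation.sh", str(viral_db), str(nr_db), str(host_fasta)],
        check=True,
    )
```

Calling `run_pipeline(...)` from inside the `EVE_annotation` directory then behaves like the shell command, but fails early with a readable message when a path is wrong.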
The EVE annotation directory consists of the following scripts:
* EVE_annotation.sh: The main script that coordinates the entire EVE annotation process.
* bestHitsToFasta.py: Python script to parse BLAST results for identifying best hits and returning a FASTA file.
* absolutPosi.R: R script for absolute coordinate calculations.
* Get_EVE_annotation_summary.r: R script to generate a summary of EVE lineages.
* addFamily.py: Python script to add family information to EVE annotations.
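To illustrate what "identifying best hits" can mean in practice, here is a minimal sketch that keeps the highest-scoring hit per query. It assumes 12-column BLAST tabular output (the default tabular format of DIAMOND), with the bit score in the last column. It is not the implementation in bestHitsToFasta.py, which may use different criteria, and the sequence names in the example are made up.

```python
import csv


def best_hits(tabular_lines):
    """Map each query ID to its (subject ID, bit score) with the highest bit score."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical alignment records, for illustration only.
example = [
    "contig1\tprotA\t45.0\t100\t55\t0\t1\t300\t1\t100\t1e-20\t90.5",
    "contig1\tprotB\t60.0\t120\t48\t0\t1\t360\t1\t120\t1e-40\t150.0",
    "contig2\tprotC\t38.0\t80\t50\t0\t10\t250\t5\t85\t1e-10\t60.2",
]
print(best_hits(example))
# {'contig1': ('protB', 150.0), 'contig2': ('protC', 60.2)}
```

Ties are resolved in favor of the first hit encountered, which is one possible choice among several.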
- Carole Belliardo: [email protected]
- Clément Gilbert: [email protected]
We thank the people who generated the genome assembly data for their publications and made the data publicly available. We would also like to thank the members of the ITN Insect Doctors consortium and the current members of IRBI who provided feedback throughout the study.
Contributions to the EVE Detection Tool are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request in the GitHub repository.
This project is licensed under the MIT License.
Footnotes
1. Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1), 59-60. DOI: 10.1038/nmeth.3176
2. Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841-842. DOI: 10.1093/bioinformatics/btq033
3. R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/
4. Python Software Foundation. (2021). Python Language Reference, version 3.9.6. URL: https://www.python.org/