Types of input data
Different types of files can be used in supeRbaits:
Argument name = database
This input file contains the genomic information available for the species of interest, which is used as the reference for the designed baits. The reference database must be in FASTA format. See more information about the FASTA format here. Each entry in a FASTA file consists of at least two lines: a header line that starts with '>' followed by a string (the name of that chromosome, contig or stretch of DNA), and one or more lines containing the genomic sequence (e.g. 'ATTTCAGGGTATGG'). Henceforth, each individual entry in a database file is called a 'sequence'.
Note that the package includes a function, standardize_lengths, that makes sure that the sequences are properly organised in the FASTA file.
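To make the header/sequence structure described above concrete, here is a minimal Python sketch that reads FASTA text into named sequences. It is not part of supeRbaits and is not how the package reads its database; `parse_fasta` and the example entries are hypothetical, and the sketch assumes that the whole header line after '>' is the sequence name.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {sequence name: sequence} from FASTA-formatted text."""
    sequences: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: assumption, the full text after '>' is the name.
            name = line[1:].strip()
            if name in sequences:
                raise ValueError(f"duplicate sequence name: {name}")
            sequences[name] = ""
        elif name is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            # A sequence may span several lines; join them together.
            sequences[name] += line
    return sequences


# Hypothetical two-entry database.
example = """>chr1
ATTTCAGGGTATGG
CCGTA
>contig_2
GGGTTTAAACC
"""

database = parse_fasta(example)
for seq_name, seq in database.items():
    print(seq_name, len(seq))
```

The names recovered this way are the ones that the exclusions, regions and restrict inputs are expected to match.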
Argument name = exclusions
This type of input file is only needed if you want to exclude certain areas of your genomic database so that no baits are generated from them (using the argument exclusions). The input file consists of the first three columns of a BED file: the first column contains the chromosome/contig name (the same names used in the database), and the second and third columns contain the bp at which the exclusion region starts and ends. Each row contains a separate exclusion region. The file does not require column headers, and the columns must be separated by tabs.
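As an illustration of the layout described above, the following Python sketch writes a headerless, tab-separated, three-column file. The helper `write_bed3`, the region values and the file name in the comment are hypothetical; the sketch only shows the file structure and makes no claim about how supeRbaits interprets the coordinates.

```python
import csv
import io


def write_bed3(rows, handle):
    """Write (name, start, end) rows as headerless, tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, start, end in rows:
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"{name}: start {start} is after end {end}")
        writer.writerow([name, start, end])


# Hypothetical exclusion regions; names must match the database headers.
exclusions = [("chr1", 100, 250), ("chr1", 900, 1200), ("contig_2", 1, 50)]

# StringIO stands in for a real file, e.g. open("exclusions.txt", "w", newline="").
buffer = io.StringIO()
write_bed3(exclusions, buffer)
exclusions_text = buffer.getvalue()
print(exclusions_text, end="")
```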
Argument name = regions
This type of input file lists the regions of the genomic database that you are particularly interested in covering with your baits. It is only needed if you want to use the argument regions. A regions file is structured like the exclusions file: for each gene, you provide one or more intervals of base pairs of interest. The input file consists of the first three columns of a BED file, i.e. Chromosome_name \t start_bp \t end_bp \n, where each row contains a single region of interest.
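Because the names in a regions file must match the database, it can be useful to check the file before running an analysis. The Python sketch below is one possible pre-check, not something supeRbaits is known to do itself: `read_bed3`, `find_problems`, the sequence lengths and the regions are all hypothetical, and it assumes that an end position beyond the sequence length is an error.

```python
def read_bed3(text):
    """Parse headerless, tab-separated 'name, start, end' rows."""
    intervals = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated columns")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def find_problems(intervals, lengths):
    """Return a message for each interval that does not fit the database."""
    problems = []
    for name, start, end in intervals:
        if name not in lengths:
            problems.append(f"{name}: not found in the database")
        elif start > end:
            problems.append(f"{name}: start {start} is after end {end}")
        elif end > lengths[name]:
            problems.append(f"{name}: end {end} exceeds length {lengths[name]}")
    return problems


# Hypothetical sequence lengths (in bp) taken from the database.
lengths = {"chr1": 5000, "contig_2": 800}

# Hypothetical regions file content; the last two rows are deliberately wrong.
regions_text = (
    "chr1\t1200\t1450\n"
    "chr1\t3000\t3600\n"
    "contig_2\t700\t950\n"
    "chrX\t10\t20\n"
)

regions = read_bed3(regions_text)
problems = find_problems(regions, lengths)
for message in problems:
    print(message)
```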
Argument name = restrict
This argument takes a vector of chromosome names OR position numbers to which the analysis should be restricted. It allows supeRbaits to design baits only for specific sequences, specified either by name or by position in the database.
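The two ways of specifying a restriction can be illustrated with a short Python sketch. It is not the package's implementation: `apply_restrict` and the sequence names are hypothetical, and the sketch assumes that positions are 1-based (as is usual in R) and follow the order of the sequences in the database.

```python
def apply_restrict(sequence_names, restrict):
    """Select sequences either by name or by 1-based position (assumed)."""
    if all(isinstance(item, str) for item in restrict):
        missing = [item for item in restrict if item not in sequence_names]
        if missing:
            raise ValueError(f"names not found in the database: {missing}")
        return list(restrict)
    if all(isinstance(item, int) for item in restrict):
        selected = []
        for position in restrict:
            if not 1 <= position <= len(sequence_names):
                raise ValueError(f"position {position} is out of range")
            selected.append(sequence_names[position - 1])
        return selected
    raise TypeError("restrict must contain only names or only positions")


# Hypothetical sequence names, in database order.
names = ["chr1", "chr2", "contig_7"]

by_name = apply_restrict(names, ["chr2"])
by_position = apply_restrict(names, [1, 3])
print(by_name)
print(by_position)
```

Mixing names and positions in the same vector is rejected here, mirroring the "names OR position numbers" wording above.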
For questions regarding supeRbaits's use or development, contact us through GitHub. If you would like to cite supeRbaits, please refer to its main publication (for the moment it is in pre-print here).