
Subcommand: simulate

Lucas Czech edited this page Sep 16, 2024 · 20 revisions

Synopsis: Create a file with simulated random frequency data.

Usage: grenedalf simulate [options]

Required options:

  • --read-depths
  • --length
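
As a minimal sketch of how the two required options fit together, the following Python snippet assembles a command line for `grenedalf simulate`. The helper name `build_simulate_command` and the example values are illustrative assumptions; only the option names come from this page. The snippet prints the command rather than running it.

```python
import shlex


def build_simulate_command(read_depths, length, extra_options=()):
    """Assemble an argument list for `grenedalf simulate` (hypothetical helper)."""
    # --read-depths and --length are the two required options.
    command = [
        "grenedalf", "simulate",
        "--read-depths", read_depths,
        "--length", str(length),
    ]
    command.extend(extra_options)
    return command


command = build_simulate_command(
    "10,20/50",  # sample 1: fixed depth 10; sample 2: depth drawn between 20 and 50
    1000,        # chromosome length (example value)
    ["--format", "sync", "--random-seed", "42"],
)

# Print the command for inspection instead of executing it.
print(shlex.join(command))
```

To actually run the command, the list could be passed to `subprocess.run(command, check=True)`, assuming `grenedalf` is installed and on the `PATH`.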

Documentation for grenedalf v0.6.2

Options

Settings

--format
TEXT:{pileup,sync}=pileup
Select the output file format: either (m)pileup or PoPoolation2 sync.
--random-seed
UINT=0
Set the random seed for generating values, which allows reproducible results. If not provided, the system clock is used to obtain a random seed.
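
For orientation, here is a rough sketch of what a line in the PoPoolation2 sync format looks like: chromosome, position, and reference base, followed by one `A:T:C:G:N:del` count column per sample, all tab-separated. The function `format_sync_line` is a hypothetical helper written for this illustration, not grenedalf's writer, and the counts are made up.

```python
# Order of the colon-separated counts in each sample column of a sync file.
SYNC_ORDER = ("A", "T", "C", "G", "N", "del")


def format_sync_line(chromosome, position, ref_base, sample_counts):
    """Format one sync line from per-sample count dictionaries (illustrative)."""
    fields = [chromosome, str(position), ref_base]
    for counts in sample_counts:
        # Missing keys are treated as a count of zero.
        fields.append(":".join(str(counts.get(key, 0)) for key in SYNC_ORDER))
    return "\t".join(fields)


# Two samples at position 1 of chromosome "A" with reference base G.
line = format_sync_line("A", 1, "G", [{"G": 8, "T": 2}, {"G": 25}])
print(line)
```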

Samples

--read-depths
TEXT
Required. Read depths of the samples to simulate, as a comma- or tab-separated list. The read depth of each sample is used as the total count per position, which is randomly distributed across nucleotides. Each list entry is either a single number, which is used as the read depth for that sample at every position, or two numbers separated by a slash, which are used as min/max to generate a random read depth at each position. The length of this list also determines the number of samples to simulate.
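
The list format described above can be sketched in Python as follows. This is an illustration of the documented syntax, not grenedalf's own parser: the function names are hypothetical, and treating the maximum as inclusive and drawing depths uniformly are assumptions.

```python
import random
import re


def parse_read_depths(spec):
    """Return one (min, max) pair per sample from a --read-depths style list."""
    ranges = []
    # Entries are separated by commas or tabs.
    for entry in re.split(r"[,\t]", spec):
        parts = entry.strip().split("/")
        if len(parts) == 1:
            # Single number: fixed depth at every position.
            low = high = int(parts[0])
        elif len(parts) == 2:
            # min/max: depth is drawn per position.
            low, high = int(parts[0]), int(parts[1])
        else:
            raise ValueError(f"invalid read depth entry: {entry!r}")
        if low > high:
            raise ValueError(f"min exceeds max in entry: {entry!r}")
        ranges.append((low, high))
    return ranges


def draw_depths(ranges, rng):
    """Draw one depth per sample for a single position (inclusive bounds assumed)."""
    return [rng.randint(low, high) for low, high in ranges]


ranges = parse_read_depths("10,20/50")  # two entries, so two samples
depths = draw_depths(ranges, random.Random(42))
print(ranges, depths)
```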

Genome

--chromosome
TEXT=A
Name of the chromosome. This is simply used as the first column in the output file. At the moment, only one chromosome is supported.
--mutation-rate
FLOAT:(FLOAT in [0 - 1]) AND (POSITIVE)=1e-08 Excludes: --mutation-count
Mutation rate to simulate. This rate times the --length is used as the number of mutations to generate in total (which can alternatively be directly provided via --mutation-count).
--mutation-count
UINT=0 Excludes: --mutation-rate
Number of mutations to simulate in total across the chromosome, spread across the --length.
--length
UINT=0
Required. Total length of the chromosome to simulate. Mutations are spread across this length.
--omit-invariant-positions
FLAG
If set, only write the mutated positions to the output file. Note that the resulting files are no longer standard (m)pileup or sync files, but the option can still be useful.
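
The relationship between `--mutation-rate`, `--length`, and the total number of mutations can be checked with a few lines of Python. This is a sketch of the arithmetic only; rounding to the nearest integer is an assumption, as this page does not state how grenedalf converts the product to a count.

```python
def expected_mutation_count(mutation_rate, length):
    """Number of mutations implied by rate * length (rounding is an assumption)."""
    if not 0.0 < mutation_rate <= 1.0:
        raise ValueError("mutation rate must be positive and at most 1")
    return round(mutation_rate * length)


# With the default rate of 1e-08, a chromosome of one billion positions
# yields about ten mutations, while a short chromosome yields none.
print(expected_mutation_count(1e-08, 1_000_000_000))
print(expected_mutation_count(1e-08, 1000))
```

For short test chromosomes, the product of rate and length is far below one, so setting `--mutation-count` directly is likely the more practical choice.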

Pileup

--with-quality-scores
FLAG
If set, phred-scaled quality scores are written when simulating an (m)pileup file, using the --min-phred-score and --max-phred-score settings. Ignored otherwise.
--min-phred-score
UINT:UINT in [0 - 90]=10
Minimum phred score to use when simulating an (m)pileup file. Ignored otherwise.
--max-phred-score
UINT:UINT in [0 - 90]=40
Maximum phred score to use when simulating an (m)pileup file. Ignored otherwise.
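
To give a sense of what the default score range means, the sketch below converts phred scores to error probabilities using the standard definition Q = -10 log10(P). The character encoding with an offset of 33 is the common convention for (m)pileup quality strings; that grenedalf uses this offset is an assumption here, and both function names are hypothetical.

```python
def phred_to_error_probability(score):
    """Error probability for a phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-score / 10)


def phred_to_ascii(score, offset=33):
    """Quality character for a phred score (offset 33 assumed)."""
    return chr(score + offset)


# The defaults of --min-phred-score and --max-phred-score are 10 and 40.
for score in (10, 40):
    print(score, phred_to_error_probability(score), phred_to_ascii(score))
```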

Output

--out-dir
TEXT=.
Directory to write output files to.
--file-prefix
TEXT
File prefix for output files. Most grenedalf commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--file-suffix
TEXT
File suffix for output files. Most grenedalf commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--compress
FLAG
If set, compress the output files using gzip. Output file extensions are automatically extended by .gz.

Global Options

--allow-file-overwriting
FLAG
Allow overwriting of existing output files instead of aborting the command. By default, the command aborts if any output file already exists, to avoid overwriting files by mistake.
--verbose
FLAG
Produce more verbose output.
--threads
UINT
Number of threads to use for calculations. If not set, a reasonable number of threads is guessed by checking, in this order of precedence, the environment variables (1) OMP_NUM_THREADS (OpenMP) and (2) SLURM_CPUS_PER_TASK (slurm), and then (3) the hardware concurrency (number of CPU cores), taking hyperthreads into account.
--log-file
TEXT
Write all output to a log file, in addition to standard output to the terminal.
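
The order of precedence described for `--threads` can be sketched as follows. This is an approximation for illustration, not grenedalf's implementation: the function name is hypothetical, and `os.cpu_count()` reports logical cores, so the hyperthreading adjustment mentioned above is not reproduced.

```python
import os


def guess_thread_count(environ=None):
    """Guess a thread count following the documented order of precedence."""
    env = os.environ if environ is None else environ
    # (1) OpenMP setting first, then (2) the slurm allocation.
    for name in ("OMP_NUM_THREADS", "SLURM_CPUS_PER_TASK"):
        value = env.get(name, "").strip()
        if value.isdigit() and int(value) > 0:
            return int(value)
    # (3) Fall back to the number of logical cores (no hyperthread adjustment).
    return os.cpu_count() or 1


print(guess_thread_count({"OMP_NUM_THREADS": "4", "SLURM_CPUS_PER_TASK": "8"}))
print(guess_thread_count({"SLURM_CPUS_PER_TASK": "8"}))
```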

Citation

When using this method, please do not forget to cite:

Lucas Czech, Jeffrey Spence, Moises Exposito-Alonso. grenedalf: population genetic statistics for the next generation of pool sequencing. Bioinformatics, vol. 40, no. 8, 2024. doi:10.1093/bioinformatics/btae508