
Subcommand: cathedral plot

Lucas Czech edited this page Jul 16, 2024 · 11 revisions

Synopsis: Create a cathedral plot, using the pre-computed cathedral data.

Usage: grenedalf cathedral-plot [options]

Documentation for grenedalf v0.5.2

Description

Create cathedral plots. This is the second step after fst-cathedral, and turns the matrix computed there into the actual plots. We split this into two commands for efficiency, so that it is faster to iterate over different color schemes and other plotting settings.

The command takes either the csv or the json files produced by fst-cathedral as input, and infers the respective other file; both have to be in the same directory and have the same base name. It then colors the value matrix according to the provided color map settings, and stores the result as a bmp bitmap image. Furthermore, it creates an svg file that additionally contains axes, a legend for the color map, and a title, and that can be edited and refined later with any vector graphics program.
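To illustrate the pairing of input files, here is a minimal sketch of how the counterpart file could be derived from the given one. It assumes that the two files differ only in their extension (.csv vs .json); the function name `counterpart_path` and the example file name are hypothetical and not part of grenedalf.

```python
from pathlib import Path


def counterpart_path(input_path: str) -> Path:
    """Return the expected sibling file of a fst-cathedral output file.

    Assumption for this sketch: both files live in the same directory,
    share the same base name, and differ only in .csv vs .json.
    """
    path = Path(input_path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return path.with_suffix(".json")
    if suffix == ".json":
        return path.with_suffix(".csv")
    raise ValueError(f"expected a .csv or .json file, got: {input_path}")


# Hypothetical file name, for illustration only.
print(counterpart_path("results/fst-cathedral-S1-S2.csv"))
```

If the derived file does not exist, the command cannot produce the plot, which is why both files need to be kept together.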

FST Cathedral Plot.

See fst-cathedral for details on this plot.

Note that the pool-sequencing corrected estimator of FST applies a correction term that can yield FST values below zero. This is expected, and a consequence of correcting for the statistical noise. For details, see our equations document. To keep these artifacts from influencing the plot, and to make plots consistent with each other, we recommend clipping negative values to zero by providing --min-value 0 --clip-under. The first option limits the scale to non-negative values, and the second option makes sure that negative values are clipped to 0 instead of being highlighted in the --under-color.

Similarly, when creating multiple plots that need to be compared with each other, it might be beneficial to use --max-value X --clip-over with some reasonable maximum value X. That way, all plots have the same scale, and hence comparable color values.
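The effect of a shared scale can be seen with a simple linear mapping of values onto the color map. This is a sketch, assuming a plain linear normalization; the function name and the two small value lists are hypothetical stand-ins for real cathedral matrices.

```python
def linear_position(value: float, vmin: float, vmax: float) -> float:
    """Map value linearly onto [0, 1] relative to the range [vmin, vmax]."""
    return (value - vmin) / (vmax - vmin)


# Two hypothetical plots whose observed maxima differ.
plot_a = [0.0, 0.05, 0.10]
plot_b = [0.0, 0.05, 0.40]

# Automatic ranges: the same FST value of 0.05 ends up at different
# positions on the color map, so the colors are not comparable.
auto_a = linear_position(0.05, min(plot_a), max(plot_a))
auto_b = linear_position(0.05, min(plot_b), max(plot_b))

# A shared fixed range (as with --min-value 0 --max-value 0.4) puts the
# same value at the same color position in both plots.
fixed_a = linear_position(0.05, 0.0, 0.4)
fixed_b = linear_position(0.05, 0.0, 0.4)

print(auto_a, auto_b, fixed_a, fixed_b)
```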

Lastly, we have currently only implemented cathedral plots for FST. However, they are also possible for any other window-based statistic, such as the diversity metrics. If you are interested in this, please open an issue to let us know.

Colors

For the options of this command, the single colors and the main gradient can be specified as follows.

Single Colors

Single colors can be specified

  • by name, as one of the 140 web colors, that is, the basic 16 html color names and the extended 124 X11 color names. This is case- and whitespace-insensitive.
  • by name, as one of the 954 xkcd colors, again case- and whitespace-insensitive.
  • by hex code in the format #RRGGBB or #RRGGBBAA (the latter with alpha, which might be useful when producing svg files), using hexadecimal coding for each of the red, green, and blue values (and alpha), case-insensitive. For example, use #000000 for black and #ffffff for white. Note that in many shells, # starts a comment; hence, you probably need to put the color in quotation marks.
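As an illustration of the hex format described above, the following sketch parses #RRGGBB and #RRGGBBAA strings into channel values. It only covers the hex notation, not the web or xkcd color names, and it assumes that a missing alpha means fully opaque (255); `parse_hex_color` is a hypothetical helper, not grenedalf's own parser.

```python
import string


def parse_hex_color(spec: str) -> tuple:
    """Parse '#RRGGBB' or '#RRGGBBAA' (case-insensitive) into (r, g, b, a).

    Assumption for this sketch: alpha defaults to 255 (opaque) if omitted.
    """
    text = spec.strip()
    digits = text[1:]
    if not text.startswith("#") or len(digits) not in (6, 8):
        raise ValueError(f"expected #RRGGBB or #RRGGBBAA, got {spec!r}")
    if not all(c in string.hexdigits for c in digits):
        raise ValueError(f"invalid hex digits in {spec!r}")
    # Read the digits two at a time: RR, GG, BB, and optionally AA.
    channels = [int(digits[i:i + 2], 16) for i in range(0, len(digits), 2)]
    if len(channels) == 3:
        channels.append(255)
    return tuple(channels)


print(parse_hex_color("#FF00ff"))  # (255, 0, 255, 255)
```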

A typical color specification might hence look like this: --under-color "#ff00ff" or --mask-color orange.

Lists and Gradients of Colors

Gradients and lists of colors can be specified as

  • a comma-separated list of colors following the above specifications for single colors (this list can either be provided in a file with one color per line, or directly as a string on the command line), or

  • one of the following named color lists/gradients:

    Color lists in grenedalf.

Depending on context, not all of these lists might be well suited; for example, it does not make much sense to use a (categorical) qualitative color list as a (continuous) gradient.

When specifying individual colors to build a custom gradient, the specified colors are evenly spaced out across the range of values, and then linearly interpolated to create the gradient. For example, a gradient from black to red to yellow could be specified as --color-list "#000000,#ff0000,#ffff00".
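The evenly spaced stops and the linear interpolation between them can be sketched as follows. This is an illustrative re-implementation of the described behavior in RGB space, assuming hex colors only; `make_gradient` and `hex_to_rgb` are hypothetical names, and grenedalf's actual code may differ in details such as rounding.

```python
def hex_to_rgb(spec: str) -> tuple:
    """Convert '#RRGGBB' into an (r, g, b) tuple of 0-255 integers."""
    digits = spec.strip().lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def make_gradient(color_list: str):
    """Build a function mapping t in [0, 1] to an interpolated RGB color.

    The colors are placed at evenly spaced stops, and each t is linearly
    interpolated between the two neighboring stops in RGB space.
    """
    stops = [hex_to_rgb(c) for c in color_list.split(",")]

    def color_at(t: float) -> tuple:
        t = min(max(t, 0.0), 1.0)  # keep t inside the gradient
        if len(stops) == 1:
            return tuple(float(c) for c in stops[0])
        pos = t * (len(stops) - 1)          # position measured in stop units
        i = min(int(pos), len(stops) - 2)   # index of the left stop
        frac = pos - i                      # fraction of the way to the right stop
        left, right = stops[i], stops[i + 1]
        return tuple(a + (b - a) * frac for a, b in zip(left, right))

    return color_at


gradient = make_gradient("#000000,#ff0000,#ffff00")
print(gradient(0.25))  # (127.5, 0.0, 0.0)
print(gradient(0.75))  # (255.0, 127.5, 0.0)
```

With three colors, the stops sit at 0, 0.5, and 1, so a quarter of the way along the gradient lies halfway between black and red.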

Our internal interpolation between colors to create a gradient is (currently) done linearly in RGB color space, which does not always yield the best-looking results. We hence recommend constructing a gradient with several (5 or more) intermediate colors using external tools that operate in LCH space (e.g., this gradient generator), and then using these intermediate colors as input. This way, we only need to interpolate between nearby, similar colors in RGB, which looks better than RGB interpolation between vastly different colors.

Options

Input

--json-path
TEXT:PATH(existing)=[] ... Excludes: --csv-path
List of json files or directories to process. For directories, only files with the extension .json are processed. To input more than one file or directory, either separate them with spaces, or provide this option multiple times.
--csv-path
TEXT:PATH(existing)=[] ... Excludes: --json-path
List of csv files or directories to process. For directories, only files with the extension .csv are processed. To input more than one file or directory, either separate them with spaces, or provide this option multiple times.
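The handling of mixed file and directory inputs can be sketched like this. It assumes that directories are scanned non-recursively and that the extension match is exact; `collect_inputs` is a hypothetical helper for illustration.

```python
from pathlib import Path


def collect_inputs(paths: list, extension: str) -> list:
    """Expand a list of files and directories into a list of input files.

    Files are taken as given; for directories, only regular files with the
    given extension (e.g. '.json') are kept. Assumption for this sketch:
    directories are not searched recursively.
    """
    result = []
    for p in paths:
        path = Path(p)
        if path.is_dir():
            matches = (f for f in path.iterdir()
                       if f.is_file() and f.suffix == extension)
            result.extend(sorted(matches))
        elif path.is_file():
            result.append(path)
        else:
            raise FileNotFoundError(f"input path does not exist: {p}")
    return result
```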

Color

--color-list
TEXT=inferno
List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual comma-separated list of colors. Colors can be specified in the format #rrggbb using hex values, or by web color names.
--reverse-color-list
FLAG
If set, the order of colors of the --color-list is reversed.
--under-color
TEXT=#00ffff
Color used to indicate values below the min value. Color can be specified in the format #rrggbb using hex values, or by web color names.
--clip-under
FLAG
Clip (i.e., clamp) values less than min to be inside [ min, max ], by setting values that are too low to the specified min value. If set, --under-color is not used to indicate values out of range.
--over-color
TEXT=#ff00ff
Color used to indicate values above the max value. Color can be specified in the format #rrggbb using hex values, or by web color names.
--clip-over
FLAG
Clip (i.e., clamp) values greater than max to be inside [ min, max ], by setting values that are too high to the specified max value. If set, --over-color is not used to indicate values out of range.
--clip
FLAG
Clip (i.e., clamp) values to be inside [ min, max ], by setting values outside of that interval to the nearest boundary of it. This option is a shortcut to set --clip-under and --clip-over at once.
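The interplay of the range limits and the clipping flags can be summarized in a small sketch. It assumes a linear color map and returns either the relative position on that map or a marker for the under/over color; `color_position` is a hypothetical name and not grenedalf's internal function.

```python
from typing import Union


def color_position(value: float, vmin: float, vmax: float,
                   clip_under: bool = False,
                   clip_over: bool = False) -> Union[float, str]:
    """Return the relative position of value on the color map in [0, 1],
    or 'under' / 'over' if it lies outside [vmin, vmax] and is not clipped."""
    if value < vmin:
        if not clip_under:
            return "under"   # would be drawn in the --under-color
        value = vmin         # clamp to the lower boundary
    elif value > vmax:
        if not clip_over:
            return "over"    # would be drawn in the --over-color
        value = vmax         # clamp to the upper boundary
    return (value - vmin) / (vmax - vmin)


print(color_position(-0.02, 0.0, 0.4))                   # under
print(color_position(-0.02, 0.0, 0.4, clip_under=True))  # 0.0
```

Passing both flags as True corresponds to what --clip does as a shortcut.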
--color-normalization
TEXT:{linear,logarithmic}=linear
To create the cathedral plot, the value of each pixel needs to be translated into a color, by mapping from the range of values into the range of the color map. This translation can be done as a simple linear transform, or logarithmic, so that low values can be distinguished with more detail.
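To show why a logarithmic mapping resolves low values in more detail, here is a sketch of both normalizations. The exact logarithmic formula used by grenedalf is not documented here; this sketch assumes the common choice of interpolating between log(min) and log(max), which in this sketch requires a strictly positive minimum and values inside the range. The function name is hypothetical.

```python
import math


def normalize(value: float, vmin: float, vmax: float,
              mode: str = "linear") -> float:
    """Map value in [vmin, vmax] onto [0, 1], linearly or logarithmically."""
    if mode == "linear":
        return (value - vmin) / (vmax - vmin)
    if mode == "logarithmic":
        # Assumption: interpolate in log space; needs vmin > 0.
        if vmin <= 0:
            raise ValueError("logarithmic normalization needs a positive minimum")
        return (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))
    raise ValueError(f"unknown normalization mode: {mode!r}")


# A low value of 0.01 in the range [0.001, 0.1]:
print(normalize(0.01, 0.001, 0.1))                 # close to the bottom of the map
print(normalize(0.01, 0.001, 0.1, "logarithmic"))  # in the middle of the map
```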
--min-value
FLOAT=nan
As an alternative to determining the range of values automatically, the range limits can be set explicitly. This allows, for instance, capping the visualization in the presence of outliers that would otherwise hide detail in the lower values. Any value below the min specified here is then mapped to the under color, or clipped to the lowest value in the color map.
--max-value
FLOAT=nan
See --min-value; this is the equivalent upper limit of values. Any value above the max specified here is then mapped to the over color, or clipped to the highest value in the color map.
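Both limits default to nan, which acts as "not set". A minimal sketch of how such defaults could be resolved is shown below, assuming that the automatic range is the minimum and maximum of the finite values in the matrix (ignoring nan entries, for example from empty windows); this is an assumption, and `resolve_range` is a hypothetical helper.

```python
import math


def resolve_range(values, min_value=math.nan, max_value=math.nan):
    """Return (vmin, vmax), using explicit limits where given (not nan),
    and the extremes of the finite values otherwise."""
    finite = [v for v in values if math.isfinite(v)]
    vmin = min(finite) if math.isnan(min_value) else min_value
    vmax = max(finite) if math.isnan(max_value) else max_value
    return vmin, vmax


matrix_values = [0.1, -0.02, math.nan, 0.3]
print(resolve_range(matrix_values))                 # (-0.02, 0.3)
print(resolve_range(matrix_values, min_value=0.0))  # (0.0, 0.3)
```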

Output

--out-dir
TEXT=.
Directory to write files to
--file-prefix
TEXT
File prefix for output files. Most grenedalf commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--file-suffix
TEXT
File suffix for output files. Most grenedalf commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.

Global Options

--allow-file-overwriting
FLAG
Allow overwriting existing output files instead of aborting the command.
--verbose
FLAG
Produce more verbose output.
--threads
UINT
Number of threads to use for calculations. If not set, we guess a reasonable number of threads, by looking at the environment variables (1) OMP_NUM_THREADS (OpenMP) and (2) SLURM_CPUS_PER_TASK (slurm), as well as (3) the hardware concurrency, taking hyperthreads into account, in the given order of precedence.
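The order of precedence for the thread count can be sketched as follows. This only mirrors the described lookup order; it does not reproduce grenedalf's handling of hyperthreads, since os.cpu_count() simply reports the number of logical CPUs. The function name `guess_threads` is hypothetical.

```python
import os


def guess_threads() -> int:
    """Guess a thread count: OMP_NUM_THREADS, then SLURM_CPUS_PER_TASK,
    then the number of logical CPUs reported by the operating system."""
    for var in ("OMP_NUM_THREADS", "SLURM_CPUS_PER_TASK"):
        raw = os.environ.get(var, "").strip()
        # Only accept plain positive integers; anything else is skipped.
        if raw.isdigit() and int(raw) > 0:
            return int(raw)
    return os.cpu_count() or 1
```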
--log-file
TEXT
Write all output to a log file, in addition to standard output to the terminal.

Citation

When using this method, please do not forget to cite

Lucas Czech, Jeffrey Spence, Moises Exposito-Alonso. grenedalf: population genetic statistics for the next generation of pool sequencing. arXiv, 2023. doi:10.48550/arXiv.2306.11622
