A heatmap typically maps continuous data to one of two types of color palettes, depending on the distribution of the data to be visualized:
a "sequential" palette, dark to light or muted to saturated, for data from one-sided distributions (e.g. count data)
a "diverging" palette with contrasting colors at the upper and lower extremes and a muted color in the center, for data from two-sided distributions (e.g. Gaussian)
Implementation Overview
My current implementation tries to balance:
minimizing invasive changes to existing IGV code
minimizing impact on performance
maximizing generality of the new capability (add capability, not policy)
minimizing new configuration requirements on the user
Implementation rationale
Because a heatmap can be thought of as nothing more than multiple rows of densely-packed annotation features (typically without any additional decoration like labels), the FeatureTrack that supports general annotations already has all facilities required to support heatmaps. In particular,
All the behavior concerning feature visibility is used as-is.
Its ability to render multiple rows within one track is used.
Only some small modifications to its behavior were necessary:
Input data, not packFeatures, assigns Features to rows.
No decoration of heatmap cells is supported.
Data is mapped at runtime to colors used to fill heatmap cells (without borders).
A default mapping function and several palettes are provided, but all can be overridden in config.
These changes have almost no impact on performance, and what little performance impact there is could be mitigated with slightly more invasive changes.
To keep the IGV implementation as simple (and fast) as possible, the data is expected to be fully preprocessed for display; all IGV does at runtime is map numeric values to colors using a colormap function and a palette.
Concretely, data should be in [0,1]. Values below and above this range are by default clamped to 0 and 1 respectively and thus mapped to the palette's edge colors. Also, two discrete "outlier" colors can optionally be provided in the track config to highlight outlying data instead of just using the palette's "edges" (a very good idea I first saw in matplotlib).
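To make the clamping and outlier behavior concrete, here is a minimal sketch of a linear colormap. The names `clamp01`, `linearColormap`, and the `outliers` object are hypothetical and not taken from the proposed multiscalehm.js, and the 4-color palette stands in for a real 64-color one.

```javascript
// Hypothetical helper: restrict x to the interval [0, 1].
function clamp01(x) {
    return Math.min(1, Math.max(0, x));
}

// Map a score to one entry of `palette` (an array of CSS color strings).
// `outliers` is optional and may hold `below` and/or `above` colors that
// replace the palette's edge colors for out-of-range scores.
function linearColormap(score, palette, outliers = {}) {
    if (score < 0 && outliers.below !== undefined) return outliers.below;
    if (score > 1 && outliers.above !== undefined) return outliers.above;
    const t = clamp01(score);
    // A score of exactly 1 must select the last entry, not one past the end.
    const index = Math.min(palette.length - 1, Math.floor(t * palette.length));
    return palette[index];
}

// Illustrative palette only.
const palette = ["#000000", "#555555", "#aaaaaa", "#ffffff"];

console.log(linearColormap(0.5, palette));
console.log(linearColormap(7, palette, { above: "red" }));
```

Without outlier colors, a score of 7 would simply be clamped to 1 and drawn in the last palette color.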
Data is delivered in BED files
Given the preceding characterization of heatmaps, it is natural to deliver heatmap data as BED files with a very minor abuse of the format: the 4th (name) column contains a 0-based row assignment. The name field in BED files can be thought of as naming the scale of the data (corresponding to a row). Since genome coordinate ranges in heatmap data would not typically be associated with other names, this is not such an abuse of the BED format. The 5th (score) column is used for its intended purpose: a score.
This arrangement also allows additional runtime optimizations:
the muted color corresponding to the most "uninteresting" data range is provided by config.altColor. Thus, no cells that would be mapped to this color need to be included in the input data! The data is thereby minified.
Obviously, the range of values mapped to this color can be adjusted at data preparation time to effectively compress the data with minimal loss of information (similar to what is done in preparation of a JPEG image).
Similarly, adjacent cells that are not "too" different could optionally be merged during data preparation to further reduce size.
These data preparation optimizations are, of course, optional but advisable in the interest of performance.
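The two data preparation steps described above might look roughly like the following sketch. It is an illustration only: the function names, the cell object shape, and the length-weighted averaging of merged scores are all assumptions, not part of the proposal.

```javascript
// Assumed input: cells as { chr, start, end, row, score }, sorted by row,
// then chr, then start. This shape is hypothetical.

// Drop cells whose score falls in the "uninteresting" range [bgLow, bgHigh];
// the track's altColor background would be shown in their place.
function dropBackgroundCells(cells, bgLow, bgHigh) {
    return cells.filter(c => c.score < bgLow || c.score > bgHigh);
}

// Merge abutting cells in the same row and chromosome whose scores differ
// by at most `tolerance`. The merged score is a length-weighted mean
// (one possible choice among several).
function mergeSimilarCells(cells, tolerance) {
    const merged = [];
    for (const cell of cells) {
        const last = merged[merged.length - 1];
        if (last && last.row === cell.row && last.chr === cell.chr &&
            last.end === cell.start &&
            Math.abs(last.score - cell.score) <= tolerance) {
            const lastLen = last.end - last.start;
            const cellLen = cell.end - cell.start;
            last.score = (last.score * lastLen + cell.score * cellLen) / (lastLen + cellLen);
            last.end = cell.end;
        } else {
            merged.push({ ...cell }); // copy so the input is not mutated
        }
    }
    return merged;
}

const cells = [
    { chr: "chr1", start: 0,   end: 100, row: 0, score: 0.9 },
    { chr: "chr1", start: 100, end: 200, row: 0, score: 0.9 },
    { chr: "chr1", start: 200, end: 300, row: 0, score: 0.5 },
    { chr: "chr1", start: 300, end: 400, row: 0, score: 0.2 }
];
const prepared = mergeSimilarCells(dropBackgroundCells(cells, 0.4, 0.6), 0.05);
console.log(prepared.length);
```

Widening the `[bgLow, bgHigh]` range or the `tolerance` trades fidelity for a smaller file, which is the lossy compression trade-off mentioned above.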
New files
Only one new JavaScript file, multiscalehm.js, is added providing:
a default, linear colormap function and
a small selection of 64-color palettes (some adapted from Matplotlib and others generated by the colorspace package from R)
a renderCell function called by FeatureTrack.draw.
Only the renderCell function is necessary. The colormap function and palettes could be made the user's responsibility to define in the config, but since a suitable palette and colormap function are always necessary and a linear map is most common, providing these as defaults reduces work for the user. Importantly, both can still be defined entirely in the config, maximizing generality.
IGV code changes
With the above considerations only a few edits to IGV were necessary:
A new config type is added: "multiscaleheat"
Exports from multiscalehm.js are exposed in index.js to make the default colormap and palettes available to the user in their config.
A one-line change to trackFactory.js mapping "multiscaleheat" to "feature".
A conditional in TextFeatureSource.loadFeatures that parses the 4th column of a BED file as an integer and assigns the resulting number to the (already-existing) row attribute of Feature (pre-empting the call to packFeatures).
Only a few changes in FeatureTrack:
setting renderCell as the FeatureTrack.render method
setting the background color
bypassing the code that "Ensure[s] a visible gap between features"
User requirements
The following should be set in the track config:
maxRows - the number of rows in the heatmap
height - the pixel height of the heatmap track. This is used verbatim. In particular, displayMode and its related variables are not used by heatmap tracks.
color - a function taking a Feature as its sole argument.
altColor - used as the "background", which supports sparser input data. This color should typically correspond to the most "uninteresting" color in the heatmap's palette.
Defaults are provided for everything to ensure that something is displayed, though the result will certainly not be ideal without user configuration, and it won't even be correct if maxRows is unset.
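As an illustration of the settings listed above, a track config might look something like this sketch. The `url` value, the 4-color `grays` palette, and the inline `color` function are placeholders; reading `feature.score` and returning a CSS color string are assumptions about how the `color` function would be used, and the exports of multiscalehm.js are deliberately not referenced because their names are not given here.

```javascript
// Placeholder palette; the proposal ships 64-color palettes instead.
const grays = ["#ffffff", "#bbbbbb", "#777777", "#333333"];

const trackConfig = {
    type: "multiscaleheat",   // the new config type from this proposal
    format: "bed",
    url: "scores.bed",        // hypothetical file name
    maxRows: 8,               // number of heatmap rows (scales)
    height: 160,              // used verbatim; rows are height / maxRows pixels
    altColor: grays[0],       // background: the most "uninteresting" color
    // Assumed contract: receives a Feature, returns a CSS color string.
    color: feature => {
        const t = Math.min(1, Math.max(0, feature.score));
        return grays[Math.min(grays.length - 1, Math.floor(t * grays.length))];
    }
};

console.log(trackConfig.color({ score: 0.6 }));
```

Because `altColor` matches the lowest palette color here, cells with near-zero scores could be omitted from the BED file without changing the picture.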
As is, the implementation simply makes full use of the configured space, so each heatmap row is config.height / config.maxRows pixels high. In particular, squishedRowHeight and related config variables are ignored, and no runtime adjustment of track height should occur.
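The row-height arithmetic can be sketched as follows. The function name `cellRect` and its parameters `bpStart` and `bpPerPixel` are chosen for illustration; this is not the proposal's renderCell, only the geometry such a function would need before filling a rectangle.

```javascript
// Compute the pixel rectangle for one heatmap cell.
// feature: { start, end, row }; config: { height, maxRows };
// bpStart: genomic position at the left edge of the view;
// bpPerPixel: current zoom level in base pairs per pixel.
function cellRect(feature, config, bpStart, bpPerPixel) {
    const rowHeight = config.height / config.maxRows;
    return {
        x: (feature.start - bpStart) / bpPerPixel,
        y: feature.row * rowHeight,          // row 0 is drawn at the top
        width: (feature.end - feature.start) / bpPerPixel,
        height: rowHeight
    };
}

const rect = cellRect({ start: 1000, end: 1200, row: 3 },
                      { height: 160, maxRows: 8 }, 1000, 10);
console.log(rect);
```

With a 160-pixel track and 8 rows, every row is 20 pixels high, so row 3 starts at y = 60.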
The maxRows config element could be made optional, since the largest row index can be inferred at runtime, but requiring maxRows simplifies the implementation because the row count is then known before any data is parsed. I may also want to use scaleCount as a more meaningful alias.
Input must come from BED files with:
0-based row number as the first of possibly multiple semi-colon-delimited subfields of the 4th column, and
score in [0,1] in the 5th column
The 6th (strand) column is ignored
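A minimal sketch of reading one such BED line is shown below. `parseHeatmapBedLine` is a hypothetical standalone function, not the conditional added to TextFeatureSource.loadFeatures, and it assumes tab-delimited columns.

```javascript
// Parse one tab-delimited BED line into a heatmap cell.
// The row index is the first semicolon-delimited subfield of column 4;
// the score is column 5; column 6 (strand), if present, is ignored.
function parseHeatmapBedLine(line) {
    const fields = line.trim().split("\t");
    if (fields.length < 5) {
        throw new Error(`expected at least 5 BED columns, got ${fields.length}`);
    }
    const row = Number.parseInt(fields[3].split(";")[0], 10);
    if (!Number.isInteger(row) || row < 0) {
        throw new Error(`invalid row index in name column: ${fields[3]}`);
    }
    return {
        chr: fields[0],
        start: Number.parseInt(fields[1], 10),
        end: Number.parseInt(fields[2], 10),
        row,
        score: Number.parseFloat(fields[4])
    };
}

console.log(parseHeatmapBedLine("chr1\t100\t200\t3;extra\t0.75\t+"));
```

Rejecting a non-numeric row index early is a design choice of this sketch; the proposal does not say how malformed name columns are handled.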
As with my previous stacked bar graph, I'll submit a pull request if this is of interest to the group.
Thanks,
roger kramer, bioinformatician
University of Eastern Finland
The comment on #1594 would apply here as well. Overall there are too many changes to igv.js here to accommodate a track and file convention without a user community. Again, perhaps this illustrates the need for a "contrib" plugin capability. In this case you would need to supply the track and a parser, as you are in effect creating a new file format. So it's perhaps more difficult than #1594.
One meta comment: igv.js already has a heatmap track and format, "seg", for segmented copy number. It's possible this track and format ("seg" is a widely used standard format for copy number) would make a better basis than a bed track, with fewer special cases.
A proposal for a multi-scale Heatmap Track
Motivation
This track type was motivated by multiscale genomic analyses such as https://pubmed.ncbi.nlm.nih.gov/24727652/