- Minor: Consider using `cli` for messaging, for consistency with tidyverse.
- General goal is to reduce label overlaps in Cnet plots.
  - Many options, including considering `ggraph` or `tidygraph`; however, both require re-creating pie/jampie graph node shapes that display both fill colors and border outline (without hiding adjacent borders). These strategies would not respect edge bundling, so that would either be lost or would require porting to the ggplot ecosystem somehow. Sigh. Then use `ggrepel` for non-overlapping labels.
  - Ideal world: There exists some `grid` tool for non-overlapping labels, and not `ggplot2`.
  - A simpler option is to allow direct x/y coordinate label adjustment. The current approach requires angle and distance from node center, which makes it difficult to move one label "up and left a tiny amount". Given the ability to adjust one label, it could be possible to adjust all labels to reduce overlaps. Potential workflow:
    - Define label coordinates after calculating angle and distance from node center per usual.
    - Apply adjustments.
      - Apply user-provided adjustments. Easiest approach. Default nudge matrix is `c(0, 0)` for all nodes. Edit for specific nodes, relative to plot layout coordinates.
      - Future: Automated adjustments.
        - Use `strwidth()` and `strheight()` to define bounding boxes.
        - Adjust bounding boxes?
          - The `plotrix` approach performs radial search in local space. It works best when labels are placed in smart order, e.g. center in a cluster, then working out toward edges. Define clusters? Define center of each cluster. Sort labels by distance from cluster center.
          - Force-directed: calc overlap of two labels, center of overlap, center of bounding box, calculate angle, relative "strength", and repel along the opposite angle.
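The nudge workflow above could be sketched roughly as follows. This is only an illustration in base R: `label_coords()` and `boxes_overlap()` are hypothetical helper names, the angle is assumed to be in degrees measured counter-clockwise from the positive x-axis, and the bounding boxes are supplied directly rather than measured with `strwidth()` and `strheight()`.

```r
# Hypothetical helper: label centers from node center, angle (degrees) and
# distance, followed by a per-node x/y nudge (default c(0, 0) for all nodes).
label_coords <- function(node_xy, degrees, dist, nudge = NULL) {
  rad <- degrees * pi / 180
  xy <- cbind(
    x = node_xy[, 1] + dist * cos(rad),
    y = node_xy[, 2] + dist * sin(rad))
  rownames(xy) <- rownames(node_xy)
  if (is.null(nudge)) {
    # default nudge matrix: zero adjustment for every node
    nudge <- matrix(0, nrow = nrow(xy), ncol = 2,
      dimnames = list(rownames(xy), c("x", "y")))
  }
  if (!is.null(rownames(xy))) {
    nudge <- nudge[rownames(xy), , drop = FALSE]
  }
  xy + nudge
}

# Hypothetical helper: TRUE when two axis-aligned label boxes overlap.
# Each box is c(xmin, xmax, ymin, ymax) in plot layout coordinates.
boxes_overlap <- function(a, b) {
  a[1] < b[2] && b[1] < a[2] && a[3] < b[4] && b[3] < a[4]
}

node_xy <- matrix(c(0, 0, 1, 0), ncol = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), c("x", "y")))
nudge <- matrix(0, nrow = 2, ncol = 2,
  dimnames = list(rownames(node_xy), c("x", "y")))
# move one label "up and left a tiny amount"
nudge["GeneA", ] <- c(-0.01, 0.02)
xy <- label_coords(node_xy, degrees = c(90, 0), dist = 0.1, nudge = nudge)
```

An automated step could then loop over label pairs, test them with `boxes_overlap()`, and write small offsets into the same nudge matrix, so user-provided and automated adjustments share one data structure.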
- `fixSetLabels()`
  - Define better default replacements.
  - Consider using default from/to, with option for user-defined additions.
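One possible shape for "default from/to plus user additions" is sketched below. The function name `fix_set_labels_sketch()` and its arguments are hypothetical and are not the current `fixSetLabels()` signature; the default prefix pattern is borrowed from the `gsub_vertex()` examples elsewhere in these notes.

```r
# Hypothetical sketch: default pattern/replacement pairs are applied first,
# then any user-supplied pairs are appended and applied in order.
fix_set_labels_sketch <- function(x,
  from = NULL,
  to = NULL,
  default_from = c("^(WP|KEGG|BIOCARTA|GO|REACTOME)_", "_"),
  default_to = c("", " ")) {
  stopifnot(length(from) == length(to))
  all_from <- c(default_from, from)
  all_to <- c(default_to, to)
  for (i in seq_along(all_from)) {
    x <- gsub(all_from[i], all_to[i], x)
  }
  x
}

labels_default <- fix_set_labels_sketch(
  c("KEGG_CELL_CYCLE", "REACTOME_DNA_REPAIR"))
labels_custom <- fix_set_labels_sketch(
  c("KEGG_CELL_CYCLE", "REACTOME_DNA_REPAIR"),
  from = "DNA REPAIR", to = "DNA repair")
```

Appending user pairs after the defaults keeps the default behavior intact while still letting a user correct individual labels.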
- Consider using `mem_enrichment_heatmap()` as the `top_annotation` for `mem_gene_path_heatmap()`, including for example the dotplot format.
  - Easiest prototype might be to use `+` to display the gene-path heatmap rotated with `rotate_heatmap=TRUE` beside the enrichment heatmap, then merge the color legends: `ComplexHeatmap::draw(mpf$enrichment_hm + mpf$gp_hm, annotation_legend_list=attr(mpf$enrichment_hm, "annotation_legend_list"), merge_legends=TRUE)`
- DONE. Fix the `row_title` for the enrichment heatmap when called by `mem_plot_folio()`, currently it uses numbers instead of `LETTERS`.
- DONE. Consider option for fixed-attribute cells for `mem_enrichment_heatmap()` when used with dot plot, so each cell is square with the circle centered.
- Consider returning updated `igraph` from `jam_igraph()`
  - Currently returns `invisible(NULL)`.
  - For example, after applying all the updates to `igraph` values such as node size, label size, label distance, etc.
  - The end goal is to be able to call `jam_igraph(output)` with no additional arguments and have it (mostly) render the identical figure.
- Add unit tests.
  - Cover all basic functionality.
  - Examples and vignettes should already cover the core workflow.
- Consider adjustments to `make_point_hull()`
  - font size
  - distance from hull
- Big picture: Consider using `venndir::JamPolygon` instead of `sf` polygons.
  - Would calculate offsets, rescale, transform, using `JamPolygon` functions.
  - Biggest benefit is to reduce dependencies, `sf` is heavy.
- `mem_gene_path_heatmap()`
  - Changed default caption to use `ComplexHeatmap::Legend()` format so it can be included along with color legends.
  - Increased caption font, it was not legible.
  - Moved the `gene rows, set columns` labels to the top, split in two rows.
  - Return added attribute `"caption_legendlist"`.
- `apply_cnet_direction()`
  - Change default logic to use `pie.border` always instead of switching to `frame.color` when all borders are identical. I think this logic should belong in the plotting function to decide how to render nodes, `jam_igraph()`. Also more convenient when reviewing the data, user should only have to look at `pie.border` and not combination of `pie.border` and `frame.color`.
- `mem_enrichment_heatmap()`
  - Consider adding `top_annotation` using the `colorV` colors per enrichment, for consistency with `mem_gene_path_heatmap()` and Cnet plots.
- DONE. Add `add_pathway_direction()`
  - DONE. Helper function to add directional z-score column to `enrichResult` data, using the formula from QIAGEN IPA: `z <- (N_genes_up - N_genes_down) / sqrt(N_genes_up + N_genes_down)`
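As a quick worked illustration of the formula above, a minimal base R sketch follows. `pathway_z_sketch()` is a hypothetical name and is not the implementation inside `add_pathway_direction()`; returning `NA` when a pathway has no directional genes is an assumption made here.

```r
# Hypothetical sketch of the directional z-score:
# z = (N_genes_up - N_genes_down) / sqrt(N_genes_up + N_genes_down)
pathway_z_sketch <- function(n_up, n_down) {
  n_total <- n_up + n_down
  # assumption: return NA when there are no directional genes
  ifelse(n_total > 0, (n_up - n_down) / sqrt(n_total), NA_real_)
}

z_values <- pathway_z_sketch(
  n_up = c(9, 8, 1, 0),
  n_down = c(0, 8, 3, 0))
```

Here nine up-regulated genes and no down-regulated genes give z = 3, a balanced pathway gives z = 0, and one up with three down gives z = -1.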
- Call `add_pathway_direction()` from `multiEnrichMap()` when appropriate.
  - `geneHitList` or `geneHitIM` are provided, and
  - contain positive and negative values, and
  - `"z-score"` is not already defined in each `enrichResult` object. (Bonus points for checking the direction colname attribute.)
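The three conditions above might be checked along these lines. This is a sketch only: `needs_pathway_direction()` is a hypothetical helper, and it assumes the column names of each enrichment table are available as a list of character vectors.

```r
# Hypothetical check: TRUE only when hits are provided, contain both
# positive and negative values, and at least one enrichment table
# lacks the direction column.
needs_pathway_direction <- function(geneHitIM,
  enrich_colnames,
  direction_colname = "z-score") {
  if (is.null(geneHitIM)) {
    return(FALSE)
  }
  has_up <- any(geneHitIM > 0, na.rm = TRUE)
  has_down <- any(geneHitIM < 0, na.rm = TRUE)
  already_defined <- all(vapply(enrich_colnames, function(cn) {
    direction_colname %in% cn
  }, logical(1)))
  has_up && has_down && !already_defined
}

hits <- matrix(c(1, -1, 0, 1), nrow = 2,
  dimnames = list(c("GeneA", "GeneB"), c("EnrichA", "EnrichB")))
check_missing_z <- needs_pathway_direction(hits,
  list(c("ID", "pvalue"), c("ID", "pvalue")))
check_has_z <- needs_pathway_direction(hits,
  list(c("ID", "pvalue", "z-score"), c("ID", "pvalue", "z-score")))
```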
- Review `clusterProfiler::compareClusterResult-class` object definition. For example `clusterProfiler::merge_result(list(enrichResults))`.
  - Consider some form of integration, if possible, for example conversion to/from to call similar functions in `clusterProfiler`.
- S4 object `Mem` to replace `list` returned by `multiEnrichMap()`
  - IT IS COMING
  - Consider `multienrichjam()` to replace `multiEnrichMap()`?
  - slots:
    - geneIM (im, direction, colors): `matrix` objects
    - enrichIM, (pvalues, direction, geneCount, colors): `matrix` objects
    - memIM: `matrix` object
    - enrichList: `list` of `enrichResult` objects
    - colorV: `character` vector of colors per enrichment
    - colnames: `character` column assignment (consider omitting to enforce standardized colnames)
  - methods:
    - `mem_plot_folio()`, supporting functions: `mem_gene_path_heatmap()`, `mem_enrichment_heatmap()`, `mem2cnet()`
    - `enrichList()` - accessor for `mem@enrichList`
    - `mem2dfs()` - create series of `data.frame` summarizing content, intended for export to Excel xlsx.
    - `mem2xlsx()` - direct export to Excel xlsx, calling `mem2dfs()`.
  - behaviors
    - `multiEnrichMap()` returns `Mem` object instead of `list`
    - `mem_plot_folio()` may store parameters in the `Mem` object
    - `mem_gene_path_heatmap()`, `mem_enrichment_heatmap()` could also store/retrieve parameters from the `mem` input object.
    - `mem_plot_folio()` optionally stores plots into `mem` to maintain consistent plot attributes
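A stripped-down sketch of what such a class could look like is shown below. It is not the planned design: only a subset of the slots listed above is included, the validity rules are assumptions, and placeholder `data.frame` objects stand in for `enrichResult` objects.

```r
library(methods)

# Hypothetical minimal "Mem" class using a subset of the proposed slots.
setClass("Mem",
  slots = c(
    enrichList = "list",
    geneIM = "matrix",
    enrichIM = "matrix",
    memIM = "matrix",
    colorV = "character"),
  validity = function(object) {
    msgs <- character(0)
    # assumption: genes are rows of geneIM and memIM
    if (!identical(rownames(object@geneIM), rownames(object@memIM))) {
      msgs <- c(msgs, "geneIM and memIM must have identical rownames")
    }
    # assumption: pathways are columns of memIM and rows of enrichIM
    if (!identical(colnames(object@memIM), rownames(object@enrichIM))) {
      msgs <- c(msgs, "colnames(memIM) must match rownames(enrichIM)")
    }
    if (length(msgs) > 0) msgs else TRUE
  })

# accessor generic and method, as proposed in the list above
setGeneric("enrichList", function(x, ...) standardGeneric("enrichList"))
setMethod("enrichList", "Mem", function(x, ...) x@enrichList)

genes <- c("ACTB", "GAPDH")
pathways <- c("Pathway1", "Pathway2")
enrichments <- c("EnrichA", "EnrichB")
mem <- new("Mem",
  enrichList = list(EnrichA = data.frame(), EnrichB = data.frame()),
  geneIM = matrix(1, 2, 2, dimnames = list(genes, enrichments)),
  enrichIM = matrix(0.01, 2, 2, dimnames = list(pathways, enrichments)),
  memIM = matrix(1, 2, 2, dimnames = list(genes, pathways)),
  colorV = c(EnrichA = "gold", EnrichB = "purple"))
```

A validity function like this would catch the mismatched gene rows between `geneIM` and `memIM` at construction time rather than downstream during plotting.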
- Consider `Cnet` object that inherits `igraph`?
  - It behaves as an `igraph` object except its class is helpful for generic functions: `plot.Cnet()`, `layout.Cnet()`, `relayout.Cnet()`
  - Unclear if S3 object type is preferred since it could inherit `igraph` (S3 object) characteristics.
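The S3 inheritance idea can be sketched without `igraph` itself. In this illustration `as_cnet()` and the `relayout()` generic are hypothetical, and a bare list with class `"igraph"` stands in for a real graph object.

```r
# Hypothetical constructor: prepend "Cnet" so S3 dispatch tries Cnet methods
# first, then falls back to existing igraph methods.
as_cnet <- function(g) {
  if (!inherits(g, "Cnet")) {
    class(g) <- c("Cnet", class(g))
  }
  g
}

# Hypothetical generic with a Cnet-specific method and a default fallback.
relayout <- function(x, ...) UseMethod("relayout")
relayout.default <- function(x, ...) "default method"
relayout.Cnet <- function(x, ...) "Cnet method"

# stand-in object, not a real igraph graph
g <- structure(list(), class = "igraph")
cnet <- as_cnet(g)
```

Because `"igraph"` remains in the class vector, `inherits(cnet, "igraph")` stays `TRUE` and functions that test for `igraph` input should continue to accept the object.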
- Consider `subgraph.Cnet()` or `subgraph.igraph()` functions
  - Main purpose is to subset the layout as well as nodes.
- Remove all `require()` checks, since they should already be in Dependencies.
- Remove `gsubs()` which causes a warning upon loading `multienrichjam`. It conflicts with `jamba::gsubs()`. They have slightly different logic.
- `mem_plot_folio()`
  - Enrichment heatmap should define `row_title` to match `pathway_column_title`, `LETTERS` by default. It currently shows numbers.
- `mem_plot_folio()` - option to support RMarkdown output
  - Provide optional wrapper for RMarkdown output, specifically to print headings/tabs for each plot produced.
  - Slight downside is there isn't an easy way to configure a unique figure size for each plot, so plot sizes are at the mercy of the Rmd chunk options `fig.height`, `fig.width`.
  - One implementation option is to allow `hook_preplot` and `hook_postplot` to allow the user to run a custom function before and after each plot is drawn. That feels too complicated, when the main driver is just to print an RMarkdown header.
  - Investigate whether each Markdown tab can define a new figure size.
- `importIPAenrichment()`
  - DONE. Consider handling gene identifiers so that the default behavior refers to each enriched gene by the original input gene symbol, and not the IPA-generated gene symbol. Some considerations:
    - The main driver is to associate pathway genes to the input data, using the same identifier as the input data. Sometimes IPA assigns its own name which will not match the input data.
    - We may consider that data integration (comparison across enrichments) may perform better by comparing via IPA Symbol. Consider the example `"HSPA1A/HSPA1B"`: one experiment may find `"HSPA1A"` significant, another may find `"HSPA1B"` significant. According to IPA, these results are equivalent, in which case the IPA symbol `"HSPA1A/HSPA1B"` would allow them to use the same identifier.
    - Sometimes the "user input" is a platform ID, such as an Affymetrix probe. In this case it may not be preferable to use the "user input": it may be convenient for data integration, but less convenient when trying to recognize gene symbols as labels.
    - This step requires that `"Analysis Ready Molecules"` is available, and follows the expected convention used by Ingenuity.
    - When multiple genes are combined to "one entity" by IPA, only one input symbol is retained in the `"Analysis Ready Molecules"`. The driving use case: `"HSPA1A"` and `"HSPA1B"` are combined to one gene entry `"HSPA1A/HSPA1B"` by IPA. They are considered one gene for the purpose of enrichment. If one or both genes were significant, they would appear as `"HSPA1A/HSPA1B"` by IPA. The `"Analysis Ready Molecules"` will list one symbol `"HSPA1B"` as exemplar, and no entry will appear for `"HSPA1A"`. In this case there are three options:
      - Use only the IPA assigned input symbol as provided: `"HSPA1B"`. IMPLEMENTED with `revert_ipa_xref=TRUE` (new default).
      - Split the IPA multi-symbol label into component parts `"HSPA1A"` and `"HSPA1B"`. In this case, we need the user to supply the actual gene hits, so we retain only the gene hit the user provided. NOT IMPLEMENTED.
      - Leave the entry as `"HSPA1A/HSPA1B"`, however this symbol will not match any input gene hit list, or other expression data matrix. IMPLEMENTED with `revert_ipa_xref=FALSE`.
  - Consider retaining the header section with analysis details, at least for text input.
- `multiEnrichMap()`
  - Sometimes the gene rows in `geneIM` do not match gene rows in `memIM`, causing an error downstream. The problem appears to happen when the gene hit list does not match all entries in `memIM`, causing `geneIM` to have fewer rows.
  - One solution is to reduce `memIM` - not ideal because we do not want to lose data.
  - Another option is to expand `geneIM` - which requires inferring the incidence matrix. For directional data, it would impose `1` regardless of the intended directionality, since no other source is available.
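The "expand `geneIM`" option could look roughly like the sketch below. The helper `expand_gene_im()` and the pathway-by-enrichment 0/1 matrix `pathIM` are hypothetical; the inference rule (a gene is a hit in an enrichment when it belongs to any pathway present in that enrichment) is one possible assumption, and it imposes `1` without directionality, as noted above.

```r
# Hypothetical sketch: add rows to geneIM for genes found in memIM but
# missing from geneIM, inferring membership through pathways.
expand_gene_im <- function(geneIM, memIM, pathIM) {
  missing_genes <- setdiff(rownames(memIM), rownames(geneIM))
  if (length(missing_genes) > 0) {
    # gene x pathway times pathway x enrichment gives shared-pathway counts
    counts <- memIM[missing_genes, , drop = FALSE] %*%
      pathIM[colnames(memIM), colnames(geneIM), drop = FALSE]
    inferred <- (counts > 0) * 1
    geneIM <- rbind(geneIM, inferred)
  }
  # return rows in the same order as memIM
  geneIM[rownames(memIM), , drop = FALSE]
}

memIM <- matrix(c(1, 0, 0, 0, 1, 1), nrow = 3,
  dimnames = list(c("A", "B", "C"), c("P1", "P2")))
pathIM <- matrix(c(1, 0, 0, 1), nrow = 2,
  dimnames = list(c("P1", "P2"), c("E1", "E2")))
geneIM <- matrix(c(1, 0, 0, 1), nrow = 2,
  dimnames = list(c("A", "B"), c("E1", "E2")))
geneIM_full <- expand_gene_im(geneIM, memIM, pathIM)
```

In this toy example gene `"C"` is absent from `geneIM`, belongs only to pathway `"P2"`, and `"P2"` is present only in enrichment `"E2"`, so the inferred row is `c(E1 = 0, E2 = 1)`.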
- edge bundling
  - Between two communities currently calculates the "central path" between the group centroids. Consider calculating the "central path" between the points with edges between the two communities.
- `igraph` adjustment scripting language?
  - Combines:
    - `nudge_igraph_nodes()`: `Fuscobac::x:0.01:y:-0.02`
    - `adjust_cnet_nodeset()`: `nodeset:A::degrees:45:x0.1:y-0.05:percent_spacing:7`
    - `apply_nodeset_spacing()`: `nodeset:B::percent_spacing:7`
- `adjust_cnet_nodeset()`, `apply_nodeset_spacing()`
  - Debug issue where `label.dist` and `label.degrees` are defined for subset of nodes, leaving other node attributes `NA` - which causes an error in `jam_igraph()`. Occurs when only one nodeset is adjusted.
  - Debug issue where `percent_spacing` fails in `adjust_cnet_nodeset()` when supplying custom nodegroups from community detection, and are not proper Cnet nodesets.
- `mem_gene_path_heatmap()` - lower priority but could be useful
  - Consider interactive view (plotly? InteractiveComplexHeatmap?) to enable hover text with the enrichment P-value, gene count, z-score. Could be particularly useful with directional output.
  - Consider optionally labeling the significant dots?
    - P-value
    - z-score
    - number of genes.
- `mem_legend()` - lower priority, eventually necessary
  - Consider using `ComplexHeatmap::Legend()` for consistency with other legends, and to allow combining multiple legends together.
- S4 object `mem` (or `MEM`?) - higher priority - sooner the better
  - streamlined data content:
    - geneIM (im, direction, colors): `matrix` objects
    - enrichIM, (pvalues, direction, geneCount, colors): `matrix` objects
    - memIM: `matrix` object
    - enrichList: `list` of `enrichResult` objects
    - colorV: `character` vector of colors per enrichment
    - colnames: `character` column assignment (consider omitting to enforce standardized colnames)
    - omit: `multiEnrichMap` - in favor of `mem_plot_folio()`, `memIM2cnet()`
    - omit: `multiCnetPlot` - in favor of `mem_multienrichplot()`
    - optional: store output from `mem_plot_folio()` to keep a series of plots coordinated, using the same options: `pathway_column_split`, `gene_row_split`, `enrich_im_weight`, `gene_im_weight`, `column_method`, `row_method` (rename `column_method` to `pathway_method`?)
    - optional `enrichment_hm`: `Heatmap` output from `mem_enrichment_heatmap()`?
  - methods:
    - `mem_plot_folio()`, supporting functions: `mem_gene_path_heatmap()`, `mem_enrichment_heatmap()`, `mem2cnet()`
    - `enrichList()` - accessor for `mem@enrichList`
    - `mem2dfs()` - create series of `data.frame` summarizing content, intended for export to Excel xlsx.
    - `mem2xlsx()` - direct export to Excel xlsx, calling `mem2dfs()`.
  - behaviors
    - `multiEnrichMap()` creates `MEM` object instead of `list` by default
    - `mem_gene_path_heatmap()`, `mem_enrichment_heatmap()` could also store/retrieve parameters from the `mem` input object.
    - `mem_plot_folio()` optionally stores plots into `mem` to maintain consistent plot attributes
- `mem_plot_folio()`
  - argument `do_which` - consider accepting `character` string terms, e.g. `"enrichment_hm"`, `"gp_hm"`, `"cnet_collapse_set"`
  - Consider new argument `clusters_mem` for these uses:
    - allow user-defined pathway clusters
    - allow user-defined pathway subsets (missing pathways are dropped)
- `collapse_mem_clusters()`
  - When provided `mpf$clusters_mem` as `list` it may result in singlet genes not connected to any pathways - these should (by default) be removed.
- `jam_igraph()` - debug
  - Apparently sometimes with singlet gene nodes it produces an error: `"Error in FUN(X[[i]], ...) : !anyNA(x) is not TRUE"` traceback pointed to this line: `sf::st_polygon(list(polym)) at jamenrich-igraphshapes.R#1635`
- `mem_gene_path_heatmap()`
  - Consider option to place the caption elsewhere.
    - Problem: The caption sometimes covers part of the color legend in the bottom-right corner.
    - Another workaround might be to customize the legend layout, so it is not blocked by the caption.
- `mem_plot_folio()`, `mem_gene_path_heatmap()`
  - Consider a workflow to merge pathway clusters, to allow flexibility in how pathway clusters are defined.
    - Problem: Pathway clusters are sometimes defined inconsistently, where clusters `"A"` and `"B"` might be nearly identical.
    - Problem: It might be visually apparent how to sub-divide pathways, so the user may need a mechanism to assign pathways to specific clusters.
- `mem_gene_path_heatmap()`
  - consider adding column `top_annotation` with pathway directionality, equivalent to the `left_annotation` used for gene directionality.
  - Add row and column annotation padding by default, to help distinguish the central heatmap from the left and top annotations.
- `multiEnrichMap()`
  - when supplied with `geneHitIM` or `geneHitList`, calculate the `z-score` using the IPA formula: `z <- (N_genes_up - N_genes_down) / sqrt(N_genes_up + N_genes_down)` as described [https://doi.org/10.1093/bioinformatics/btt703], and in their FAQ: IPA FAQ - Statistical Calculations
- DONE. `multiEnrichMap()`
  - DONE. `geneHitIM` and `geneHitList` are not behaving as intended, nor consistently. They should be interchangeable and equivalent. When `geneHitIM` is supplied, it populates `geneIMdirection` but not `geneIM`. When `geneHitList` is supplied, it populates `geneIM` but not `geneIMdirection`. When either is supplied, it should populate the relevant rows in `geneIM` and `geneIMdirection`.
- Currently it is cumbersome to edit pathway labels. It could be done for the `mem` object itself, however the adjustment might need to be different for different plot outputs: heatmap may not work well using word-wrap, while Cnet plots might work best with word-wrap. Publication figures might need an abbreviated label to save plot space.
- `mem_plot_folio()`
  - Consider argument to enable custom adjustment of pathway labels.
- `mem_gene_path_heatmap()`
  - DONE. Consider argument `gene_annotations` to enable `"geneIM"`, `"geneIMdirection"`.
  - DONE. When `mem$geneIMdirection` is present, include directionality in the gene clustering step.
  - REWRITTEN ABOVE. Consider option to display pathway z-score (`mem$pathwayIMdirection`) similar to display of `mem$geneIMdirection`. New argument `pathway_annotations`.
- DONE. Debug issue when rendering edge arrow heads, they look wonky.
- Remove dependency on `sf` in `adjust_polygon_border()`
  - use `polyclip` package as it is much more lightweight, without requiring RGEOS, LWGEOM, other world globe map-coordinate based libraries which are not easily compiled on all computer systems using R.
- Consider porting `deconcat_df2()` to `jamba` for wider re-use by jam packages that would not otherwise need dependency on `multienrichjam`.
- Consider vectorizing edge arrow size
  - Currently all edge arrows must be the same size (the same limitation is present with `igraph::plot()`).
  - This is a niche feature, arrows are rarely used in `multienrichjam`, and use of differently-sized arrows does not have a driving use case. (This feature is unlikely to move forward until needed.)
- Consider `ggraph` compatibility in future.
  - Many R packages use `ggraph` for `igraph` plotting, so it might make sense for consistency to offer this feature. The output `ggplot` objects can be combined into larger figures using `patchwork` or `cowplot` easier than using base R `plot()` functions.
  - My preference not to use `ggraph` is mainly because the returned `ggplot` object is not a useful network graph, it is only the instructions for visualization. As such, the layout, node customization, is not persistent outside the `ggplot` object. The package `clusterProfiler` migrated to `ggraph` and now all the returned objects are not useful because they cannot (easily) be customized and analyzed.
  - However `ggraph` does not offer `pie` nor `coloredrectangle` node shapes.
  - Note that `pie` nodes can be emulated with `geom_pie()` but (1) is painfully slow to render because it is not vectorized, (2) does not use inner borders, which allow adjacent wedge borders to be shown beside each other without overlapping.
  - Task is to add new `geom` for `jampie` node shape that accepts pie wedge border colors drawn as inner border, with optional outer border color that also does not overlap the optional inner border.
  - Another task is to implement edge bundling using `ggraph` compatible methods. It involves calculating edge bundles, then rendering the curved edges, optionally with arrow heads.
  - Ultimately a utility function `jam_ggraph()` may be the `ggraph` equivalent of `jam_igraph()` for plotting `igraph` objects.
- Integrate directionality with more steps in the workflow:
  - `mem_gene_path_heatmap()` - Consider option to display direction of change in row (gene) annotations for each enrichment. Could display with or without enrichment-colored results?
- Create `bookdown` documentation
  - Should it use a separate Github repository? (Probably yes.) It looks like `"jokergoo/circlize_book"` is the repo for the bookdown site.
- Refactor `multiEnrichMap()` - maybe new function `multienrichjam()`?
  - create `mem` S4 object with slots, print, summary functions
    - suggested methods:
      - `plot()` could default to `mem_gene_path_heatmap()` to show all data
      - `print()` could print summary of content: enrichments, pathways, genes
      - consider `as.data.frame()` - convert to wide `data.frame` summary?
      - `memIM()`, `geneIM()`, `enrichIM()` convenient access to slot data
      - convenient way to get `list` format for IMs?
    - slots:
      - `memIM`: gene/pathway matrix
      - `geneIM`: gene/enrichment matrix
      - `enrichIM`: pathway/enrichment
      - `geneIMdirection`: optional direction per gene/enrichment
      - `geneIMcolors`: colors assigned per gene/enrichment
      - `enrichIMdirection`: optional direction (z-score) per pathway/enrichment
      - `enrichIMcolors`: colors assigned per pathway/enrichment
      - `enrichIMgeneCount`: integer number of genes per pathway/enrichment
  - remove steps that create embedded Cnet and Emap `igraph` objects:
    - `multiCnetPlot`, `multiCnetPlot1`, `multiCnetPlot1b`, `multiCnetPlot2`
    - `multiEnrichMap`, `multiEnrichMap2`
    - `multiEnrichDF` - consider saving `data.frame` with clear name
    - `multiEnrichResult` - What content is stored here?
- New object classes:
  - `"mem"`: store output from `multiEnrichMap()`
    - (Essentially a formal replacement for `list` format used currently.)
    - `memIM`
    - `enrichIM, enrichIMcolors, enrichIMdirection`
    - `geneIM, geneIMcolors, geneIMdirection`
    - `geneHitList`
    - `colorV`
    - params:
      - p_cutoff (from argument `cutoffRowMinP`)
      - min_count
      - topEnrichN
      - pvalueColname
      - directionColname
  - `"mem_plots"`: store `mem_plot_folio()` data for re-use.
- New wrapper function `multienrich()` to replace `multiEnrichMap()`?
  - streamlined refactor and replacement for `multiEnrichMap()`
  - avoids defining Cnet and Emap data, pushing into `mem_plot_folio()`
  - reduces arguments by removing all visualization arguments
  - consider storing `memIM` data as `SummarizedExperiment` for convenient use with `ComplexHeatmap::Heatmap()` via `jamses::heatmap_se()`.
  - returns `mem` object.
- Update `mem_plot_folio()`
  - input `mem` object
  - add `mem2emap()` plot output.
  - consider using `jamses::heatmap_se()` for heatmap functions
    - it adds `jamses` as dependency, along with its dependencies
    - it puts pressure to refactor the jamses contrast stats code
    - should `jamses::heatmap_se()` be moved to new package focused only on SummarizedExperiment heatmaps?
    - See Bioconductor package `sechm` which is much less capable, but has the same inspiration. It provides row scaling (ack) as a recommended (!) option. (Problematic, sorry to say. For gene expression data, the magnitude of change is important, and matters. To rescale the numeric range for consistency is counter to the biology, and to the technology. The technology has measurement limitations, for which seeing the actual numeric differences is important for assessing whether changes are reliable from that platform. These differences also transfer into follow-up confirmation assays, where changes below a threshold are not feasible to confirm. In general, gene (transcript or protein) expression fold changes are relatively consistently measured for each gene, so to enhance the apparent fold change of one gene to fit the fold change of another gene is not necessary, certainly not by default.)
- Consider new R package for SummarizedExperiment heatmaps
  - move `jamses::heatmap_se()` into this package
  - move `platjam::design2colors()` into this package
  - minimize R package dependencies
- R packages to review
  - Bioconductor package `"ConsensusClusterPlus"` which can be used to determine appropriate `k` values for k-means clustering, with metrics to assess consistency of cluster assignment.
  - `"GeneTonic"` function `ggs_graph()` produces a Cnet plot which they export to `visIgraph()` for interactivity; `enrichment_map()` creates EnrichMap.
  - `"monaLisa"` - motif enrichment, extends HOMER theme; nice heatmap Motif labels
  - `"profileplyr"` - coverage heatmap using tidyverse plyr syntax
- DONE: `reorder_igraph_nodes()`
  - When `orderByAspect=TRUE`, and it detects tall-skinny aspect ratio, it appears to be applying the y-axis sorting bottom-to-top instead of top-to-bottom.
  - The culprit was `spread_igraph_labels()` default argument, changed from `nodeSortBy=c("x", "y")` to `nodeSortBy=c("x", "-y")`.
- `mem_gene_path_heatmap()`
  - Slightly increase the spacing between heatmap body and row/column annotations. Currently the gap between heatmap row/column split is identical to the gap between heatmap and row/column annotations, which makes it harder to distinguish one from the other.
  - The same can be accomplished using `ComplexHeatmap::ht_opt()` but the option is hard to remember, and would ideally need to be set back to the previous value after drawing the plot.
- `mem_plot_folio()`
  - DONE: The enrichment dot plot (or enrichment heatmap) should be created after the gene-path heatmap is created, in order to define pathway clusters using gene content, rather than defining pathway clusters using the `-log10(Pvalue)` matrix.
    - The clusters derived from the P-value matrix are sometimes not very similar to the gene-pathway clustering result.
    - For this workflow, it makes the most sense to define pathway clusters upfront, then share those pathway clusters with all downstream visualizations.
    - Consider creating object class "mem_plots".
- consider new function to convert IPA enrichment data to `geneHitList`
  - list element `"Analysis Ready Molecules"` is provided in the IPA output, and this data can be used to re-create the directional hit matrix used during import (if directionality such as fold change was provided to IPA).
- consider new function to evaluate gene-pathway heatmap output
  - The driving use case is selecting `pathway_column_split=5` upfront, but realizing perhaps 3 clusters would be preferred based upon the content.
    - Criteria for collapsing two pathway clusters together: either Jaccard similarity above 0.4, or correlation above 0.6.
    - Definitely requires more testing to determine appropriate default thresholds, or whether a reasonable data-driven threshold can be defined.
  - sometimes two clusters can and perhaps should be merged together
    - Create collapsed incidence matrix,
    - Calculate correlation,
    - Any two clusters with correlation >= 0.2 could be merged?
    - Alternatively, if more than 3 clusters would be merged, cancel to prevent merging too many clusters together.
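The Jaccard criterion above could be prototyped as in the following sketch. `cluster_jaccard()` is a hypothetical helper, the input is assumed to be a gene-by-pathway 0/1 incidence matrix plus a named list of pathway clusters, and the 0.4 threshold is the provisional value from the notes rather than a tested default.

```r
# Hypothetical sketch: collapse pathways into clusters, then compute the
# pairwise Jaccard similarity of the gene content of each cluster.
cluster_jaccard <- function(memIM, clusters) {
  # gene x cluster logical matrix: gene present in any pathway of the cluster
  collapsed <- sapply(clusters, function(paths) {
    rowSums(memIM[, paths, drop = FALSE]) > 0
  })
  shared <- crossprod(collapsed * 1)  # genes shared by each cluster pair
  sizes <- diag(shared)               # genes per cluster
  shared / (outer(sizes, sizes, "+") - shared)
}

memIM <- matrix(0, nrow = 5, ncol = 4,
  dimnames = list(paste0("g", 1:5), paste0("P", 1:4)))
memIM[c("g1", "g2"), "P1"] <- 1
memIM[c("g2", "g3"), "P2"] <- 1
memIM[c("g1", "g2", "g3", "g4"), "P3"] <- 1
memIM["g5", "P4"] <- 1
clusters <- list(A = c("P1", "P2"), B = "P3", C = "P4")

jac <- cluster_jaccard(memIM, clusters)
# candidate cluster pairs to merge, using the provisional 0.4 threshold
merge_pairs <- which(jac > 0.4 & upper.tri(jac), arr.ind = TRUE)
```

In this toy data clusters `A` and `B` share three of four genes (Jaccard 0.75) and are flagged, while cluster `C` shares no genes with either.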
- consider new function to edit vertex attributes?
  - use case: existing `igraph` object nodes have attributes to modify:
    - sometimes modify only certain nodes: `nodeType="Set"` to edit labels
    - attributes in atomic vector form
    - attributes (`pie.color`, `pie.border`) in `list` form
    - also sometimes want to adjust colors - probably separate function
    - function returns `igraph` object with attribute modified and stored in the same state as present originally (e.g. atomic remains atomic; list remains list).
  - Which format seems most reasonable?
    * ```
      gsub_vertex(g,
         pattern_l=list(
            nodeType=c(Set="^(WP|KEGG|BIOCARTA|GO|REACTOME)_")),
         replacement_l=list(
            nodeType=c(Set="")))
      ```
    * ```
      gsub_vertex(g,
         subset_attr="nodeType",
         subset_attr_value="Set",
         pattern="^(WP|KEGG|BIOCARTA|GO|REACTOME)_",
         replacement="")
      ```
    * ```
      gsub_vertex(g,
         subset_attr_l=list(nodeType=c("Set")),
         pattern="^(WP|KEGG|BIOCARTA|GO|REACTOME)_",
         replacement="")
      ```
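To compare the candidate formats, the core logic can be sketched on a plain named list of vertex attributes (such as the list `igraph::vertex_attr(g)` returns), which avoids needing a graph object here. `gsub_vertex_attrs()` is a hypothetical helper that follows the second format above; it is not an existing function.

```r
# Hypothetical sketch: apply gsub() to one vertex attribute, optionally only
# for nodes matching subset_attr/subset_attr_value. Atomic attributes stay
# atomic, list attributes stay lists.
gsub_vertex_attrs <- function(attrs,
  pattern,
  replacement,
  attr_name = "name",
  subset_attr = NULL,
  subset_attr_value = NULL) {
  x <- attrs[[attr_name]]
  keep <- rep(TRUE, length(x))
  if (!is.null(subset_attr)) {
    keep <- attrs[[subset_attr]] %in% subset_attr_value
  }
  if (is.list(x)) {
    x[keep] <- lapply(x[keep], function(i) gsub(pattern, replacement, i))
  } else {
    x[keep] <- gsub(pattern, replacement, x[keep])
  }
  attrs[[attr_name]] <- x
  attrs
}

attrs <- list(
  name = c("KEGG_APOPTOSIS", "KEGG_LIKE"),
  nodeType = c("Set", "Gene"),
  tags = list(c("KEGG_a", "b"), "KEGG_c"))
out_name <- gsub_vertex_attrs(attrs,
  pattern = "^(WP|KEGG|BIOCARTA|GO|REACTOME)_",
  replacement = "",
  subset_attr = "nodeType",
  subset_attr_value = "Set")
out_tags <- gsub_vertex_attrs(attrs,
  pattern = "^(WP|KEGG|BIOCARTA|GO|REACTOME)_",
  replacement = "",
  attr_name = "tags",
  subset_attr = "nodeType",
  subset_attr_value = "Set")
```

Only the `"Set"` node is edited in both calls; the `"Gene"` node keeps its `KEGG_` prefix, and the `tags` attribute remains a list.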
- consider new function to adjust `igraph` node colors in all forms
  - modify all fill color attributes `color`, `pie.color`, `coloredrect.color`
  - modify all border attributes `frame.color`, `pie.border`, `coloredrect.border`
  - adjust `frame.width`, `pie.lwd`, `coloredrect.lwd` relative to each other. For example when `pie.lwd` and `pie.border` are both defined (and not transparent), `frame.width=0.1`, otherwise `frame.width=2`.
- `mem_plot_folio()`
  - Debug edge cases where `pathway_column_split=4` does not match the output number of column split following hierarchical clustering.
  - Debug edge case where there are not enough pathways or genes to support the gene-pathway heatmap workflow. E.g. only 1 or 2 pathways, or only 1 or 2 genes.
- `multiEnrichMap()`
  - Debug error "multiEnrichMap(): geneHitIM does not contain 5 rows present in geneIM, default values will use 1."
    - Apparently occurs when some `rownames(mem$geneIM)` are not found in `rownames(geneHitIM)`.
    - Ideally change `geneIMdirection` so it does not store "+1" and instead shows zero or NA. Genes should be shown but without associated direction.
    - The message should describe how to find missing values so the user can debug the error.
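A sketch of the proposed behavior follows: direction is `NA` for genes absent from `geneHitIM`, and the message names the missing genes. `build_direction_sketch()` is a hypothetical helper and the message wording is illustrative only.

```r
# Hypothetical sketch: build a direction matrix with NA (not 1) for genes
# missing from geneHitIM, and report which genes are missing.
build_direction_sketch <- function(geneIM, geneHitIM) {
  direction <- geneIM * NA_real_  # same dimensions and dimnames, all NA
  shared_genes <- intersect(rownames(geneIM), rownames(geneHitIM))
  shared_cols <- intersect(colnames(geneIM), colnames(geneHitIM))
  direction[shared_genes, shared_cols] <- geneHitIM[shared_genes, shared_cols]
  missing_genes <- setdiff(rownames(geneIM), rownames(geneHitIM))
  if (length(missing_genes) > 0) {
    message("geneHitIM does not contain ", length(missing_genes),
      " rows present in geneIM: ",
      paste(missing_genes, collapse = ", "),
      ". Inspect with setdiff(rownames(geneIM), rownames(geneHitIM)).")
  }
  direction
}

geneIM <- matrix(1, nrow = 3, ncol = 2,
  dimnames = list(c("A", "B", "C"), c("E1", "E2")))
geneHitIM <- matrix(c(1, -1, -1, 1), nrow = 2,
  dimnames = list(c("A", "B"), c("E1", "E2")))
direction <- build_direction_sketch(geneIM, geneHitIM)
```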
- `reorder_igraph_nodes()`
  - Currently `orderByAspect=TRUE` will order nodes based upon the aspect ratio of each nodeset.
  - Ideal world: when nodes are sorted by something like color, calculate the aspect ratio of nodes within that color. Situation: Assume a tall-skinny nodeset, sorted top-to-bottom by different colors. One color has 12 entries, the nodeset is 12 nodes wide, so this color appears horizontal among the other nodes. When sorting by border color, it goes top-to-bottom, which is not visually intuitive.
- `apply_cnet_direction()`
  - DONE: changed default `col` to use colors: `c("blue", "grey80", "firebrick3")` with breaks `c(-1, 0, 1)`.
- change all `frame.lwd` to `frame.width` before `frame.lwd` is widely used.
  - consider backward compatibility: when `frame.lwd` is defined in an `igraph` object, copy its values into `frame.width`, then proceed using `frame.width` for all other operations.
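The backward-compatibility step might look like the sketch below, again written against a plain named list of vertex attributes rather than an `igraph` object. `migrate_frame_lwd()` is a hypothetical helper, and letting an existing non-`NA` `frame.width` take precedence is an assumption.

```r
# Hypothetical sketch: copy frame.lwd into frame.width where frame.width is
# absent or NA, so downstream code only needs to read frame.width.
migrate_frame_lwd <- function(attrs) {
  if (!"frame.lwd" %in% names(attrs)) {
    return(attrs)
  }
  lwd <- attrs[["frame.lwd"]]
  width <- attrs[["frame.width"]]
  if (is.null(width)) {
    width <- lwd
  } else {
    # assumption: existing frame.width values take precedence
    use_lwd <- is.na(width) & !is.na(lwd)
    width[use_lwd] <- lwd[use_lwd]
  }
  attrs[["frame.width"]] <- width
  attrs
}

old_style <- migrate_frame_lwd(
  list(name = c("a", "b"), frame.lwd = c(1, 2)))
mixed_style <- migrate_frame_lwd(
  list(name = c("a", "b"), frame.lwd = c(1, 2), frame.width = c(NA, 5)))
```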
- `mem_legend()`
  - DONE: new argument `pt.lwd=2` to control the line width used only for point borders, useful when `do_direction=TRUE`. The argument `pt.lwd` already gets passed to `legend()` however making it a formal argument here helps make the option more clear for users.
  - Auto-detect whether to enable `do_direction=TRUE`, by checking if any `frame.color` or `pie.border` are defined with red/blue colors. Bonus points for using the same colors defined in `frame.color` or `pie.border`, however that may be risky if those colors vary based upon some fold change value, or vary based upon contrasting with the node or pie fill color.
- `jam_igraph()`
  - Consider handling `V(g)$shape="circle"` as shape `"jampie"` during rendering. Certain older Cnet plot `igraph` objects appear to break the default `shape="circle"` rendering, some cryptic error about `names()` not being defined when expected. The error does not appear with `igraph::plot.igraph()`, so it is specific to `jam_igraph()`.
    - Possible workaround is to sidestep the problem by re-using the same `shape="jampie"` rendering method already implemented, which would keep all borders consistent when displaying a mixture of `shape="circle"` and `shape="jampie"` nodes.
    - Problem with that workaround, if a node has `pie.color` defined it will be used when `shape="jampie"` even if the user specified `shape="circle"` (which should only use `color`). In that case, when a `shape="circle"` node is being rendered internally as `shape="jampie"` it should first copy `color` into `pie.color` for those nodes; `frame.color` to `pie.border`; and `frame.lwd` (`frame.width`) to `pie.lwd`.
- DONE: Fix bug with node rendering, caused by recent version of `igraph` adding `vertex.frame.width` (and not `vertex.frame.lwd` ugh). `NULL` or missing `vertex.frame.width` causes an error. Ultimately caused by no default value defined in the custom function `default_igraph_values()`, which was necessary to create since `igraph` does not export that function.
  - FIXED by adding `vertex.frame.width` to default values.
  - Longer term fix is to replace all references to `vertex.frame.lwd` with `vertex.frame.width`, before the precedent is set.
- Fix errors caused by `"stringsAsFactors=TRUE"`
  - DONE: `rank_mem_clusters()`
- Fix errors caused when there is only one (or zero) genes. `mem_gene_path_heatmap()` and `mem_plot_folio()`
- `mem_enrichment_heatmap()` color legend changes:
  - Show actual P-value `c(1, 0.05, 0.01, 0.001, 10^-4, etc.)` using `expression` for labels, and continue using `-log10(p)` for color assignment.
  - Add this argument `heatmap_legend_param=list(break_dist=1)` which causes the numeric labels to be evenly spaced, instead of having the labels at uneven intervals, often with angled lines connecting to the color legend.
  - Option for discrete color legend? I.e. show colors only at the labels, and not show the intervening gradient. It is more difficult to show abrupt transitions, e.g. it would need to show `c(1, 0.051, 0.05, 0.01)` in order to show that `0.051` is not colorized, but `0.05` is colorized. The smooth gradient is actually more effective at conveying that effect without additional labels.
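A minimal base R sketch of the first legend idea: color positions stay on the `-log10(p)` scale while the labels show actual P-values as plotmath expressions. The variable names (`p_breaks`, `legend_at`, `legend_labels`) are illustrative and not part of the package; whether the legend accepts them directly as break positions and labels would need to be confirmed.

```r
# P-value breaks to label on the legend (illustrative values)
p_breaks <- c(1, 0.05, 0.01, 0.001, 1e-4)

# position of each break on the -log10(p) scale used for color assignment
legend_at <- -log10(p_breaks)

# plotmath labels: plain numbers down to 0.001, then powers of ten
legend_labels <- lapply(p_breaks, function(p) {
  if (p >= 0.001) {
    as.character(p)
  } else {
    bquote(10^.(round(log10(p))))
  }
})
legend_labels <- as.expression(legend_labels)
```

Keeping the positions on the `-log10(p)` scale means the color assignment itself does not change; only the text shown beside each break differs.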
- Nodes with `shape="jampie"` and `frame.lwd=0` are still rendering the frame color outside the inner borders. When `frame.lwd=0` there should be no frame drawn even when `frame.color` is defined with a color.
- Big picture musing: Consider replacing base R plotting functions with corresponding `grid` functions.
  - The `vwline` R package (P. Murrell) is capable of drawing internal/external lines.
  - The `gridGraphics` package also provides better methods of clipping curved lines to the edge of a node border for example.
  - Major downside, it would likely involve rewriting all the `igraph` node shape functions into corresponding `grid` format. It is effectively similar to repeating much of `ggraph`, except that this approach can be customized. The `ggraph` approach in `ggplot2` is more or less untouchable in terms of providing customization. Ugh.
  - Another option may be to figure out how to add custom node shapes to `ggraph`, then use `vwline`/`gridGraphics` for rendering. Also need to write custom edge bundling function, since those in `ggraph` are wholly insufficient. Yeah, this is a no for now.
- `reorder_igraph_nodes()`, `reorderIgraphNodes()`
  - DONE: method to specify specific nodes or nodesets to be reordered
  - motivation is to allow sorting based upon relative aspect ratio of nodes, so a nodeset whose nodes are "tall-skinny" can be sorted top-to-bottom, and nodeset whose nodes are "short-wide" can be sorted left-to-right. Frankly, not sure if the inconsistency works for all network layouts, but for sure the top-to-bottom is not ideal for "tall-skinny" nodesets, it is not visually intuitive.
- consider new igraph shapes, intended to enable inner/outer border, `frame.lwd`
  - Do these make sense?
    - `shape.jamcircle.plot()` - enable custom `frame.lwd` for shape="circle"
    - others: square, csquare, rectangle, vrectangle
- `label_communities()`
  - generalize this method to determine keywords most represented in any set of pathway names.
-
Low priority visual enhancement, color Cnet edges by Set.
- Consider coloring Cnet edges by categorical color, by Set. It may help clarify node groupings, while also reinforcing which edges connect to particular nodesets. Unclear whether added color would be confusing or beneficial.
- Referring to Neely et al paper in ACR Open Rheumatology: "Gene Expression Meta-Analysis Reveals Concordance in Gene Activation, Pathway and Cell-Type Enrichment in Dermatomyositis (DM) Target Tissues"
- They show Cnet-type plot in Figure 4A, where connections from each set were categorically colored
- Low priority: It may be useful to create vectorized functions:
  - `polygon()`
    - enable multiple line widths for multiple polygons split by `NA` coordinates.
    - enable optional inner and outer borders with varying widths
  - `text()`: enable multiple family, srt
  - `lines()`: enable multiple col, lty, lwd for split lines, similar to `segments()` except enabled for multiple lines split by `NA` coordinates.
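As a rough illustration of the `lines()` idea, the sketch below splits coordinates at `NA` values and draws each piece with its own `col`, `lty`, and `lwd`. Both function names are hypothetical. It loops internally, so it shows the intended interface rather than a truly vectorized implementation.

```r
# Split x/y coordinates into separate lines wherever an NA occurs.
# Returns a list of two-column matrices, one per line.
split_lines_by_na <- function(x, y) {
  is_break <- is.na(x) | is.na(y)
  # each NA starts a new group; the NA rows themselves are dropped
  group <- cumsum(is_break)[!is_break]
  xy <- cbind(x = x[!is_break], y = y[!is_break])
  lapply(split(seq_len(nrow(xy)), group), function(i) xy[i, , drop = FALSE])
}

# Draw each NA-delimited line with its own col, lty, lwd (recycled as needed).
# Requires an open plot, for example after plot.new() and plot.window().
lines_by_na <- function(x, y, col = "black", lty = 1, lwd = 1) {
  line_list <- split_lines_by_na(x, y)
  n <- length(line_list)
  col <- rep(col, length.out = n)
  lty <- rep(lty, length.out = n)
  lwd <- rep(lwd, length.out = n)
  for (i in seq_len(n)) {
    graphics::lines(line_list[[i]], col = col[i], lty = lty[i], lwd = lwd[i])
  }
  invisible(line_list)
}

# two lines separated by one NA
x <- c(0, 1, 2, NA, 0, 2)
y <- c(0, 1, 0, NA, 2, 2)
parts <- split_lines_by_na(x, y)
```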
- `jam_igraph()`
  - Return the input `igraph` object with all relevant object attributes updated to reflect the plot parameters. The returned object could be plotted directly without any customization.
  - Consider storing/using edge coordinates inside the `igraph` object. However, whenever layout is re-calculated, edge coordinates would likely become invalid. It would be tricky to handle.
-
edge bundling and edge clipping integration
-
DONE: Most scenarios described below.
-
Currently, linear edges are clipped at connected node boundaries.
-
When edges are slightly curved, the start and end positions are reasonable.
-
When edges are bundled, and especially when nodes are relatively large, the edge curves through the node in a different direction than a linear edge.
-
This situation is not a visualization problem when:
- nodes are not filled with transparent color, and
- edges are not drawn with arrows.
-
The situation is only a visible problem when:
- edges have arrows that would now be partly or fully covered by the node, or
- nodes that are filled with partial or fully transparent color, thus showing the edge underneath.
-
Proper edge clipping would probably be done by calculating the edge from the node actual center point, then clipping edge where it exits the node shape border.
-
Implementation may benefit from storing the edge coordinates as an edge attribute, to be used by the clipping function when present. Absence of custom edge coordinates would cause the clipping function to use linear edge coordinates.
- The plot function could also use stored edge coordinates as opposed to calling edge bundling function; alternatively the edge bundling function could simply re-use existing edge coordinates as well.
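The clipping idea can be sketched for the simplest case, a circular node. The function below is hypothetical: it assumes the edge path is a two-column matrix that starts at the node center, and it clips at the first point where the path leaves the circle. Other node shapes would need their own intersection logic.

```r
# Clip a (possibly curved) edge path where it exits a circular node.
# edge_xy: two-column matrix of path coordinates, first row at the node center
# radius: node radius in the same coordinate units
clip_edge_at_circle <- function(edge_xy, radius) {
  center <- edge_xy[1, ]
  d <- sqrt(rowSums(sweep(edge_xy, 2, center)^2))
  outside <- which(d > radius)
  if (length(outside) == 0) {
    # the whole path lies inside the node, nothing remains to draw
    return(edge_xy[0, , drop = FALSE])
  }
  k <- outside[1]
  p0 <- edge_xy[k - 1, ]
  p1 <- edge_xy[k, ]
  # solve |p0 + t * (p1 - p0) - center| = radius for t in [0, 1]
  v <- p1 - p0
  w <- p0 - center
  a <- sum(v^2)
  b <- 2 * sum(v * w)
  cc <- sum(w^2) - radius^2
  t <- (-b + sqrt(b^2 - 4 * a * cc)) / (2 * a)
  exit_point <- p0 + t * v
  rbind(exit_point, edge_xy[k:nrow(edge_xy), , drop = FALSE], deparse.level = 0)
}

# straight edge along the x-axis, node radius 1.5
edge_xy <- cbind(x = c(0, 1, 2, 3), y = c(0, 0, 0, 0))
clipped <- clip_edge_at_circle(edge_xy, radius = 1.5)
```

Because the path itself is clipped, the same function would apply to bundled edge coordinates stored on the edge, which is the motivation for keeping those coordinates available.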
-
- `jam_igraph()`, proposed drop-in replacement for `igraph:::plot.igraph()`:
  - FIXED: This function does not seem to handle edge arrows, nor does it shorten edges based upon node sizes.
  - bonus points for node/edge legend functions
  - When layout is not defined, the xlim/ylim values sometimes do not match the dynamic layout calculated on the fly.
-
igraph shape "ellipse"
- DONE: it should have a proper "clip" function, in order for edge arrows to appear at the border of each node
-
igraph "coloredrectangle" shape
- DONE: The coloredrect.border should also be capable of adjacent lines that do not overlap.
- `reorder_igraph_nodes()`
  - DONE: to be fancy, it should also propagate changes to `label.dist` and `label.degree`, as created by `spread_igraph_labels()`, since switching coordinates for two nodes should also switch the `label.degree` and `label.dist` associated with those coordinates.
- `jam_igraph()`
  - It shows some lag before plotting all nodes vectorized, check if the edge bundling is the slow step, and optimize as needed. UPDATE: Yes the edge bundling step is introducing lag; or the resulting curved lines are being plotted slowly? Another good reason to store edge bundle coordinates in the `igraph`.
- `mem_legend()`
  - consider using `ComplexHeatmap::Legend()` for consistency with future Cnet-Heatmap usage, and because that mechanism is really nice.
- Refactor `multiEnrichMap()`
  - likely create new function `multienrich()` so `multiEnrichMap()` can be deprecated and remain for backward compatibility.
  - Avoid pre-calculating Cnet and Emap graphs. Counterpoint: It could call `mem_plot_folio()` with defaults, and store results back into the `multienrichResult`.
  - Consider `multienrichResult` object type?
    - could hold what is now a `list` object, with proper slot names.
    - custom `print()`/`summary()` function to display summary info about number of genes, pathways, etc.
  - Consider `multienrichPlot` object type? No, the decision is to re-use `multienrichResult`.
    - Benefit is simplicity: this object feeds all downstream plots.
    - The negative is that it requires saving a separate `multienrichResult` with alternative clustering if necessary.
    - Main goal is to define the gene-pathway clustering, then re-use clusters in subsequent steps without having to repeat the clustering the same way. If one uses custom clustering in `mem_plot_folio()` to produce gene-pathway heatmap, they have to use the same arguments in subsequent steps otherwise the clusters will differ, and it is not clearly indicated to be a problem.
  - `mem_plot_folio()` may likely return the `multienrichResult` after updating internally stored data. It should have option to use existing results.
- Write a more directed vignette showing at least two common use cases:
  - Enrichment using `clusterProfiler::enricher()`, then multi-enrichment.
  - Enrichment using Qiagen Ingenuity IPA (outside of R) then importing the files produced using `"Export All"` from within the Ingenuity IPA app.
  - Enrichment using some other external (non-R) tool, for example DAVID.
- Write a vignette focused solely on Cnet plot custom layout options. Background:
  - Very common workflow results in Cnet plot to summarize the findings.
  - This Cnet plot has often been included as a figure or supplementary figure for a paper, therefore it requires manual adjustments to increase legibility and clarity of the figure.
  - Adjust individual nodes:
    - `nudge_igraph_node()`. Note this function can be called on a `list` of nodes using `nodes_xy`, or using vectors with `nodes`, `x`, `y`.
  - Adjust sets of nodes:
    - `adjust_cnet_nodeset()` - Usually for sub-clusters of gene nodes, move the whole group, adjust intra-group node spacing, or rotate the group around the group center.
    - `apply_nodeset_spacing()` - Apply a minimum spacing between nodes, for each nodeset.
    - `adjust_cnet_set_relayout_gene()` - Adjust "Set" nodes manually, then fix them in place but allow "Gene" nodes to move during re-layout, usually with `relayout_with_qfr()`
  - Adjust all nodes:
    - `layout_with_qfr()`, `layout_with_qfrf()`, `relayout_with_qfr()` - All are wrappers to `qgraph::qgraph.layout.fruchtermanreingold()` with convenient defaults. `layout_with_qfr()` returns coordinates; `layout_with_qfrf()` returns a layout function with user-defined `repulse` argument; `relayout_with_qfr()` updates layout in-place for an `igraph` object, storing in `graph_attr(g, "layout")`.
    - `rotate_igraph_layout()` - rotate the layout coordinates by some user-defined degree angle.
    - `spread_igraph_labels()` - positions node labels radially around nodes, based upon the average incoming edge angle.
    - `reorder_igraph_nodes()` - within each nodeset, reposition nodes in order of node color, border color, then label. A nodeset is defined as a set of nodes that all share the same connections, which is mostly only useful for bipartite graphs such as Cnet plots. This function performs the node re-ordering for all nodesets, across the whole `igraph` object.
  - plot with `jam_igraph()`
    - Vectorized plotting when multiple shapes are used, otherwise `igraph::plot()` uses a `for()` to iterate each node.
    - Improved `pie` rendering, also vectorized.
    - Convenience methods to adjust node size, node label font size, node label distance
    - Optional shadow text node labels
    - Maintain aspect ratio = 1, so nodes are symmetrically spaced along each axis (defined by the layout algorithm used.)
    - optionally render node groups using `mark.groups`
    - Call edge bundling, especially useful for bipartite graphs such as Cnet plots.
    - Optionally draw background grid with percent layout units.
- add dev functions `layout_cnet()` and related iterative layout functions.
- mechanism to store edge coordinates in `igraph` object
  - to my knowledge, this functionality does not exist in `igraph`, nor is it represented in `ggraph` or `tidygraph` objects. However `tidygraph` may have capability to supply specific edge coordinates if they exist, so it might be the closest to implementing this feature.
  - `igraph::plot()` is not equipped to use edge coordinates
  - `ggraph` is not equipped to use edge coordinates, it only creates edge coordinates based upon the edge `geom_` being used.
  - The driving use case is to define edge coordinates to handle:
    - edge bundling, a procedure that could be done dynamically, but is computationally expensive for large numbers of nodes;
    - custom edge pathing, specifically for pathway schematics, such as those generated when using GraphViz DOT format. While the DOT format generates and stores edge coordinates, I could not find examples in R that import a fully-described DOT file (with edge coordinates embedded) that also imported edge coordinates.
  - Reasons to want edge coordinates upfront:
    - pre-calculate edge bundling, saving time during rendering
    - allow custom definition of edges, for example in a pathway schematic where edges are positioned to avoid overlaps.
    - calculate better label placement by using the angle of incoming edges, not limited to the linear vector from node1 to node2.
  - Issues raised when storing edge coordinates:
    - Obviously, whenever the node layout coordinates are changed, the edges also need to be changed.
    - `rotate_igraph_layout()` could rotate nodes and all edges together.
    - `nudge_igraph_node()` and `adjust_cnet_nodeset()` must decide whether to adjust edges by simple scaling, or simply invalidate all edges, then force the user to re-calculate edge coordinates.
  - useful helper functions
    - `validate_edge_coords()` - Test whether edge coordinates match node coordinates. If not, then delete or replace edges with linear equivalent.
    - `adjust_edge()` - Wrapper function to adjust the curvature, placement, rotation, of edges. Could be called when rotating node layout, to rotate edges accordingly.
    - `bundle_edges()` - Wrapper to apply bundling to one set of edges together as a group. Optionally define specific coordinate(s) through which the bundling loess curve is routed.
      - fancy options like routing edges with preference for vertical/horizontal pathing, with slight curvature at right angle turns. Often used in schematic diagrams.
      - fancy "subway" style options, where bundled edges are allowed to remain visible adjacent to other edges along their path. Edges could "snap" to nearby edges in the same bundle.
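One possible shape for the proposed `validate_edge_coords()` helper is sketched below. The arguments and storage format are assumptions, since the function does not exist yet: node coordinates as a two-column matrix, edges as integer `from`/`to` row indices, and stored paths as a list of two-column matrices. Paths whose endpoints no longer match their nodes are replaced with the linear equivalent.

```r
# Check stored edge paths against the current node layout.
# layout: two-column matrix of node coordinates
# from, to: integer vectors of node row indices, one entry per edge
# edge_coords: list of two-column matrices (or NULL), one entry per edge
validate_edge_coords <- function(layout, from, to, edge_coords, tol = 1e-8) {
  for (i in seq_along(edge_coords)) {
    xy <- edge_coords[[i]]
    start_ok <- !is.null(xy) &&
      all(abs(xy[1, ] - layout[from[i], ]) <= tol)
    end_ok <- !is.null(xy) &&
      all(abs(xy[nrow(xy), ] - layout[to[i], ]) <= tol)
    if (!(start_ok && end_ok)) {
      # endpoints are stale or missing: fall back to a straight edge
      edge_coords[[i]] <- rbind(layout[from[i], ], layout[to[i], ])
    }
  }
  edge_coords
}

layout <- rbind(c(0, 0), c(1, 1), c(2, 0))
from <- c(1, 2)
to <- c(2, 3)
edge_coords <- list(
  rbind(c(0, 0), c(0.5, 0.8), c(1, 1)),  # still matches nodes 1 and 2
  rbind(c(1, 1), c(5, 5)))               # end no longer matches node 3
checked <- validate_edge_coords(layout, from, to, edge_coords)
```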
- `get_cnet_nodeset()`
  - This function is called several times by internal functions, and could therefore be much faster than currently.
  - Refactor by using `igraph::as_adjacency_matrix()`. Subset rows for `nodeType="Gene"` and columns for `nodeType="Set"`. Then should be able to convert rapidly by `venndir::im2list()` or some `pasteByRow()` magic, produce nodeset per node (row). Finally, split node names by nodeset.
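The last two steps of that refactor can be sketched in base R, assuming the Gene-by-Set incidence matrix has already been extracted. The small matrix here is built by hand for illustration, and the variable names are not taken from the package.

```r
# Gene-by-Set incidence matrix (rows are Gene nodes, columns are Set nodes)
im <- matrix(
  c(1, 0,
    1, 0,
    1, 1,
    0, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("GeneA", "GeneB", "GeneC", "GeneD"),
    c("Set1", "Set2")))

# nodeset label per gene: comma-delimited names of the connected Sets
nodeset_label <- apply(im != 0, 1, function(i) {
  paste(colnames(im)[i], collapse = ",")
})

# split gene names by nodeset
nodesets <- split(rownames(im), nodeset_label)
```

Genes that connect to the same combination of Sets end up in the same list element, named by that combination.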
-
node adjustment to prevent label overlaps
- Idea: "stretch" out nodeset nodes "to the right", which fixes the left edge of the nodeset, then expands the node spacing outward as fraction of current range of nodeset nodes. For example, expand to the right by 10%, to improve side-to-side spacing, since label overlaps typically occur with nodes at the same y-level.
- Stretch nodes "to the left" would work the same, but fixes the right x-coordinate range, and expands the left x-coordinate range.
- Stretch nodes "to the top", and "to the bottom" work similarly.
- Future idea to reduce node label overlap is to treat it like biscuit dough under a rolling pin. Stretch subset of nodes in each direction until the labels no longer overlap. The trick is to stretch nodes away from other nodesets, so it does not cause new overlaps with other nodes.
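The directional "stretch" can be sketched as a small helper. The name `stretch_nodeset()` and its arguments are hypothetical, and the input is assumed to be the two-column coordinate matrix for the nodes in one nodeset.

```r
# Stretch nodeset coordinates in one direction by a fraction of their range,
# keeping the opposite edge of the nodeset fixed.
# xy: two-column matrix (x, y) for the nodes in one nodeset
stretch_nodeset <- function(xy,
  direction = c("right", "left", "top", "bottom"),
  fraction = 0.1) {
  direction <- match.arg(direction)
  axis <- if (direction %in% c("right", "left")) 1 else 2
  v <- xy[, axis]
  # the fixed edge is the minimum for right/top, the maximum for left/bottom
  anchor <- if (direction %in% c("right", "top")) min(v) else max(v)
  xy[, axis] <- anchor + (v - anchor) * (1 + fraction)
  xy
}

xy <- cbind(x = c(0, 1, 2), y = c(0, 0, 0))
xy_right <- stretch_nodeset(xy, "right", fraction = 0.1)
xy_left <- stretch_nodeset(xy, "left", fraction = 0.1)
```

Stretching "to the right" keeps the left-most node in place and moves the others proportionally, so relative node order within the nodeset is preserved.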
-
adjust_cnet_nodeset()
- Consider option to restrict "expand" to x- or y-axis expansion. Basic idea is to limit expansion to "widen" the node spacing, or to make node spacing "taller". The "widen" option is helpful to reduce label overlaps for nodes directly beside each other.
- `spread_igraph_labels()`
  - ideal case: somehow take into account the edge bundling to calculate input angle to each node, rather than straight vector from node to node.
  - `node_groups` - spread labels relative to node group centroid, so labels in this cluster of nodes will be spaced out from each other. Bonus points for taking into account the overall average input angle to nodes in each group, and applying a fraction of that offset along with the node-to-group offset. For example, for a node group in the top right, they generally point to the top-right, but are fanned out slightly so the bottom-left-most node is not fanned out to bottom-left, but maybe center, or only slightly bottom-left of the node.
- new layout functions specific for Cnet plots
  - `iterate_qfr_layout()` - R code version of `qgraph::qgraph.layout.fruchtermanreingold()` with custom addition of node "shells", and option to call `iterate_node_group_distance()`
  - `iterate_node_group_distance()` - R layout intended only to enforce separation across node groups (defined by `get_cnet_nodeset()`) so there is additional space between nodesets in the layout.
  - `layout_cnet()` - wrapper function that calls rounds of layouts. This series of steps is currently the best default layout to generate the most readable Cnet plot possible.
    - initial node placement - qfr layout without node/group distance
    - node spacing - qfr with node distance
    - node/group spacing - qfr with node and group distance 4a. imposed nodeset percent spacing
    - group spacing - "fix" node layout per nodeset, then layout nodesets
    - detect best rotation to place first "Set" node top-left, then rotate
    - spread labels, re-order nodes by color/border.
  - Other ideas:
    - Try the new `bubble_force()` calculation for minimum distance, instead of the linear desired distance force.
    - Consider constraining all Gene nodes, then allow only Set nodes to move. Goal is to prevent Set nodes from being embedded inside Gene nodes due to overall net forces.
    - Optional fixed coordinate range, to prevent layouts from becoming infinitely large, thus endless cycle of imposing minimum percent spacing, making the range larger, thus making the spacing smaller, etc. It means when a force would push a node/group out of bounds, it gets stopped at the boundary/boundaries. Hopefully because forces are applied to both node1 and node2, when node1 cannot move, node2 should still move albeit at half speed.
    - Really pie in the sky: Use the node size and shape to calculate distance between nodes, nearest polygon distance for example. Probably also a processing non-starter since it would add overhead to the calculations, however it is the type of thing that could be parallelized during each iteration. No idea how efficient it would be to split threads during each iteration.
  - Code it in Rust, use R package `extendr` to integrate the Rust function into the package via an R function. Follow C++ code steps used by `qgraph:::qgraph_layout_Cpp()`. It doesn't seem that these layout iterations are particularly good for multi-threading, however.
    - each iteration could be multi-threaded. Threads could share one distance matrix, then extract values when needed. Or threads could share node coordinates, then calculate distance when needed on the fly. If fast enough, distance matrix (and memory allocation) could be avoided, just do the math when needed.
    - Look for Rust libraries for geometric calculations.
- `mem_gene_path_heatmap()`
  - add option to display gene incidence matrix (left annotation) using `geneIMdirection` colors, to represent up/down.
- add `plot_cnet_heatmaps()`
  - make default arguments as minimal and close to default settings as possible
- `adjust_cnet_nodeset()`
  - COMPLETE: add minimum percent spacing, similar to `apply_nodeset_spacing()` except to operate on only one nodeset at a time.
- `shape.jampie.plot()` which calls `jam_mypie()`:
  - When drawing borders for multiple pie shapes, the overlapping borders are overdrawn, hiding all but the last border drawn.
  - Instead, draw border inside the polygon edge, so the borders are adjacent and not overlapping.
  - Quick survey revealed no drop-in replacement `polygon()` functions.
  - Workaround involves `sf::st_buffer()` to calculate a buffer inside each polygon, creating a "donut" filled with the border color. In principle, this issue tends to occur in relatively few nodes for typical Cnet plots, however it could be a substantial performance hit.
    - Each polygon, convert to corresponding `sf::st_polygon` object
    - Call `sf::st_buffer()`.
    - Create polygon from original border, inner buffer border. Or split the original polygon at this border, apply color to the outer and inner polygons. This way the border and fill colors are drawn together so nodes are "complete".
- `multiEnrichMap()`
  - accept `p_cutoff` and deprecate `cutoffRowMinP` for future removal. `p_cutoff` is used in other package functions, as is `min_count`.
- `mem_enrichment_heatmap()`
  - problem: `style="dotplot"` and `style="heatmap"` have very different visual effects, the dotplot de-emphasizes significance in favor of gene count (point size) - smaller gene count hides significance. The pale bivalent colors are very close to white, while `"Reds"` color gradient is much more distinctive.
  - Can try transforming point size, so the minimum size starts out larger? Arguments `point_size_min=3` and `point_size_max=10` may help?
  - Another more dramatic option is to create normal heatmap, then just draw points on top of the already-filled cells. Note the points would be inside the boxes, instead of connected by lines through the center of each box. Could test with `style="hybrid"` and implement both.
- S4 objects?
  - `mem`:
    - Idea is to have one object type to handle output from `multiEnrichMap()`
    - benefit is mostly to have default behaviors for things like `print()` but also for clarity during the workflow. The `mem` object can be clear input to other functions.
  - `cnet`:
    - inherits `igraph` and extends assumptions, again mainly for clarity as input argument to other functions
- `mem_gene_path_heatmap()`
  - option to display dot plot format at the top of heatmaps, equivalent to calling `mem_enrichment_heatmap()`. Benefit would be to indicate direction (color) and number of genes (size) in addition to P-value (intensity).
- fix `reorderIgraphNodes()`
  - DONE: add "frame.color": `sortAttributes=c("pie.color", "pie.color.length", "coloredrect.color", "color", "pie.border", "frame.color", "label", "name")`
  - future: make sort more intelligent, so it uses appropriate color based upon node shape during the sort.
    - `shape="pie"`: use "pie.color", "pie.border", "frame.color", "label", "name"
    - `shape="coloredrect"`: use "coloredrect.color", "coloredrect.border", "frame.color", "label", "name"
    - all others assumed to be `shape="circle"` or similar: use "color", "frame.color" and ignore "pie.color", "pie.border", "coloredrect.color", "coloredrect.border", "label", "name"
- DONE: Include gene direction in the workflow:
  - DONE: Add `geneIMdirection` to mem object.
    - Requires `geneHitIM` input to `multiEnrichMap()`.
    - Alternate backwards compatibility is function to add `geneIMdirection` to `mem` object, and/or `cnet` `igraph` object.
    - DONE: `memIM2cnet()` should optionally use `enrichIMdirection` to apply border color.
  - DONE: `mem2cnet()` as its own function, minor variation and extension to `memIM2cnet()` which only uses the `memIM` portion of the data.
  - `jam_igraph()` should recognize `lwd` when plotting nodes. This change diverges from default behavior in `igraph` which only uses `par("lwd")` as a global adjustment for all lines while plotting nodes, therefore does not allow any nodes to have different line widths.
  - DONE: Outline of Cnet gene nodes by direction.
  - Gene-Path heatmap rows should optionally indicate direction, possibly another stripe using geneIMdirection.
- `multiEnrichMap()`
  - should call `mem2cnet()` - or just avoid this step altogether since the `cnet` process is better done with `mem_plot_folio()`
  - consider removing the `enrichMap` and `cnet` steps altogether.
- `mem_plot_folio()`
  - FIXED: `mem_enrichment_heatmap()` does not honor `do_plot=FALSE`.
- `mem_enrichment_heatmap()`
  - DONE: add argument `cluster_rows` to allow `cluster_rows=FALSE`.
  - DONE for manual plot calls: one goal is to draw this heatmap using the order from the gene-pathway heatmap clustering.
  - Future idea: allow plotting this data using the same order as the gene-pathway heatmap. This process would require running the gene-pathway clustering first, determining the column order, then using it to order the rows in this heatmap.
- Now that `jam_igraph()` and node shapes `"jampie"` render the `pie.border` and `frame.color`, which can indicate the direction of change for each gene, in context of each enrichment test, some other changes should be made to the workflow:
  - Consider adding `pie.lwd` to recognized attributes in `shape.jampie.plot()`, but confirm that this argument can be used when rendering `"jampie"` node shapes. Currently the line width can only be adjusted with `par("lwd")` which is a global setting.
  - The default `frame.color` (`vertex.frame.color`) should probably be set to `NA` or `"transparent"` so the frame color is not visible for pie nodes. It shows up as a small white line now, which was not visible previously.
  - `reorderIgraphNodes()` should include `pie.border` and `frame.color` in sensible default locations, so nodes will also be sorted by these values when relevant, without the user having to add these columns.
  - New function to populate `frame.color` and `pie.border` for Cnet plots.
    - When `pie` has only one value, apply color to `frame.color`
    - When `pie` has multiple values, and different directions, apply colors to `pie.border`
    - When `pie` has multiple values, and all have the same direction, apply colors to `frame.color`
    - In absence of any directional data, set all `frame.color` to default, and set all `pie.border` to `"transparent"` or `NA`.
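The four rules for populating `frame.color` and `pie.border` could look roughly like this for a single node. The function name, the color defaults, and the encoding of direction as `1`, `-1`, or `0` per pie wedge are all assumptions made for illustration. A mix of directional and non-directional wedges is treated here as "different directions", which the notes above do not specify.

```r
# Decide frame.color and pie.border for one node.
# direction: numeric vector with one value per pie wedge,
#   1 = up, -1 = down, 0 or NA = no direction (assumed encoding)
assign_border_colors <- function(direction,
  up_color = "firebrick",
  down_color = "dodgerblue3",
  default_frame = "grey40") {
  n <- length(direction)
  dir_color <- rep(NA_character_, n)
  dir_color[direction %in% 1] <- up_color
  dir_color[direction %in% -1] <- down_color
  if (!any(!is.na(dir_color))) {
    # no directional data: default frame, no pie borders
    list(frame.color = default_frame,
      pie.border = rep(NA_character_, n))
  } else if (length(unique(dir_color)) == 1) {
    # one wedge, or all wedges share one direction: color the frame
    list(frame.color = dir_color[1],
      pie.border = rep(NA_character_, n))
  } else {
    # wedges differ: color each wedge border, hide the frame
    list(frame.color = NA_character_,
      pie.border = dir_color)
  }
}

single <- assign_border_colors(1)
mixed <- assign_border_colors(c(1, -1))
same <- assign_border_colors(c(-1, -1))
none <- assign_border_colors(c(0, 0))
```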
- Need to include gene direction of change in the workflow:
  - Some easy method to include direction of change in the `multiEnrichMap()` workflow, for example argument `geneHitList` currently uses a `character` vector, but could accept `integer` vector direction with genes stored as `character` labels, same as used with `venndir` signed input lists.
  - When gene direction is available, the `frame.color` and/or `pie.border` colors are defined.
  - `mem_gene_path_heatmap()` option to represent direction of change in the `gene_im` incidence matrix.
- Edge bundling makes assumptions for bipartite graphs (cnet) that are difficult to use with normal graphs.
  - `edge_bundle_nodegroups()` probably needs to subset edges involved in each nodegroup before bundling all edges connected to these nodes. Currently, nodes are assumed to share all connections, but the effect should occur for edges where both ends of the edge are contained in the nodegroup entries.
  - consider node attribute `"nodegroup"`.
  - consider edge attribute `"edgegroup"`, which would probably be the optimal approach for edge bundling, apart from implementing another technique similar to force-directed, hierarchically-defined, or density-directed edge bundling.
  - consider an option to specify the "midpoint" for a nodegroup, to allow some control on the spline curvature. Bonus points for allowing multiple points in order, to influence a path.
- `jam_igraph()`
  - fancy effect: allow edge colors to have multiple values, then interpolate color along each edge. For bundled edges, make a gradient equal to the number of line segments. For straight edges, break into `detail` number of pieces. The `detail` argument is also used by the `edge_bundle_*()` functions.
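For the straight-edge case, the gradient could be prepared as below. This is a sketch with a hypothetical function name: it breaks one edge into `detail` pieces and interpolates colors with `grDevices::colorRampPalette()`, returning a table that could be passed to `segments()`.

```r
# Break one straight edge into `detail` segments with interpolated colors.
# xy_from, xy_to: numeric vectors of length 2 with edge start and end
# edge_colors: two or more colors to interpolate along the edge
gradient_edge_segments <- function(xy_from, xy_to, edge_colors, detail = 10) {
  t <- seq(0, 1, length.out = detail + 1)
  x <- xy_from[1] + t * (xy_to[1] - xy_from[1])
  y <- xy_from[2] + t * (xy_to[2] - xy_from[2])
  data.frame(
    x0 = x[-length(x)],
    y0 = y[-length(y)],
    x1 = x[-1],
    y1 = y[-1],
    col = grDevices::colorRampPalette(edge_colors)(detail),
    stringsAsFactors = FALSE)
}

seg <- gradient_edge_segments(c(0, 0), c(1, 2), c("red", "blue"), detail = 5)
# with an open plot:
# segments(seg$x0, seg$y0, seg$x1, seg$y1, col = seg$col)
```

For bundled edges the same color ramp would apply, with the number of colors set to the number of line segments in the stored curve.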
- subset `igraph` object by nodes, with added benefit that the `graph_attr(g, "layout")` will also be subset, if present. It is odd that the default `igraph::subgraph()` function would subset the graph, leaving the layout which inevitably causes an error.
- `mem_gene_pathway_heatmap()`
  - would be useful to have an option to subset enrichments, this option would likely be useful to `mem_plot_folio()`, and `mem_enrichment_heatmap()`, so the effect could cascade to subsequent functions.
- new function idea? `subset_mem()`
  - Could be useful to accomplish the item above, to subset by enrichments prior to `mem_plot_folio()` related functions.
  - The challenge is that subsetting only `enrichIM` does not also update corresponding `memIM` data. Data to be updated:
    - `geneIM`, `geneIMcolors`, `geneIMdirection` - simple subset by colnames
    - `enrichIM`, `enrichIMcolors`, `enrichIMgeneCount` - simple subset by colnames
    - `memIM` - re-create after updating `geneIM`
    - `enrichList` - simple subset by name
  - Others not necessary for `mem_plot_folio()`: `multiEnrichMap`, `multiCnetPlot` should be re-created using methods in `multiEnrichMap()`. To be fair, these objects are not that useful anymore, since `mem_plot_folio()` is generally preferred.
- `mem_gene_pathway_heatmap()` when supplied with custom `column_split` throws an error when `cluster_columns=TRUE`.
  - internally it uses `amap::hcluster()` to generate a dendrogram/hclust
  - using split and dendrogram together is not allowed by `ComplexHeatmap::Heatmap()`
  - this clustering also uses the incidence matrix combined with the pathway enrichment annotation displayed along the top, and these values are weighted with `pathway_column_weight`.
  - A proper solution would be to provide a custom function for `cluster_columns` that internally combined the incidence matrix and enrichment matrix data together prior to clustering. If this function receives a numeric matrix with proper colnames, it should work.
- COMPLETE: `mem_plot_folio()` argument `gene_row_title=NULL` is being passed to `mem_gene_pathway_heatmap()` and is therefore not using the default `row_title=letters`.
- User reported an error when calling `mem_plot_folio(mem, node_factor=5, label_factor=1.5)`
  - `mem_plot_folio()` is passing `...` to `ComplexHeatmap::Heatmap()` which does not accept `...` and throws an error when receiving extra arguments.
  - I need to limit `...` to arguments accepted by `Heatmap()`. I feel like there is an R package function to handle this scenario, so I can avoid writing `do.call(Heatmap, custom_args)`.
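In the absence of a packaged helper, the filtering can be done in base R by comparing argument names to `formals()`. The sketch below uses a stand-in function, since the point is only the filtering step. `filter_args()`, `strict_fn()`, and `wrapper()` are hypothetical names, and the sketch still relies on `do.call()` internally.

```r
# Keep only the named arguments that `fn` accepts. If `fn` itself has `...`
# then every argument is kept. Unnamed arguments are dropped in this sketch.
filter_args <- function(fn, args) {
  accepted <- names(formals(fn))
  if ("..." %in% accepted) {
    return(args)
  }
  args[names(args) %in% accepted]
}

# stand-in for a function without `...`, such as a heatmap constructor
strict_fn <- function(x, name = "heatmap") {
  paste(name, length(x))
}

# wrapper that receives extra arguments intended for other functions
wrapper <- function(x, ...) {
  dots <- filter_args(strict_fn, list(...))
  do.call(strict_fn, c(list(x = x), dots))
}

result <- wrapper(1:3, name = "demo", node_factor = 5, label_factor = 1.5)
```

Here `node_factor` and `label_factor` are silently dropped before the call, so the stand-in function receives only `x` and `name`.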
Workflow that starts with pathway-gene incidence matrix upfront, bypassing the `multiEnrichMap()` workflow altogether.
- `memIM2cnet()` to convert pathway-to-gene incidence matrix to Cnet `igraph`.
  - Requires `geneIM` which is the enrichment-to-gene incidence matrix. In the driving example, each enrichment would represent a disease subgroup, with the full set of differential genes associated to each subgroup.
  - Requires `enrichIM` which contains enrichment-to-pathway whose values are P-values. When this data is not supplied, it should use 0.001 by default. I think these values are only used as an optional gradient color for the pathway nodes.
  - Optional `geneIMcolors` which is the same as `geneIM` except the cells contain colors to use for each gene. Currently the function does not fill these colors, the best method is to use `colorjam::matrix2heatColors()`: `geneIMcolors <- colorjam::matrix2heatColors(x=geneIM, colorV=colorV)`
-
option to assign pathways to "functional themes" based upon presentation by Adeline Chin in Dr. Hanna Kim's group.
- This step may "collapse" multiple pathways together, similar to grouping functional groups: union of genes, lowest enrichment P-value.
- `heatmap_row_order()`, `heatmap_column_order()` should return a flat vector when there are no row_split or column_split, respectively. Currently data is returned as a list of one-length vectors.
  - Consider moving into `jamba` package, in case this function needs to be re-used, it should not require loading the full `multienrichjam` package, which itself requires things like `igraph`, `clusterProfiler`, `qgraph`, `DOSE`, `matrixStats`.
  - Import `jamba::heatmap_row_order()` and `jamba::heatmap_column_order()` to ensure any functions or packages calling this function will succeed without error.
This document describes plans for enhancements to the multienrichjam R package.
Now that directional z-score can also be associated with enrichment P-values, heatmaps might need to use a bivariate color scale, to indicate enrichment and directionality. See "stevens.bluered".
For example the `mem_enrichment_heatmap()` colors nodes by enrichment P-value, where more intense color indicates more significant enrichment. The z-score direction is used to apply red "activated" or blue "inhibited". However, pathways with no z-score, or with z-score below the threshold, are colored red by default. They should use a neutral color.
For IPA "Upstream Regulators" it sometimes offers a direction implied by the activation z-score. Design idea is to implement the directionality so it can be included in downstream analyses.
- `mem_enrichment_heatmap()` - currently shades by the `-log10(pvalue)` however if there is directionality, it could be signed `+` for activated, `-` for inhibited. Then the heatmap color scale would use blue-white-red color gradient.
- `mem$enrichIMdirection` contains matrix of direction, by default `1` means all have same direction.
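
The signed score can be sketched in base R as follows. The matrices are hypothetical stand-ins for `mem$enrichIM` and `mem$enrichIMdirection`, and the color binning with `colorRampPalette()` is only one simple way to obtain a blue-white-red scale, not what `mem_enrichment_heatmap()` does.

```r
# Hypothetical pathway-by-enrichment P-values and directions.
enrichIM <- matrix(
   c(0.001, 0.01,   # EnrichA
     1, 0.0001),    # EnrichB
   nrow=2,
   dimnames=list(c("P1", "P2"), c("EnrichA", "EnrichB")))
enrichIMdirection <- matrix(
   c(1, -1,
     1, -1),
   nrow=2,
   dimnames=dimnames(enrichIM))

# Signed -log10(P): positive for activated, negative for inhibited.
signed_logp <- -log10(enrichIM) * sign(enrichIMdirection)

# Symmetric breaks around zero so that white corresponds to no signal.
lim <- max(abs(signed_logp))
breaks <- seq(-lim, lim, length.out=12)
pal <- grDevices::colorRampPalette(c("blue", "white", "red"))(11)
cell_colors <- matrix(
   pal[findInterval(signed_logp, breaks, all.inside=TRUE)],
   nrow=nrow(signed_logp),
   dimnames=dimnames(signed_logp))
```

With the default direction of `1` everywhere, `signed_logp` reduces to the unsigned `-log10(pvalue)` and only the white-to-red half of the scale is used.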
During `multiEnrichMap()` it filters for `topEnrichN` entries for each enrichment. It might be useful to retain the rank number for each enrichment, to review when setting a different `topEnrichN` threshold.
- `mem$enrichIMgeneCount` contains matrix of gene counts
- `mem$enrichIMrank` contains matrix of pathway rank (after filtering gene count)
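
As an illustration of how such a rank matrix might be derived, the sketch below ranks pathways by P-value within each enrichment after masking entries that fail a gene count filter. The data values are hypothetical, and the tie handling (`ties.method="min"`) is an assumption.

```r
# Hypothetical pathway-by-enrichment P-values and gene counts.
enrichIM <- matrix(
   c(0.001, 0.04, 0.2,    # EnrichA
     0.03, 0.0005, 0.01), # EnrichB
   nrow=3,
   dimnames=list(c("P1", "P2", "P3"), c("EnrichA", "EnrichB")))
enrichIMgeneCount <- matrix(
   c(5, 1, 8,
     4, 6, 3),
   nrow=3,
   dimnames=dimnames(enrichIM))

min_count <- 2
topEnrichN <- 2

# Mask entries below the gene count threshold before ranking.
p_masked <- enrichIM
p_masked[enrichIMgeneCount < min_count] <- NA

# Rank within each enrichment (column); masked entries stay NA.
enrichIMrank <- apply(p_masked, 2, rank,
   ties.method="min", na.last="keep")

# Any topEnrichN threshold can be reviewed later from the stored ranks.
keep <- !is.na(enrichIMrank) & enrichIMrank <= topEnrichN
```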
- COMPLETE: `mem_enrichment_heatmap()` - the heatmap circles and legend circles are not the same size - they should be fixed to the same absolute size.
  - COMPLETE: Optionally label each heatmap cell with the number of genes for visual reinforcement.
- COMPLETE: When there are more than 3 enrichments, the color legend on the gene-pathway heatmap becomes unwieldy - taking over the whole figure.
  - Optionally (and by default) hide color legend for the gene-pathway heatmap.
- COMPLETE: The gene-pathway heatmap use_raster=TRUE causes artifacts in output, it should be disabled by default. In future, debug why things go wrong.
  - I think the bug is caused by rasterization being done on underlying numbers before the color ramp is applied, in which case this problem cannot be solved when `colorize_by_gene=TRUE` since categorical values are assigned numbers which are not a proper color gradient.
- Replace `multiEnrichMap()` with `multienrichjam()` and simplify the arguments:
  - p_cutoff
  - min_gene_count
  - top_enrich_n
  - colnames: id, name, description, pvalue, gene
  - color_sub
- Port `mem_enrichment_heatmap()` argument `colorize_by_gene=TRUE` to use `ComplexHeatmap::Heatmap()` instead of `jamba::imageByColors()` for consistency, also so it can support `style="dotplot"`.
- `jam_igraph()` with `rescale=TRUE` should also scale igraph vertex size and igraph label size according to the new axis ranges.
- Debug `jam_igraph()` when `vertex.size` is defined alongside `V(g)$size`, and `node_factor`. It appears not to apply the size properly.
- COMPLETE: `mem_enrichment_heatmap()` new option for dot plot format, based upon `enrichplot::dotplot()` that sizes each dot by the number of genes present.
- In `multiEnrichMap()` remove default `topEnrichSources` and `topEnrichSourceSubset` which throw errors when not using MSigDB data.
- `topEnrichBySource()` and `topEnrichListBySource()` should be able to accept `enrichResult` as input, and return `enrichResult` and not `data.frame` which is understandably lossy. I need to understand the proper method for creating a subset of an `enrichList` object, including its internal data.
- Streamline the `topEnrichListBySource()` workflow.
- COMPLETE: Allow rotating gene-pathway incidence matrix when using `mem_gene_path_heatmap()`, so the pathway names are rows.
- Add test data object `mem` to be convenient for function examples, and testthis test suite.
- edge_bundling="connections" should also allow `render_groups=TRUE` to work, by returning the `"nodegroups"` required for that step.
  - It should probably filter out singlet nodes by default, since the typical use case is "nodeset-to-node" for Cnet plots.
  - Main use case is with a Cnet plot, edge bundling by connected nodes should allow drawing an optional border around each group of nodes.
Priority: high
Status: Implemented, early active testing of usability and functionality
- "connections" - Cnet edge bundles - special case of nodeset-to-node edge bundling
- "nodegroups": Node group bundles - general case of nodeset-to-nodeset edge bundling
Useful to improve readability/aesthetics of collapsed Cnet plots, especially Cnet plots with many gene nodes where it is difficult to tell which pathway clusters are connected to each gene.
- Consider silently returning the `igraph` object plotted where the vertex, edge, and graph attributes are updated to the values used at runtime. For example, if user overrides any attributes, those attributes will be present in the object returned, so the next iteration someone could just call `jam_igraph()` without any custom attributes and it would produce the same plot again.
- See https://igraph.org/r/doc/plot.common.html and `igraph::igraph_options()`
Priority: high
Status: Implemented
- This task involves making edge bundling accessible as a simple option during plotting, to prevent having to run 3 or 4 functions:
  - `jam_igraph()` with hidden edges
  - `bundle_node_edges()` to display edges on top of nodes
  - (optional) `jam_igraph()` with hidden edges, nodes with solid white color so they fully cover edges.
  - `jam_igraph()` with hidden edges, to display nodes on top of bundled edges while allowing nodes to have alpha transparency
- Note there is a package based on `ggraph` that implements edge bundling - if `jam_ggraph()` is going to be our future, maybe we should implement edge bundling using a similar approach used by that R package.
Priority: high
`plot_cnet_heatmaps()` is in development, and arranges expression heatmaps around a central Cnet plot, using genes in each cnet cluster. It is intended to be used with Cnet clusters from `collapse_mem_clusters()`, by-product of `mem_plot_folio()`.
This figure seems very useful because it integrates expression changes alongside gene-pathway connections.
The multienrichment workflow would likely become:
- Prepare enrichment data
- `multiEnrichMap()`
- `mem_plot_folio()`
- `plot_cnet_heatmap()`
Priority: low
- Analogous to `igraph:::plot.communities()` with `mark.groups` - for edge bundling, this step could optionally display boundaries around each node group.
Value is reinforcing node groups with a boundary. In testing, the boundary actually made plots look more complex. And with simple plots, it does not seem necessary since nodes are already well-spaced.
Priority: medium
- Purpose is to complement the "enrichIM" matrix, with P-values for each set. However "dysregulated pathways" may also be required to have N number of genes, yet this data is not easily available.
- Rows are pathways, columns are enrichments, values are the number of genes involved in enrichment.
  - Consider a matrix whose cells contain delimited gene symbols
Useful to apply filtering at matrix-level for pathways that meet enrichment P-value and gene count thresholds.
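
A minimal sketch of matrix-level filtering, assuming a gene count matrix with the same dimensions as the P-value matrix. The data values and thresholds are hypothetical. A pathway is kept when at least one enrichment meets both criteria in the same cell.

```r
# Hypothetical pathway-by-enrichment P-values and gene counts.
enrichIM <- matrix(
   c(0.001, 0.04, 0.5,   # EnrichA
     0.2, 0.03, 0.01),   # EnrichB
   nrow=3,
   dimnames=list(c("P1", "P2", "P3"), c("EnrichA", "EnrichB")))
enrichIMgeneCount <- matrix(
   c(6, 2, 0,
     10, 2, 4),
   nrow=3,
   dimnames=dimnames(enrichIM))

p_cutoff <- 0.05
min_count <- 3

# Both thresholds must be met by the same enrichment (same cell).
meets <- (enrichIM <= p_cutoff) & (enrichIMgeneCount >= min_count)
keep_pathways <- rownames(enrichIM)[rowSums(meets) > 0]
```

In this example `P2` is significant in both enrichments but never has enough genes, so it is dropped.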
Priority: medium
`reorderIgraphNodes()`
- when a cluster of nodes is short-wide, the order should be left-to-right
- when a cluster of nodes is tall-skinny, the order should be top-to-bottom
- all else should be sorted left-to-right (or user-defined default order)
  - it is visually confusing when tall-skinny nodes are sorted left-to-right
Useful to automate the ordering of node colors
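
The orientation rule could be sketched as below. The helper `cluster_sort_order()` is hypothetical and is not the `reorderIgraphNodes()` implementation. It assumes a two-column matrix of layout coordinates for the nodes in one cluster, and compares the x and y spans directly.

```r
# Hypothetical helper: return the node order for one cluster of nodes.
# xy: two-column matrix of layout coordinates (x, y).
# default: axis to use when the cluster is neither wide nor tall.
cluster_sort_order <- function(xy, default=c("x", "y")) {
   default <- match.arg(default)
   x_span <- diff(range(xy[, 1]))
   y_span <- diff(range(xy[, 2]))
   use_y <- (y_span > x_span) ||
      (y_span == x_span && default == "y")
   if (use_y) {
      # tall-skinny: top-to-bottom, ties broken left-to-right
      order(-xy[, 2], xy[, 1])
   } else {
      # short-wide: left-to-right, ties broken top-to-bottom
      order(xy[, 1], -xy[, 2])
   }
}

tall <- cbind(x=c(0, 0.1, 0), y=c(1, 3, 2))
wide <- cbind(x=c(3, 1, 2), y=c(0, 0.1, 0))
tall_order <- cluster_sort_order(tall)
wide_order <- cluster_sort_order(wide)
```

The sorted node colors would then be assigned to nodes in this coordinate order, so the color sequence follows the long axis of each cluster.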
Priority: low
- Apparently it had never been implemented? User-defined subset of pathways/sets to include in `multiEnrichMap()`, as an alternative to using `topEnrichBySource()`.

Users can currently perform this subset step before running `multiEnrichMap()`.
- Currently `multiEnrichMap()` does not filter by number of genes involved in enrichment, it only filters by enrichment P-value. New argument `min_count` is applied only when `topEnrichN` is used, but nothing else downstream is aware of filtering by `min_count`. The corresponding argument in `mem_plot_folio()` is `min_set_ct_each`, which requires a set (pathway) to contain at least this many entries in at least one enrichment result which also meets `p_cutoff` criteria for enrichment P-value.
Priority: low
- `mem_plot_folio()` and subsequent plots, optional argument `highlight_genes` which would effectively hide all gene labels except `highlight_genes` -- to help especially crowded plots.
  - Option 1: User-supplied genes
  - Option 2: Genes defined by one or more pathways/sets to highlight
  - Implement as wrapper to `jam_igraph()` to hide/display relevant node labels
  - `plot_cnet_heatmap()` could label the heatmap rows using `anno_mark()`

Useful in Cnet plots with too many genes to label, it allows only a subset of genes to be highlighted. Will become high priority if a manuscript decides to use this technique.
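
Both options can be sketched on a plain node table, without committing to how the wrapper around `jam_igraph()` would be written. The helper names `highlight_labels()` and `genes_in_sets()`, the node table, and the `nodeType` values are all assumptions for illustration.

```r
# Hypothetical node table for a small Cnet graph.
nodes <- data.frame(
   name=c("Pathway1", "Pathway2", "GeneA", "GeneB", "GeneC"),
   nodeType=c("set", "set", "gene", "gene", "gene"),
   stringsAsFactors=FALSE)

# Hypothetical gene-by-pathway incidence matrix.
memIM <- matrix(
   c(1, 1, 0,   # Pathway1
     0, 1, 1),  # Pathway2
   nrow=3,
   dimnames=list(c("GeneA", "GeneB", "GeneC"), c("Pathway1", "Pathway2")))

# Blank the label of every gene node that is not highlighted;
# set (pathway) labels are always kept.
highlight_labels <- function(nodes, highlight_genes) {
   hide <- (nodes$nodeType %in% "gene") &
      !(nodes$name %in% highlight_genes)
   ifelse(hide, "", nodes$name)
}

# Option 2: derive highlight_genes from one or more pathways/sets.
genes_in_sets <- function(im, sets) {
   rownames(im)[rowSums(im[, sets, drop=FALSE]) > 0]
}

labels_opt1 <- highlight_labels(nodes, "GeneB")
labels_opt2 <- highlight_labels(nodes, genes_in_sets(memIM, "Pathway2"))
```

The resulting vector could be assigned to the igraph vertex attribute `label` before plotting, and the same `highlight_genes` vector could be passed to the heatmap row annotation.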
Priority: medium
- Goal is to run `mem_plot_folio()` once then be able to plot any component separately. Would help keep all plots in sync with the settings used.
- Benefit: Always have ability to see the exact gene-pathway incidence matrix heatmap, and its clustering, which is used for collapsed Cnet plots.
- Design idea: List format
  - Named by type of output:
    - enrichment P-values
    - gene-pathway heatmap
      - Heatmap object
      - gene clusters
      - pathway clusters
      - filter/cluster settings used
    - collapsed cnet igraphs
      - with each labeling option implemented
    - cnet exemplar igraphs
      - 1 exemplar per pathway cluster
      - 2 exemplars per pathway cluster
      - 3 exemplars per pathway cluster
    - each cnet cluster igraph
      - cluster 1 cnet igraph
      - cluster 2 cnet igraph
      - cluster 3 cnet igraph
      - ...
  - Each type of output contains a `list` of relevant components
Priority: low
- Goal is to use ggraph for ggplot2-type plotting instead of using base R for `igraph` plots.
- `jam_ggraph()`?
  - Test `scatterpie` R package which implements pie node shapes for use in ggraph. Unsure the data content requirement.
- Examples should show how to specify color order, or change the order if necessary.
- `reorderIgraphNodes()` when it encounters attributes with multiple colors per node, such as `"pie.colors"` and `"coloredrect.colors"`, calls `avg_colors_by_list()` to generate one blended color per node, then it sorts those colors using `sort_colors()`. However, the color order ultimately does not match the order in the color legend, and other plots such as heatmaps.
  - Future idea is to convert node attributes with multiple colors per node into a `data.frame` where each color is in a separate column. Then convert each column to a factor whose levels match the color order from `colorV` (argument to `multiEnrichMap()`). The end result should sort nodes consistent with the order of colors.
  - Note this process assumes that all node colors are from a limited set, which ideally should match `colorV`. Therefore if any color gradient is applied to nodes with `nodeType="gene"`, this process will not work.
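
The future idea can be sketched in base R as follows. The `colorV` values and the per-node color list are hypothetical, and placing missing colors first (`na.last=FALSE`) is an assumption, chosen so a single-color node sorts before multi-color nodes that share its first color.

```r
# Hypothetical enrichment colors, in legend order.
colorV <- c(EnrichA="#FF0000", EnrichB="#0000FF", EnrichC="#FFD700")

# Hypothetical per-node colors, as might be stored in "pie.colors".
node_colors <- list(
   GeneA=c("#0000FF", "#FFD700"),
   GeneB=c("#FF0000"),
   GeneC=c("#FF0000", "#0000FF"),
   GeneD=c("#FFD700"))

# One row per node, one column per color position, padded with NA.
n <- max(lengths(node_colors))
color_mat <- do.call(rbind,
   lapply(node_colors, function(i) i[seq_len(n)]))
color_df <- as.data.frame(color_mat, stringsAsFactors=FALSE)

# Factor levels follow colorV, so sorting follows the legend order.
color_df[] <- lapply(color_df, factor, levels=unname(colorV))

# Sort by first color, then second color, and so on.
node_order <- do.call(order,
   c(unname(as.list(color_df)), list(na.last=FALSE)))
sorted_nodes <- rownames(color_df)[node_order]
```

Any color that is not in `colorV` becomes `NA` in the factor, which is where the limitation about gradient-colored gene nodes shows up in practice.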