NAIR 1.0.1
Breaking Changes
- `getClusterStats()` now requires the cluster ID column to be specified and present in the provided node metadata; it will no longer compute cluster membership since it does not return the node metadata (so any membership values computed are lost).
- `addClusterMembership()` now accepts and returns the list of network objects instead of accepting and returning the node metadata with the igraph as an additional input. The first parameter `data` has been deprecated and moved in position, with the second parameter `net` becoming the first parameter and accepting the list of network objects instead of just the igraph. The function still also supports the old usage (for now), as long as `net` and `data` are specified by name (or the updated argument positions are used). See section "Unified Primary Argument Across Functions" for context.
- Functions no longer save output to file by default. The user must provide a directory/file path to the appropriate parameter for output to be saved.
- All instances of `"individual"` as a default value for `output_type` have been changed to `"rds"`. `"rds"` is the preferred default since it reduces file size/clutter and the list of network objects can be restored intact (the list is the primary input/output of core `NAIR` functions) under any name desired. `"rda"` should be used if the file will be transferred across machines (the list will be restored under the name `net`), and `"individual"` should be used when the output is to be accessed from outside of R.
- `output_type = "individual"` now writes the row names of the node metadata to the first column of the csv file. These contain the original row IDs from the input data.
- Default value of `output_type` in `findAssociatedClones()` and `input_type` in `buildAssociatedClusterNetwork()` changed from `"csv"` to `"rds"`, since these files are intermediate outputs and typically there should be no need to access them from outside of R or from another machine.
- `buildPublicClusterNetworkByRepresentative()` default value of `output_type` changed from `"rda"` to `"rds"`.
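The difference between restoring an object "under any name desired" and restoring it under a fixed name can be illustrated with base R alone. The sketch below uses a small hypothetical list as a stand-in for the list of network objects; that the package writes its `"rds"` and `"rda"` output with `saveRDS()` and `save()` is an assumption made here for illustration.

```r
# Hypothetical stand-in for the list returned by buildRepSeqNetwork()
net <- list(
  node_data = data.frame(CloneSeq = c("CASSL", "CASSF")),
  details   = list(dist_cutoff = 1)
)

rds_path <- tempfile(fileext = ".rds")
rda_path <- tempfile(fileext = ".rda")

saveRDS(net, rds_path)       # stores the object only, not its name
save(net, file = rda_path)   # stores the object together with its name

# An RDS file can be assigned to any name on reading
my_network <- readRDS(rds_path)

# An RDA file restores the object under the name it was saved with
rm(net)
restored_names <- load(rda_path)
restored_names
```

After `load()`, the list is available again as `net`, and `restored_names` holds the names of the objects that were restored.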
New Features
This section covers general new features. Other new features are grouped by subject in the following few sections.
- `buildRepSeqNetwork()` now has the convenient alias `buildNet()`.
- The list returned by `buildRepSeqNetwork()` now contains an element `details` with network metadata such as the argument values used in the function call.
- Plots with nodes colored according to a continuous variable will now have their legends displayed using a color bar instead of discrete legend values, unless that variable is also used to size the nodes.
- In most cases where an invalid value is supplied to a function argument for which a meaningful default exists, instead of raising an error, the argument's value is replaced by the default value and a warning is raised.
Unified Primary Argument Across Functions
Several changes and additions have been made in favor of using the list of network objects returned by `buildRepSeqNetwork()` as a unified primary input and output across the core `NAIR` functions. Adopting this convention offers several benefits: it greatly simplifies usage, since users no longer need to know which components of the list to input to which function (or what each function returns); it eliminates the task of manually updating the list of network objects; it results in the core functions working with the pipe operator; and most importantly, it improves functionality within and between functions, since functions can read and modify anything in the network list. For instance, `addPlots()` can use the coordinate layout of any existing plots to ensure a consistent layout across plots (which is no longer guaranteed otherwise), while `addClusterStats()` can add cluster membership values to the node metadata and record in `details` that the cluster properties correspond to these membership values (and not the values from a different instance of clustering using a different algorithm).
The following changes encompass the move toward using the network list as a primary input/output:
- `addClusterMembership()` parameters and return value have changed. See the Breaking Changes section for details.
- `addPlots()` added as the preferred alternative to `generateNetworkGraphPlots()` and `plotNetworkGraph()`.
- `addClusterStats()` added as the preferred alternative to `getClusterStats()`.
- `addNodeStats()` added as the preferred alternative to `addNodeNetworkStats()`.
- `labelClusters()` added as the preferred alternative to `addClusterLabels()`.
- `labelNodes()` added as the preferred alternative to `addGraphLabels()`.
See the new "Supplementary Functions" vignette for examples.
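As a rough sketch of the piped workflow this convention enables, the following assumes that the NAIR package is installed, that R 4.1 or later is used (for the native pipe), and that `simulateToyData()` returns a data frame with `CloneSeq` and `SampleID` columns. The choice of `color_nodes_by = "SampleID"` is only illustrative, and default argument values are relied on throughout.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()

# Each function accepts the network list and returns it with additions,
# so the calls can be chained without manually updating the list
net <- buildNet(toy_data, seq_col = "CloneSeq", plots = FALSE) |>
  addNodeStats() |>
  addClusterStats() |>
  addPlots(color_nodes_by = "SampleID")

# Inspect which components the list now contains
names(net)
```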
Multiple Instances of Clustering
The following changes and additions have been made to facilitate multiple instances of clustering on the same network using different clustering algorithms. See the new "Cluster Analysis" vignette for examples.
- All functions that can perform clustering now have a parameter `cluster_id_name` that can be used to specify a custom name for the cluster membership variable added to the node metadata.
- Each time a new cluster membership variable is added to the node metadata, information is added to `details` recording the clustering algorithm used and the name of the corresponding cluster membership variable.
- When cluster properties are computed with `addClusterStats()`, information is added to `details` recording the cluster membership variable corresponding to the cluster properties.
- `labelClusters()` and `addClusterLabels()` now check `details` to confirm that the cluster properties match the specified cluster membership variable before using the node counts in the cluster properties.
- `labelClusters()` and `addClusterLabels()` can now be used without cluster properties; node count is computed from the cluster membership values.
- `labelClusters()` can be used to label multiple plots at once.
- `addClusterMembership()`, `addClusterStats()` and `addNodeStats()` now allow custom argument values for optional parameters of the clustering algorithm through the ellipsis (`...`) argument.
It may also be of interest in the future to add functionality allowing the network list to contain multiple sets of cluster properties corresponding to different instances of clustering.
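A minimal sketch of two instances of clustering on the same network is given below. It assumes that the NAIR package is installed, that `"fast_greedy"` and `"louvain"` are accepted values of `cluster_fun`, and that the node metadata is stored in the list element `node_data`; the variable names `cluster_greedy` and `cluster_louvain` are arbitrary choices made for this example.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()
net <- buildNet(toy_data, seq_col = "CloneSeq", plots = FALSE)

# First instance of clustering, stored under a custom variable name
net <- addClusterMembership(
  net, cluster_fun = "fast_greedy", cluster_id_name = "cluster_greedy"
)

# Second instance, using a different algorithm and a different name
net <- addClusterMembership(
  net, cluster_fun = "louvain", cluster_id_name = "cluster_louvain"
)

# Both membership variables now sit side by side in the node metadata
head(net$node_data[, c("cluster_greedy", "cluster_louvain")])

# details records which algorithm produced which variable
str(net$details)
```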
Plots and Graph Layout
Plotting functions no longer fix the random seed when generating the coordinate layout for a plot. In order to facilitate a consistent layout across multiple plots of the same network graph, the following changes have been made.
- Multiple plots produced in the same call to `buildRepSeqNetwork()`, `addPlots()` and `generateNetworkGraphPlots()` will all use a common layout.
- Plot lists created by `buildRepSeqNetwork()`, `addPlots()` and `generateNetworkGraphPlots()` now include a matrix `graph_layout` containing the layout used in the plots.
- `addPlots()` will automatically use the `graph_layout` mentioned above to ensure that new plots use the same layout as existing plots.
- If the network list already contains plots but `graph_layout` is absent, `addPlots()` will extract the layout from the first plot and use it for the new plots.
- `generateNetworkGraphPlots()` has a new parameter `layout` that can be used to specify the layout. It can be used to generate new plots with the same layout as existing plots (though `addPlots()` is easier). It can also be used to generate plots with custom layout types other than the default layout created using `igraph::layout_components()`.
- `saveNetworkPlots()` has a new parameter `outfile_layout` that can be used to save the graph layout.
- `saveNetwork()` automatically saves the graph layout when `output_type = "individual"`.
Essentially, generating new plots with `addPlots()` will ensure a consistent layout with the initial plots. Fixing a random seed before calling `buildRepSeqNetwork()` (or before the first call to `addPlots()`, if `buildRepSeqNetwork()` is called with `plots = FALSE`) allows the same layout to be reproduced across multiple executions of the same code in which the initial plots are generated.
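The effect of fixing a seed on the layout can be illustrated directly with `igraph`, independent of the package functions. The sketch below assumes the igraph package is installed and uses a built-in example graph as a stand-in for a network graph; it shows the general principle only, not the package's internal code.

```r
library(igraph)

# Built-in example graph used as a stand-in for a network graph
g <- make_graph("Zachary")

# Fixing the same seed before each layout computation reproduces the layout
set.seed(42)
layout_a <- layout_components(g)

set.seed(42)
layout_b <- layout_components(g)

identical(layout_a, layout_b)

# The layout is a matrix with one row of coordinates per node
dim(layout_a)
```

A matrix of this shape is the kind of object that could be stored and reused to draw further plots of the same graph with matching node positions.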
Improved File Input Functionality
- Most instances of the `file_list` argument now accept a list containing connections and file paths instead of only a character vector of file paths. This allows a greater variety of data sources to be used.
- A greater variety of input data formats are now supported. Functions whose `input_type` parameter accepts text formats have a new parameter `read.args` that accepts a named list of optional arguments to `read.table()` and its variants `read.csv()`, etc. Dedicated arguments for `header` and `sep` still exist apart from `read.args` for backwards compatibility, but their defaults now match `input_type` (e.g., `sep` defaults to `","` for `input_type = "csv"` and to `""` for `input_type = "table"`).
- `input_type = "tsv"` now reads files using `read.delim()` instead of `read.table()`.
- Most instances of the `input_type` argument now also support the value `"csv2"` for reading files using `read.csv2()`.
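To make the idea of a named list of reader arguments concrete, the sketch below uses base R only. It writes a small file in the `csv2` convention (semicolon separator, comma as decimal mark) and reads it twice: once with `read.csv2()` and once by forwarding a named list to `read.table()` via `do.call()`. The forwarding pattern is an assumption about how a parameter such as `read.args` might be consumed, and the variable `read_args` is a hypothetical name.

```r
# Small file in csv2 style: ";" as separator and "," as decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("CloneSeq;CloneFrequency",
             "CASSL;0,25",
             "CASSF;0,75"), path)

# Dedicated reader for this convention
dat_csv2 <- read.csv2(path)

# Equivalent result from a named list of optional arguments
read_args <- list(header = TRUE, sep = ";", dec = ",")
dat_args <- do.call(read.table, c(list(file = path), read_args))

dat_args$CloneFrequency
```

Both calls parse the frequency column as numeric, which would not be the case if the file were read with the default decimal mark.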
Lifecycle Changes
- `plotNetworkGraph()` deprecated in favor of `addPlots()`.
- `filterInputData()` argument `count_col` deprecated. Rows with NA counts are no longer dropped.
- `getClusterFun()` argument `cluster_fun` deprecated (see Breaking Changes).
- `addNodeNetworkStats()` deprecated in favor of `addNodeStats()` (see section "Unified Primary Argument Across Functions").
- `addClusterMembership()` argument `data` deprecated (see section "Unified Primary Argument Across Functions").
- `addClusterMembership()` argument `fun` deprecated in favor of `cluster_fun` for consistency with other functions.
- `sparseAdjacencyMatFromSeqs()` argument `max_dist` deprecated in favor of `dist_cutoff` for consistency with other functions.
- `saveNetwork()` argument `output_filename` deprecated in favor of `output_name` for consistency with other functions.
- `sparseAdjacencyMatFromSeqs()` deprecated in favor of its better-named twin `generateAdjacencyMatrix()`.
- `generateNetworkFromAdjacencyMat()` deprecated in favor of its better-named twin `generateNetworkGraph()`.
Minor Changes and Bug Fixes
- `output_type = "individual"` now also saves the list of plots (if present) to an RDS file. This prevents the `ggraph` objects containing the plots from being lost, in case the user wishes to modify these plots in the future.
- All instances of the `output_name` parameter now automatically replace potentially unsafe characters with underscores and remove any leading or trailing non-alphanumeric characters. Safe characters include alphanumeric characters, underscores and hyphens.
- Package functions no longer print messages to the console by default. Functions now have a `verbose` argument which can be set to `TRUE` to enable printing of console messages. For logging purposes, these messages are now generated using `message()` rather than `cat()`, and so send their output to `stderr()` rather than `stdout()`.
- `buildRepSeqNetwork()`, `addPlots()` and `generateNetworkGraphPlots()` now have `print_plots` set to `FALSE` by default (plots are no longer printed to the R plotting window unless manually specified).
- `buildAssociatedClusterNetwork()` now removes duplicate observations after loading the data from all neighborhoods. When multiple associated sequences are similar, the same clone from a given sample can belong to multiple neighborhoods. Previously, this occurrence resulted in the same clone appearing multiple times in the global network.
- `simulateToyData()` argument `seed_value` removed. Users can set a seed prior to calling the function if desired.
- `generateNetworkGraphPlots()` now handles the case where `color_nodes_by` contains duplicate values by removing the duplicate values with a warning. If `color_scheme` is a vector, the corresponding entries of `color_scheme` are also removed. Previously, this case resulted in a list of plots containing two elements with the same name.
- When `generateNetworkGraphPlots()` is called with a non-numeric variable specified for `size_nodes_by`, the function now defaults to fixed node sizes with a warning.
- `addClusterStats()` and `buildRepSeqNetwork(cluster_stats = TRUE)` now call `sum()` and `max()` with `na.rm = TRUE` when computing abundance-based properties. This change reflects the fact that `buildRepSeqNetwork()` no longer drops input data rows with `NA` or `NaN` values in the count column.
- `combineSamples()` and `loadDataFromFileList()` now preserve the original row IDs of each input file. In the combined data, each row ID is prefixed by the sample ID (if available) or by the file number based on the order in `file_list`.
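The sanitizing rule described above for `output_name` can be sketched in a few lines of base R. The helper `sanitize_output_name()` is hypothetical and written only to illustrate the stated rule; the package's actual implementation may differ in its details.

```r
# Hypothetical helper illustrating the described rule:
# 1. replace every character that is not alphanumeric, "_" or "-" with "_"
# 2. strip leading and trailing non-alphanumeric characters
sanitize_output_name <- function(x) {
  x <- gsub("[^[:alnum:]_-]", "_", x)
  gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", x)
}

sanitize_output_name(" my output:name! ")  # "my_output_name"
sanitize_output_name("--run-01.rds")       # "run-01_rds"
```

Hyphens and underscores inside the name are kept, while the same characters at either end are removed by the second step.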