Releases: mlizhangx/Network-Analysis-for-Repertoire-Sequencing-
NAIR 1.0.4
New Features
- New internal cpp function that computes graph adjacency matrix using pattern-based algorithm, developed by Daniil Matveev (implemented for metrics Hamming, Levenshtein and cutoffs 0, 1, 2). Faster than the default algorithm when network is sufficiently sparse and sequences are not too long, but can incur memory issues with large or densely-connected networks.
- `generateAdjacencyMatrix()` has new parameter `method` used to specify the algorithm. Accepts value `"pattern"` to call the new routine for the pattern-based algorithm.
- Speed improvements to the default algorithms for computing graph adjacency matrices
- Speed improvement to the argument check used for function parameters that accept an adjacency matrix
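The idea behind a pattern-based adjacency computation can be sketched as follows. This is an illustration only, written in Python rather than the package's C++, and it is not the NAIR implementation: the function name `hamming_adjacent_pairs` is hypothetical, and the sketch assumes Hamming distance with cutoff 1 on equal-length sequences. Each sequence is indexed under one "pattern" per position (the sequence with that position removed), and any two sequences that share a pattern differ in at most that one position.

```python
from collections import defaultdict
from itertools import combinations


def hamming_adjacent_pairs(seqs):
    """Return sorted index pairs (i, j), i < j, whose sequences have equal
    length and differ in at most one position (Hamming distance <= 1).

    Hypothetical helper for illustration; not part of NAIR.
    """
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        # Key -1 holds the unmodified sequence, so exact duplicates
        # (including empty strings) are always bucketed together.
        buckets[(-1, seq)].append(idx)
        # One pattern per position: the sequence with that position removed.
        for pos in range(len(seq)):
            buckets[(pos, seq[:pos] + seq[pos + 1:])].append(idx)

    pairs = set()
    for members in buckets.values():
        # Every pair inside a bucket is within the cutoff.
        pairs.update(combinations(members, 2))
    return sorted(pairs)


print(hamming_adjacent_pairs(["ACGT", "ACGA", "TCGA", "ACG", "ACGT"]))
```

The sketch avoids comparing every pair of sequences, which helps when few sequences are adjacent. The number of stored patterns grows with the total sequence length, and a bucket with many members yields many pairs, which is consistent with the memory caveat for large or dense networks noted above.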
Minor Changes and Bug Fixes
- Saving network output using `output_type = "individual"` now also saves the entire network list as an RData file (`.rda`).
- Updated tests for compatibility with upcoming changes to guides in `ggplot2` (thanks to Teun van den Brand and the `ggplot2` development team for contributing the updates)
NAIR 1.0.3
Minor Changes and Bug Fixes
- Removed a package test that checked for particular numbers of clusters resulting from specific applications of clustering algorithms from the `igraph` package. The test no longer passes with `igraph` version 1.6.0. Rather than update the test to pass, it has been removed to avoid future occurrences of this issue.
NAIR 1.0.2
Minor Changes and Bug Fixes
- Fixed a bug in `levDistBounded()` that causes undefined behavior when either string is empty after removing the common prefix and suffix.
- `levDistBounded.cpp` and `hamDistBounded.cpp` now use the `string.h` header instead of `strings.h`
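To illustrate the edge case behind the `levDistBounded()` fix, here is a minimal Python sketch of a bounded Levenshtein distance that trims the common prefix and suffix first. It is not the package's C++ code: the name `lev_dist_bounded` and the convention of returning `-1` when the bound is exceeded are assumptions made for this example. The explicit guard handles the case where one string becomes empty after trimming, in which the distance is simply the length of the other remainder.

```python
def lev_dist_bounded(a, b, bound):
    """Levenshtein distance between a and b, or -1 if it exceeds bound.

    Hypothetical sketch; the -1 return convention is an assumption.
    """
    # Strip the common prefix.
    start = 0
    while start < len(a) and start < len(b) and a[start] == b[start]:
        start += 1
    a, b = a[start:], b[start:]

    # Strip the common suffix.
    end = 0
    while end < len(a) and end < len(b) and a[-1 - end] == b[-1 - end]:
        end += 1
    if end:
        a, b = a[:len(a) - end], b[:len(b) - end]

    # Guard: if either remainder is empty, no table is needed and
    # the distance is the length of the other remainder.
    if not a or not b:
        dist = max(len(a), len(b))
        return dist if dist <= bound else -1

    # The length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > bound:
        return -1

    # Standard two-row dynamic programme with early exit.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        # Row minima never decrease, so the bound cannot be met later.
        if min(curr) > bound:
            return -1
        prev = curr
    return prev[-1] if prev[-1] <= bound else -1


print(lev_dist_bounded("abcd", "abd", 1))
```

In the usage line, trimming leaves `"c"` and an empty string, so the guard returns the result without touching the dynamic-programming table.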
NAIR 1.0.1
Breaking Changes
- `getClusterStats()` now requires the cluster ID column to be specified and present in the provided node metadata; it will no longer compute cluster membership since it does not return the node metadata (so any membership values computed are lost).
- `addClusterMembership()` now accepts and returns the list of network objects instead of accepting and returning the node metadata with the igraph as an additional input. The first parameter `data` has been deprecated and moved in position, with the second parameter `net` becoming the first parameter and accepting the list of network objects instead of just the igraph. The function still also supports the old usage (for now), as long as `net` and `data` are specified by name (or the updated argument positions are used). See section "Unified Primary Argument Across Functions" for context.
- Functions no longer save output to file by default. The user must provide a directory/file path to the appropriate parameter for output to be saved.
- All instances of `"individual"` as a default value for `output_type` have been changed to `"rds"`. `"rds"` is the preferred default since it reduces file size/clutter and the list of network objects can be restored intact (the list is the primary input/output of core `NAIR` functions) under any name desired. `"rda"` should be used if the file will be transferred across machines (the list will be restored under the name `net`), and `"individual"` should be used when the output is to be accessed from outside of R.
- `output_type = "individual"` now writes the row names of the node metadata to the first column of the csv file. These contain the original row IDs from the input data.
- Default value of `output_type` in `findAssociatedClones()` and `input_type` in `buildAssociatedClusterNetwork()` changed from `"csv"` to `"rds"`, since these files are intermediate outputs and typically there should be no need to access them from outside of R or from another machine.
- `buildPublicClusterNetworkByRepresentative()` default value of `output_type` changed from `"rda"` to `"rds"`.
New Features
This section covers general new features. Other new features are grouped by subject in the following few sections.
- `buildRepSeqNetwork()` now has the convenient alias `buildNet()`.
- The list returned by `buildRepSeqNetwork()` now contains an element `details` with network metadata such as the argument values used in the function call.
- Plots with nodes colored according to a continuous variable will now have their legends displayed using a color bar instead of discrete legend values, unless that variable is also used to size the nodes.
- In most cases where an invalid value is supplied to a function argument for which a meaningful default exists, instead of raising an error, the argument's value is replaced by the default value and a warning is raised.
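The fallback behavior described in the last item can be pictured with a small, language-agnostic sketch. The Python below is an illustration under assumptions, not NAIR code: the helper `resolve_argument` and the allowed values passed to it are hypothetical. It warns and substitutes the default rather than raising an error.

```python
import warnings


def resolve_argument(name, value, allowed, default):
    """Return value if it is allowed; otherwise warn and return default.

    Hypothetical helper for illustration; not part of NAIR.
    """
    if value in allowed:
        return value
    warnings.warn(
        f"invalid value {value!r} for {name!r}; using default {default!r}",
        UserWarning,
        stacklevel=2,
    )
    return default


# An unsupported value triggers a warning and falls back to "rds".
chosen = resolve_argument(
    "output_type", "xlsx", ("rds", "rda", "individual"), "rds"
)
print(chosen)
```

The trade-off of this design is that a typo in an argument no longer stops the run, so the warning is the only signal that a default was used.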
Unified Primary Argument Across Functions
Several changes and additions have been made in favor of using the list of network objects returned by `buildRepSeqNetwork()` as a unified primary input and output across the core `NAIR` functions. Adopting this convention offers several benefits: it greatly simplifies usage, since users no longer need to know which components of the list to input to which function (or what each function returns); it eliminates the task of manually updating the list of network objects; it results in the core functions working with the pipe operator; and most importantly, it improves functionality within and between functions, since functions can read and modify anything in the network list. For instance, `addPlots()` can use the coordinate layout of any existing plots to ensure a consistent layout across plots (which is no longer guaranteed otherwise), while `addClusterStats()` can add cluster membership values to the node metadata and record in `details` that the cluster properties correspond to these membership values (and not the values from a different instance of clustering using a different algorithm).
The following changes encompass the move toward using the network list as a primary input/output:
- `addClusterMembership()` parameters and return value have changed. See the Breaking Changes section for details.
- `addPlots()` added as the preferred alternative to `generateNetworkGraphPlots()` and `plotNetworkGraph()`
- `addClusterStats()` added as the preferred alternative to `getClusterStats()`
- `addNodeStats()` added as the preferred alternative to `addNodeNetworkStats()`
- `labelClusters()` added as the preferred alternative to `addClusterLabels()`
- `labelNodes()` added as the preferred alternative to `addGraphLabels()`
See the new "Supplementary Functions" vignette for examples.
Multiple Instances of Clustering
The following changes and additions have been made to facilitate multiple instances of clustering on the same network using different clustering algorithms. See the new "Cluster Analysis" vignette for examples.
- All functions that can perform clustering now have a parameter `cluster_id_name` that can be used to specify a custom name for the cluster membership variable added to the node metadata.
- Each time a new cluster membership variable is added to the node metadata, information is added to `details` recording the clustering algorithm used and the name of the corresponding cluster membership variable.
- When cluster properties are computed with `addClusterStats()`, information is added to `details` recording the cluster membership variable corresponding to the cluster properties.
- `labelClusters()` and `addClusterLabels()` now check `details` to confirm that the cluster properties match the specified cluster membership variable before using the node counts in the cluster properties.
- `labelClusters()` and `addClusterLabels()` can now be used without cluster properties; node count is computed from the cluster membership values.
- `labelClusters()` can be used to label multiple plots at once.
- `addClusterMembership()`, `addClusterStats()` and `addNodeStats()` now allow custom argument values for optional parameters of the clustering algorithm through the ellipses (`...`) argument.
It may also be of interest in the future to add functionality allowing the network list to contain multiple sets of cluster properties corresponding to different instances of clustering.
Plots and Graph Layout
Plotting functions no longer fix the random seed when generating the coordinate layout for a plot. In order to facilitate a consistent layout across multiple plots of the same network graph, the following changes have been made.
- Multiple plots produced in the same call to `buildRepSeqNetwork()`, `addPlots()` and `generateNetworkGraphPlots()` will all use a common layout.
- Plot lists created by `buildRepSeqNetwork()`, `addPlots()` and `generateNetworkGraphPlots()` now include a matrix `graph_layout` containing the layout used in the plots.
- `addPlots()` will automatically use the `graph_layout` mentioned above to ensure that new plots use the same layout as existing plots.
- If the network list already contains plots but `graph_layout` is absent, `addPlots()` will extract the layout from the first plot and use it for the new plots.
- `generateNetworkGraphPlots()` has a new parameter `layout` that can be used to specify the layout. Can be used to generate new plots with the same layout as existing plots (though `addPlots()` is easier). Can also be used to generate plots with custom layout types other than the default layout created using `igraph::layout_components()`.
- `saveNetworkPlots()` has a new parameter `outfile_layout` that can be used to save the graph layout.
- `saveNetwork()` automatically saves the graph layout when `output_type = "individual"`.
Essentially, generating new plots with `addPlots()` will ensure a consistent layout with the initial plots. Fixing a random seed before calling `buildRepSeqNetwork()` (or before the first call to `addPlots()`, if `buildRepSeqNetwork()` is called with `plots = FALSE`) allows the same layout to be reproduced across multiple executions of the same code in which the initial plots are generated.
Improved File Input Functionality
- Most instances of the `file_list` argument now accept a list containing connections and file paths instead of only a character vector of file paths. This allows a greater variety of data sources to be used.
- A greater variety of input data formats are now supported. Instances of the `input_type` parameter that accept text formats have a new parameter `read.args` that accepts a named list of optional arguments to `read.table()` and its variants `read.csv()`, etc. Dedicated arguments for `header` and `sep` still exist apart from `read.args` for backwards compatibility, but their defaults now match `input_type` (e.g., `sep` defaults to `","` for `input_type = "csv"` and to `""` for `input_type = "table"`).
- `input_type = "tsv"` now reads files using `read.delim()` instead of `read.table()`.
- Most instances of the `input_type` argument now also support the value `"csv2"` for reading files using `read.csv2()`.
Lifecycle Changes
- `plotNetworkGraph()` deprecated in favor of `addPlots()`.
- `filterInputData()` argument `count_col` deprecated. Rows with NA counts are no longer dropped.
- `getClusterFun()` argument `cluster_fun` deprecated (see Breaking Changes)
- `addNodeNetworkStats()` deprecated in favor of `addNodeStats()` (see section "Unified Primary Argument Across Functions")
- `addClusterMembership()` argument `data` deprecated (see section "Unified Primary Argument Across Functions")
- `addClusterMembership()` argument `fun` deprecated in favor of `cluster_fun` for consistency with other functions.
- `sparseAdjacencyMatFromSeqs()` argument `max_dist` deprecated in favor of `dist_cutoff` for consistency with other functions.
- `sa...