
Releases: mlizhangx/Network-Analysis-for-Repertoire-Sequencing-

NAIR 1.0.4

05 Apr 18:05

New Features

  • New internal C++ function that computes the graph adjacency matrix using a pattern-based algorithm developed by Daniil Matveev (implemented for the Hamming and Levenshtein metrics with cutoffs 0, 1 and 2). It is faster than the default algorithm when the network is sufficiently sparse and the sequences are not too long, but it can run into memory issues with large or densely connected networks.
  • generateAdjacencyMatrix() has a new parameter method that specifies the algorithm. It accepts the value "pattern" to call the new pattern-based routine.
  • Speed improvements to the default algorithms for computing graph adjacency matrices.
  • Speed improvement to the argument check used for function parameters that accept an adjacency matrix.
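
The general idea behind a pattern-based neighbor search can be illustrated with a small Python sketch. This is not the package's C++ implementation: it only covers Hamming distance with cutoff 1, and the function name hamming_neighbors_by_pattern and the "*" wildcard are assumptions made for the illustration. Each sequence is expanded into one masked pattern per position, and sequences that share a pattern are adjacent. The sketch stores one pattern per position per sequence, which gives a sense of why memory use can grow quickly for long sequences or dense networks.

```python
from collections import defaultdict
from itertools import combinations


def hamming_neighbors_by_pattern(seqs):
    """Return index pairs (i, j), i < j, whose sequences have equal length
    and differ in at most one position (Hamming distance <= 1).

    Illustrative sketch only; assumes "*" never occurs in the sequences.
    """
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        # Exact-match bucket: covers distance 0, including empty strings.
        buckets[seq].append(idx)
        for pos in range(len(seq)):
            # Mask one position; equal-length sequences that differ only
            # at this position produce the same pattern.
            pattern = seq[:pos] + "*" + seq[pos + 1:]
            buckets[pattern].append(idx)

    edges = set()
    for members in buckets.values():
        # Indices were appended in increasing order, so i < j here.
        for i, j in combinations(members, 2):
            edges.add((i, j))
    return edges


seqs = ["CASSL", "CASSV", "CATSV", "CASS", "GGGGG"]
edges = hamming_neighbors_by_pattern(seqs)
```

The work is proportional to the number of patterns plus the number of pairs sharing a bucket, so it avoids comparing every pair of sequences when few sequences are similar.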

Minor Changes and Bug Fixes

  • Saving network output using output_type = "individual" now also saves the entire network list as an RData file (.rda).
  • Updated tests for compatibility with upcoming changes to guides in ggplot2 (thanks to Teun van den Brand and the ggplot2 development team for contributing the updates).

NAIR 1.0.3

13 Jan 18:41

Minor Changes and Bug Fixes

  • Removed a package test that checked for particular numbers of clusters resulting from specific applications of clustering algorithms from the igraph package. The test no longer passes with igraph version 1.6.0. Rather than update the test to pass, it has been removed to avoid future occurrences of this issue.

NAIR 1.0.2

28 Sep 06:49

Minor Changes and Bug Fixes

  • Fixed a bug in levDistBounded() that caused undefined behavior when either string was empty after removing the common prefix and suffix.
  • levDistBounded.cpp and hamDistBounded.cpp now use the string.h header instead of strings.h
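
To show where an empty remainder can arise, here is a minimal Python sketch of a bounded Levenshtein distance that strips the common prefix and suffix first. It is not the package's C++ code; the name lev_dist_bounded and the convention of returning -1 when the bound is exceeded are assumptions of this sketch. The explicit guard handles the case where stripping leaves one or both strings empty.

```python
def lev_dist_bounded(a, b, bound):
    """Levenshtein distance between a and b, or -1 if it exceeds bound.

    Illustrative sketch; the -1 convention is an assumption.
    """
    # Strip the common prefix.
    start = 0
    while start < len(a) and start < len(b) and a[start] == b[start]:
        start += 1
    a, b = a[start:], b[start:]

    # Strip the common suffix of what remains.
    end = 0
    while end < len(a) and end < len(b) and a[-1 - end] == b[-1 - end]:
        end += 1
    if end:
        a, b = a[:len(a) - end], b[:len(b) - end]

    # Guard: if either remainder is empty, the distance is simply the
    # length of the other remainder. No table lookup is needed.
    if not a or not b:
        dist = len(a) + len(b)
        return dist if dist <= bound else -1

    # The length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > bound:
        return -1

    # Standard dynamic programming, keeping only two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        # Row minima never decrease, so the bound can be checked early.
        if min(curr) > bound:
            return -1
        prev = curr
    return prev[-1] if prev[-1] <= bound else -1
```

For example, comparing "CASS" with "CASSLL" leaves an empty first string after the prefix is removed, which is exactly the situation the guard covers.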

NAIR 1.0.1

14 Sep 19:44

Breaking Changes

  • getClusterStats() now requires the cluster ID column to be specified and present in the provided node metadata. It no longer computes cluster membership itself: because it does not return the node metadata, any membership values it computed would be lost.
  • addClusterMembership() now accepts and returns the list of network objects instead of accepting and returning the node metadata with the igraph as an additional input. The first parameter data has been deprecated and moved in position, with the second parameter net becoming the first parameter and accepting the list of network objects instead of just the igraph. The function still also supports the old usage (for now), as long as net and data are specified by name (or the updated argument positions are used). See section "Unified Primary Argument Across Functions" for context.
  • Functions no longer save output to file by default. The user must provide a directory/file path to the appropriate parameter for output to be saved.
  • All instances of "individual" as a default value for output_type have been changed to "rds". "rds" is the preferred default since it reduces file size/clutter and the list of network objects can be restored intact (the list is the primary input/output of core NAIR functions) under any name desired. "rda" should be used if the file will be transferred across machines (the list will be restored under the name net), and "individual" should be used when the output is to be accessed from outside of R.
  • output_type = "individual" now writes the row names of the node metadata to the first column of the csv file. These contain the original row IDs from the input data.
  • Default value of output_type in findAssociatedClones() and input_type in buildAssociatedClusterNetwork() changed from "csv" to "rds", since these files are intermediate outputs and typically there should be no need to access them from outside of R or from another machine.
  • buildPublicClusterNetworkByRepresentative() default value of output_type changed from "rda" to "rds".

New Features

This section covers general new features. Other new features are grouped by subject in the following few sections.

  • buildRepSeqNetwork() now has the convenient alias buildNet().
  • The list returned by buildRepSeqNetwork() now contains an element details with network metadata such as the argument values used in the function call.
  • Plots with nodes colored according to a continuous variable will now have their legends displayed using a color bar instead of discrete legend values, unless that variable is also used to size the nodes.
  • In most cases where an invalid value is supplied to a function argument for which a meaningful default exists, instead of raising an error, the argument's value is replaced by the default value and a warning is raised.
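
The fallback behavior described in the last item can be pictured with a short Python sketch. The helper resolve_argument is hypothetical and written only for illustration; it does not reflect how the package implements its argument checks.

```python
import warnings


def resolve_argument(name, value, is_valid, default):
    """Return value if it passes is_valid; otherwise warn and return default.

    Hypothetical helper illustrating "replace with default and warn".
    """
    if is_valid(value):
        return value
    warnings.warn(
        f"invalid value for {name!r}; using default {default!r} instead"
    )
    return default


def is_nonneg_int(v):
    # Validity rule chosen for the example only.
    return isinstance(v, int) and v >= 0


# A valid value passes through unchanged.
cutoff = resolve_argument("dist_cutoff", 2, is_nonneg_int, default=1)
```

The caller keeps running with a sensible value, and the warning still tells the user that the supplied value was ignored.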

Unified Primary Argument Across Functions

Several changes and additions have been made in favor of using the list of network objects returned by buildRepSeqNetwork() as a unified primary input and output across the core NAIR functions. Adopting this convention offers several benefits: it greatly simplifies usage, since users no longer need to know which components of the list to pass to which function (or what each function returns); it eliminates the task of manually updating the list of network objects; it lets the core functions work with the pipe operator; and, most importantly, it improves functionality within and between functions, since functions can read and modify anything in the network list. For instance, addPlots() can use the coordinate layout of any existing plots to ensure a consistent layout across plots (which is no longer guaranteed otherwise), while addClusterStats() can add cluster membership values to the node metadata and record in details that the cluster properties correspond to these membership values (and not to values from a different instance of clustering using a different algorithm).

The following changes encompass the move toward using the network list as a primary input/output:

  • addClusterMembership() parameters and return value have changed. See the Breaking Changes section for details.
  • addPlots() added as the preferred alternative to generateNetworkGraphPlots() and plotNetworkGraph()
  • addClusterStats() added as the preferred alternative to getClusterStats()
  • addNodeStats() added as the preferred alternative to addNodeNetworkStats()
  • labelClusters() added as the preferred alternative to addClusterLabels()
  • labelNodes() added as the preferred alternative to addGraphLabels()

See the new "Supplementary Functions" vignette for examples.

Multiple Instances of Clustering

The following changes and additions have been made to facilitate multiple instances of clustering on the same network using different clustering algorithms. See the new "Cluster Analysis" vignette for examples.

  • All functions that can perform clustering now have a parameter cluster_id_name that can be used to specify a custom name for the cluster membership variable added to the node metadata.
  • Each time a new cluster membership variable is added to the node metadata, information is added to details recording the clustering algorithm used and the name of the corresponding cluster membership variable.
  • When cluster properties are computed with addClusterStats(), information is added to details recording the cluster membership variable corresponding to the cluster properties.
  • labelClusters() and addClusterLabels() now check details to confirm that the cluster properties match the specified cluster membership variable before using the node counts in the cluster properties.
  • labelClusters() and addClusterLabels() can now be used without cluster properties; node count is computed from the cluster membership values.
  • labelClusters() can be used to label multiple plots at once.
  • addClusterMembership(), addClusterStats() and addNodeStats() now allow custom argument values for optional parameters of the clustering algorithm through the ellipsis (...) argument.
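
The bookkeeping described in this list can be sketched in Python, with the network list modeled as a plain dictionary. All function names and the keys clusters_in_network and cluster_data_goes_with are hypothetical and chosen for illustration; the sketch only mirrors the logic stated above: record which algorithm produced each membership variable, record which variable the cluster properties belong to, and fall back to counting nodes from the membership values when the properties do not match.

```python
from collections import Counter


def add_cluster_membership(net, membership, algorithm, cluster_id_name="cluster_id"):
    """Store a membership variable and record which algorithm produced it."""
    net["node_data"][cluster_id_name] = list(membership)
    net["details"].setdefault("clusters_in_network", {})[cluster_id_name] = algorithm
    return net


def add_cluster_stats(net, cluster_id_name="cluster_id"):
    """Compute node counts per cluster and record the variable they go with."""
    net["cluster_data"] = dict(Counter(net["node_data"][cluster_id_name]))
    net["details"]["cluster_data_goes_with"] = cluster_id_name
    return net


def node_counts_for_labels(net, cluster_id_name):
    """Use stored cluster properties only if they match the requested variable."""
    if net["details"].get("cluster_data_goes_with") == cluster_id_name:
        return net["cluster_data"]
    # Otherwise compute node counts directly from the membership values.
    return dict(Counter(net["node_data"][cluster_id_name]))


net = {"node_data": {}, "details": {}}
add_cluster_membership(net, [1, 1, 2], "fast_greedy", "cluster_greedy")
add_cluster_membership(net, [1, 2, 2], "louvain", "cluster_louvain")
add_cluster_stats(net, "cluster_greedy")
```

Here the stored properties belong to cluster_greedy, so a request for cluster_louvain is answered from the membership values rather than from the mismatched properties.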

It may also be of interest in the future to add functionality allowing the network list to contain multiple sets of cluster properties corresponding to different instances of clustering.

Plots and Graph Layout

Plotting functions no longer fix the random seed when generating the coordinate layout for a plot. In order to facilitate a consistent layout across multiple plots of the same network graph, the following changes have been made.

  • Multiple plots produced in the same call to buildRepSeqNetwork(), addPlots() and generateNetworkGraphPlots() will all use a common layout.
  • Plot lists created by buildRepSeqNetwork(), addPlots() and generateNetworkGraphPlots() now include a matrix graph_layout containing the layout used in the plots.
  • addPlots() will automatically use the graph_layout mentioned above to ensure that new plots use the same layout as existing plots.
  • If the network list already contains plots but graph_layout is absent, addPlots() will extract the layout from the first plot and use it for the new plots.
  • generateNetworkGraphPlots() has a new parameter layout that can be used to specify the layout. It can be used to generate new plots with the same layout as existing plots (though addPlots() is easier), or to generate plots with custom layout types other than the default layout created using igraph::layout_components().
  • saveNetworkPlots() has a new parameter outfile_layout that can be used to save the graph layout.
  • saveNetwork() automatically saves the graph layout when output_type = "individual".

Essentially, generating new plots with addPlots() will ensure a consistent layout with the initial plots. Fixing a random seed before calling buildRepSeqNetwork() (or before the first call to addPlots(), if buildRepSeqNetwork() is called with plots = FALSE) allows the same layout to be reproduced across multiple executions of the same code in which the initial plots are generated.
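
The interaction between a stored layout and the random seed can be illustrated with a toy Python model. Nothing here is the package's code: add_plot, build_and_plot and the dictionary keys are hypothetical, and a list of random coordinates stands in for a real layout algorithm. The layout is drawn once, stored next to the plots, and reused by every later plot; seeding before the first plot makes the whole run reproducible.

```python
import random


def add_plot(net, color_by):
    """Add a plot colored by color_by, reusing any stored layout."""
    plots = net.setdefault("plots", {})
    if "graph_layout" not in plots:
        # First plot: draw a random layout (stand-in for a layout algorithm).
        plots["graph_layout"] = [
            (random.random(), random.random()) for _ in range(net["n_nodes"])
        ]
    plots[color_by] = {"color_by": color_by, "layout": plots["graph_layout"]}
    return net


def build_and_plot(seed):
    """Seed once before the first plot, then add two plots."""
    random.seed(seed)
    net = {"n_nodes": 4}
    add_plot(net, "degree")
    add_plot(net, "cluster_id")
    return net


run_a = build_and_plot(seed=1)
run_b = build_and_plot(seed=1)
```

Both plots within a run share one layout, and two runs with the same seed produce the same layout; without the seed, each run would draw a different one.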

Improved File Input Functionality

  • Most instances of the file_list argument now accept a list containing connections and file paths instead of only a character vector of file paths. This allows a greater variety of data sources to be used.
  • A greater variety of input data formats are now supported. Instances of the input_type parameter that accept text formats have a new parameter read.args that accepts a named list of optional arguments to read.table() and its variants read.csv(), etc. Dedicated arguments for header and sep still exist apart from read.args for backwards compatibility, but their defaults now match input_type (e.g., sep defaults to "," for input_type = "csv" and to "" for input_type = "table").
  • input_type = "tsv" now reads files using read.delim() instead of read.table().
  • Most instances of the input_type argument now also support the value "csv2" for reading files using read.csv2().

Lifecycle Changes

  • plotNetworkGraph() deprecated in favor of addPlots().
  • filterInputData() argument count_col deprecated. Rows with NA counts are no longer dropped.
  • getClusterFun() argument cluster_fun deprecated (see Breaking Changes).
  • addNodeNetworkStats() deprecated in favor of addNodeStats() (see section "Unified Primary Argument Across Functions").
  • addClusterMembership() argument data deprecated (see section "Unified Primary Argument Across Functions").
  • addClusterMembership() argument fun deprecated in favor of cluster_fun for consistency with other functions.
  • sparseAdjacencyMatFromSeqs() argument max_dist deprecated in favor of dist_cutoff for consistency with other functions.
  • `sa...