Releases: pachterlab/kb_python

v0.26.2

04 Jun 18:51

General

  • Deprecated --lamanno and --nucleus flags. Use --workflow instead.
  • Updated setup.py so that tests don't get installed.
  • Fixed an issue where requirements.txt would not be included in the PyPI upload.

[YANKED] v0.26.1

02 Jun 01:37

This version has been yanked due to an issue with installation. Do not try to install this version!

General

  • Added a check for whether the temporary directory exists. If it does, kb now prints an error and exits. (#119)
  • Logging is now handled by a specialized logger implemented in the ngs-tools library, which provides logger namespacing.
  • Updated supported technologies text and syntax for kb --list so that they are more compact. Added link to the kallisto manual for custom technology definitions.
  • Updated citation in info.
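
The temporary-directory check in the first item above could look roughly like the following. This is a minimal sketch and not kb's actual code: the function name ensure_tmp_dir_is_new, the error message, and the choice to create the directory afterwards are all assumptions made for illustration.

```python
import os
import sys


def ensure_tmp_dir_is_new(tmp_dir):
    """Exit with an error if tmp_dir already exists (hypothetical helper)."""
    if os.path.exists(tmp_dir):
        # Refuse to reuse a directory that may hold files from another run.
        print(f"error: temporary directory {tmp_dir!r} already exists", file=sys.stderr)
        sys.exit(1)
    # Assumption: the directory is created once the check passes.
    os.makedirs(tmp_dir)
    return tmp_dir
```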

ref

  • Fixed --tmp option to set the temporary directory properly (#122)
  • Major refactor of FASTA and GTF parsing. All relevant functions were replaced with appropriate ones from the ngs-tools library, which are far more robust in dealing with GTF entries (especially missing attributes). FASTA and GTF files no longer have to be sorted or decompressed. Together, these changes result in an approximately order-of-magnitude speedup in splitting the genomic FASTA. Additionally, more helpful error messages are printed, which should make problems easier to debug.
  • Fixed an issue where no logging messages were displayed when downloading a reference with -d.

count

  • Whitelists are now provided by the ngs-tools library.

v0.26.0

12 Apr 19:38

General

  • Added the optional arguments --kallisto and --bustools, which may be used to override the packaged kallisto and bustools binaries. The argument may be a command in the user's PATH, which will be expanded to the full absolute path, or an absolute/relative path to the binary (#109, thanks @apeltzer, @dpryan79, @Maarten-vd-Sande).
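
The path expansion described above (a command on the PATH or an absolute/relative path, both resolved to an absolute path) can be sketched with the standard library. This is an illustration under stated assumptions, not kb's implementation: resolve_binary and its path parameter are hypothetical names, and the fallback for a bare file name in the current directory is a guess at reasonable behavior.

```python
import os
import shutil


def resolve_binary(arg, path=None):
    """Return the absolute path of an executable given a command name or a path.

    Hypothetical helper; `path` overrides the PATH search string (handy for testing).
    """
    # shutil.which searches PATH for bare command names and checks arguments
    # that contain a directory component directly.
    found = shutil.which(arg, path=path)
    if found is None and os.path.isfile(arg) and os.access(arg, os.X_OK):
        # Assumption: also accept an executable in the current directory
        # that is given by bare file name but is not on PATH.
        found = arg
    if found is None:
        raise FileNotFoundError(f"could not find an executable for {arg!r}")
    return os.path.abspath(found)
```

Calling resolve_binary("kallisto") would return the absolute path of whichever kallisto is first on the PATH, and raise FileNotFoundError if there is none.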

ref

  • Any spaces in GTF groups are now removed. For instance, if a transcript has ID TRANSCRIPT ID then the resulting transcript sequence will be named TRANSCRIPTID. (#97, thanks @axelalmet)

count

  • Fixed an issue where converting the count matrix using --loom and --workflow lamanno would cause an error (#91)
  • Fixed an issue with parsing FASTQ paths when using -x smartseq, where the second read file would be erroneously used as the first (#114, thanks @jma1991)
  • Added entries to indicate the current working directory when the kb command was called, along with the kallisto and bustools binary paths and versions in kb_info.json.

v0.25.1

08 Jan 20:47

count

  • Fixed the "loompy does not accept empty matrices as data" error when providing --loom with --workflow lamanno (#91)
  • When using --h5ad or --loom with -x smartseq, the output matrix has genes as columns, instead of transcripts. For genes that have multiple transcripts, the counts are added. (#93)
  • For -x smartseq, it is now possible to provide a batch TSV instead of FASTQs directly. The batch TSV must contain exactly three columns: cell ID, FASTQ 1 (read 1), FASTQ 2 (read 2).
  • Added an error when an odd number of FASTQs is provided for -x smartseq (only paired-end reads are currently supported)
  • Turned off all logging and warning messages from h5py and anndata.
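
The three-column batch TSV described above (cell ID, FASTQ 1, FASTQ 2) can be validated with a few lines of Python. This is a minimal sketch, not kb's own parser: read_batch_tsv is a hypothetical name, the file names in the example are made up, and skipping blank lines is an assumption.

```python
import csv
import io


def read_batch_tsv(handle):
    """Parse a batch TSV into (cell_id, fastq_1, fastq_2) tuples."""
    records = []
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # assumption: blank lines are ignored
        if len(row) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, found {len(row)}"
            )
        records.append(tuple(row))
    return records


# Hypothetical batch file contents, held in memory for the example.
example = io.StringIO(
    "cell1\tcell1_R1.fastq.gz\tcell1_R2.fastq.gz\n"
    "cell2\tcell2_R1.fastq.gz\tcell2_R2.fastq.gz\n"
)
for cell_id, read_1, read_2 in read_batch_tsv(example):
    print(cell_id, read_1, read_2)
```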

v0.25.0

19 Nov 14:31

ref

  • Progress bar is now displayed when downloading pre-packaged reference files.
  • Added checks to provide more useful outputs for common errors, including: 1) when FASTA and GTF chromosomes do not match, 2) when a GTF entry is not parsable, and 3) when either transcript or exon entry for a transcript is missing in the GTF (both are required).
  • Added -k option to override the default (or calculated optimal) k-mer length for the kallisto index.
  • Added functionality to generate a feature barcode reference for use with the KITE feature-barcoding workflow. To use this option, supply --workflow kite and a feature-barcode to cell-barcode mapping.
  • Added -n option to split the index into n parts. This reduces the maximum memory used at any given time, which is useful in memory-limited environments. When the -n option is used, the -i argument is used as the prefix for the n indices generated. Each of these indices is suffixed with .i, where i is the index number, starting from i=0. When -n is used, the built indices must be passed as a comma-delimited list to kb count (NOTE: this feature is EXPERIMENTAL. See count for more details). When -n is used with --workflow lamanno or --workflow nucleus, only the intron FASTA is split into n-1 parts, which are then each indexed separately. The cDNA FASTA is indexed in its entirety and is never split.
  • Added functionality to build a single index using multiple references. Useful for mixed species experiments. The fasta argument should be a comma-delimited list of genome FASTAs, and the gtf argument should be a comma-delimited list of GTFs, corresponding in position to each genome FASTA.
  • Added --tmp option to manually specify the temporary directory. Otherwise, behavior is identical to the previous version (tmp directory at the location where kb is executed).
  • Added support for IUPAC nucleotide codes. Note that kallisto replaces non-ACGUT nucleotides with pseudorandom ones. Thanks @Maarten-vd-Sande
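
To illustrate the naming scheme for split indices described in the -n item above (the -i value as prefix, a .i suffix counting from 0, and a comma-delimited list for kb count), here is a small sketch. The helper names and the file name index.idx are hypothetical and only serve the example.

```python
def split_index_paths(prefix, n):
    """Return the n index paths derived from a prefix: prefix.0 ... prefix.(n-1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [f"{prefix}.{i}" for i in range(n)]


def index_argument(prefix, n):
    """Join the split index paths into the comma-delimited form expected by -i."""
    return ",".join(split_index_paths(prefix, n))


# Hypothetical prefix; the result is the string one would pass to kb count -i.
print(index_argument("index.idx", 3))
```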

count

  • Added support for KITE feature-barcoding workflow. The bustools binary was updated to support this feature.
  • DEPRECATION: The --lamanno and --nucleus flags will be deprecated in the next release. These have been replaced with --workflow lamanno and --workflow nucleus.
  • All BUS files that are inputs or outputs are validated before and after running kallisto or bustools. A BUS file is considered valid if it can be read with bustools without error and it has a positive number of BUS records. This should prevent bustools from trying to sort empty BUS files and crashing (#31).
  • Added functionality to generate TCC matrices with the --tcc flag.
  • Added --tcc flag to include reads that pseudoalign to multiple genes.
  • When running in verbose mode (--verbose), commands are no longer printed with the full path to the bustools and kallisto binaries. These paths are printed once at the start of the program.
  • Added --dry-run flag, which prints the entire workflow to standard output as shell commands, without actually running them.
  • EXPERIMENTAL: Added support for multiple indices by passing a comma-delimited list of indices to -i. kb will align the reads to each of these indices and merge the BUS files with bustools mash and bustools merge. This feature is currently EXPERIMENTAL, and there are known issues that cause the loss of reads. This feature will be fully supported in a future release. In the meantime, use at your own risk!
  • Added --tmp option to manually specify temporary directory. The default behavior has also changed: the default tmp directory is created IN THE OUTPUT FOLDER (specified by -o). Previously, the tmp directory was created where kb was run, which was causing issues when running multiple instances of kb from the same location. Thanks to @Munfred and @kokitsuyuzaki for the suggestion.
  • kb now outputs a kb_info.json which includes useful run information, such as the commands run and their runtimes.
  • Added functionality to generate a brief standalone HTML report that includes basic statistics (run_info.json, inspect.json) and quality-control plots (knee plot, elbow plot, PCA, genes detected). This feature is available with the --report flag. Using this flag on velocity matrices may cause kb to crash due to high memory usage, and a corresponding warning is printed at the start. Plots for TCC matrices are not supported.
  • When the matrix is converted to H5AD or Loom format (using the --h5ad or --loom options), the gene/feature names are included as a column in the var of the anndata. Related to #52
  • Added a --cellranger option, which converts the raw gene matrices to cellranger-compatible format in a separate cellranger directory for the standard workflow (and cellranger_spliced and cellranger_unspliced for the velocity and nucleus workflows). Note that cellranger outputs matrices with genes as rows and cells (barcodes) as columns.
  • Added --mm flag to include bus records that pseudoalign to multiple genes, via the --multimapping flag in bustools count (#57).
  • None can be provided as the whitelist, which will force kb to use the bustools whitelist command, even if there exists a pre-packaged whitelist.
  • Added support for Smart-seq reads with -x smartseq. FASTQs are paired by first sorting the list of FASTQ paths in lexicographical order, and taking every two to be a pair. For instance, if 1.fastq 3.fastq 2.fastq 4.fastq is provided, 1.fastq and 2.fastq will be a pair, and 3.fastq and 4.fastq will be another pair. The FASTQ argument now supports glob expressions to make it easier to provide a long list of FASTQs.
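
The Smart-seq pairing rule in the last item above (sort the FASTQ paths lexicographically, then take every two as a pair) can be sketched as follows. pair_fastqs is a hypothetical name, and rejecting an odd number of files is an assumption of this sketch rather than something stated for this release.

```python
def pair_fastqs(fastqs):
    """Sort FASTQ paths lexicographically and group consecutive paths into pairs."""
    ordered = sorted(fastqs)
    if len(ordered) % 2 != 0:
        # Assumption: only paired-end input makes sense, so an odd count is an error.
        raise ValueError("an even number of FASTQs is required for pairing")
    # Even positions are read 1, odd positions are the matching read 2.
    return list(zip(ordered[0::2], ordered[1::2]))


pairs = pair_fastqs(["1.fastq", "3.fastq", "2.fastq", "4.fastq"])
print(pairs)  # [('1.fastq', '2.fastq'), ('3.fastq', '4.fastq')]
```

Because the sort is lexicographic rather than numeric, a name such as 10.fastq sorts before 2.fastq, so zero-padded or otherwise consistently named files are the safest input.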

v0.24.4

07 Nov 01:24

--info

  • Fixed a typo with indropsv3.

ref

  • If any input (FASTA or GTF) files are provided as gzip files, they are uncompressed to the temporary directory, instead of being streamed directly. This is because ref relies on being able to access arbitrary locations of the files quickly. Working with decompressed files results in a considerable speedup.

count

  • For --lamanno: spliced and unspliced busfiles no longer contain the .s suffix. This was done to make the output consistent with the normal (non--lamanno) command
  • Implemented --filter with --lamanno
  • Support for single nuclei RNA-seq with --nuclei. The only difference between --nuclei and --lamanno is how the spliced and unspliced matrices are combined. Specifically, --nuclei sums the matrices. Using --nuclei with neither --loom nor --h5ad results in behavior identical with --lamanno.

v0.24.3

05 Nov 00:03

kallisto

  • Update to 0.46.1.

--info

  • Updated information on indrop versions

v0.24.2

01 Nov 04:26

count

  • Fixed a bug where --filter would produce the same matrix as the unfiltered one.

v0.24.1

01 Nov 03:25

ref

  • kb now provides a pre-built human index for RNA velocity (linnarsson)
  • The intronic FASTA generated with the --lamanno option now includes 30-base flanking regions.

count

  • Unfiltered count matrices will always be placed in the counts_unfiltered folder.
  • If the --filter option is specified, the filtered count matrices will be placed in the counts_filtered folder.

v0.24.0

30 Oct 23:07

Official release.