Version 1.0.0 - "Ardent adenine"
This release is an essentially complete rewrite of umi-transfer by Johannes Alneberg (@alneberg) and me (@MatthiasZepper).
New and Improved Features:
- Code organization: The code base has been split into separate files, with each file representing a subcommand and its associated CLI configuration. This improves clarity and makes it easy to add further subcommands and functionality in the future.
- Enhanced CLI options: The CLI arguments have been revamped for improved usability. Previously, the output could not be specified directly, which hindered the creation of an nf-core Nextflow module. Specifying an output is still optional, but the output file names are now derived from the input file names rather than from a constant base provided as a CLI argument. Furthermore, the delimiter used to join the UMIs can now be customized. The `--edit_nr` flag has been renamed to `--correct_numbers` and now applies to both files for better consistency.
- Improved output file handling: The output file name automatically receives a `.gz` suffix if the `-z`/`--compress` flag is enabled. Conversely, an existing `.gz` suffix is removed if no compression was requested. Additionally, the tool checks whether the output file already exists and, if it does, prompts for confirmation before overwriting it (unless `-f`/`--force` is specified).
- Enhanced error handling: Functions have been rewritten to use `Result` and `Option`, enabling proper error handling. Previously, many functions simply panicked and crashed the program, for example when a nonexistent input file was specified.
- UMI ID validation: The tool now compares the ID of each UMI record with that of the corresponding read and terminates when it encounters a mismatch. This prevents incorrect UMIs from being added to the read IDs when the input files are sorted differently.
- Automated tests: Several unit tests and extensive integration tests have been implemented to enhance the reliability of the tool.
- Continuous integration pipelines: The CI pipelines have been refactored, and a new release pipeline builds the tool for seven common architectures.
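The suffix rule described under "Improved output file handling" can be sketched in Rust as a single helper. This is a minimal illustration under stated assumptions, not umi-transfer's actual code: the name `adjust_gz_suffix` is hypothetical, and the sketch covers only the `.gz` adjustment, not how the tool derives the base name from the input file or how it prompts before overwriting.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of umi-transfer): make the `.gz` suffix of an
/// output path match whether compression was requested.
fn adjust_gz_suffix(path: &Path, compress: bool) -> PathBuf {
    // Does the final extension equal "gz"?
    let has_gz = path.extension().map_or(false, |ext| ext == "gz");
    match (compress, has_gz) {
        // Compression requested but suffix missing: append ".gz".
        (true, false) => {
            let mut name = path.as_os_str().to_owned();
            name.push(".gz");
            PathBuf::from(name)
        }
        // No compression requested but a ".gz" suffix is present: drop only
        // that last extension (e.g. "R1.fastq.gz" becomes "R1.fastq").
        (false, true) => path.with_extension(""),
        // Suffix already matches the requested mode: keep the path unchanged.
        _ => path.to_path_buf(),
    }
}
```

For example, `adjust_gz_suffix(Path::new("R1.fastq"), true)` returns `R1.fastq.gz`, and `adjust_gz_suffix(Path::new("R1.fastq.gz"), false)` returns `R1.fastq`. `Path::with_extension("")` strips only the last extension, so the `.fastq` part survives.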
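The UMI ID validation can be illustrated with a small Rust function that compares the record IDs before building the new read header and returns an error, rather than panicking, on a mismatch. This is a sketch under assumptions, not the tool's implementation: it assumes the ID is the header text between the leading `@` and the first whitespace, and that the UMI sequence is appended to that ID with the configurable delimiter. The function `build_umi_header` and the `HeaderError` type are hypothetical.

```rust
/// Hypothetical error type for this sketch (not part of umi-transfer).
#[derive(Debug, PartialEq)]
enum HeaderError {
    /// The read header has no ID before the first whitespace.
    EmptyId,
    /// The read and UMI records carry different IDs (e.g. differently sorted files).
    IdMismatch { read_id: String, umi_id: String },
}

/// Checks that the read and UMI records share an ID and, if so, returns a new
/// read header with the UMI sequence appended to the ID.
/// Assumption: the ID is everything between '@' and the first whitespace.
fn build_umi_header(
    read_header: &str,
    umi_record_header: &str,
    umi_seq: &str,
    delimiter: &str,
) -> Result<String, HeaderError> {
    // Drop a single leading '@' if present.
    let read_body = read_header.strip_prefix('@').unwrap_or(read_header);
    let umi_body = umi_record_header.strip_prefix('@').unwrap_or(umi_record_header);

    // Split the read header into its ID and the optional comment that follows.
    let (read_id, comment) = match read_body.split_once(char::is_whitespace) {
        Some((id, rest)) => (id, Some(rest)),
        None => (read_body, None),
    };
    if read_id.is_empty() {
        return Err(HeaderError::EmptyId);
    }

    // Only the UMI record's ID is needed; its comment is ignored.
    let umi_id = umi_body.split(char::is_whitespace).next().unwrap_or("");

    // Stop on a mismatch instead of attaching the wrong UMI to the read.
    if read_id != umi_id {
        return Err(HeaderError::IdMismatch {
            read_id: read_id.to_string(),
            umi_id: umi_id.to_string(),
        });
    }

    // Append the UMI to the ID, then re-attach the comment after a single
    // space (a simplification: a tab separator would become a space).
    let mut header = format!("@{read_id}{delimiter}{umi_seq}");
    if let Some(rest) = comment {
        header.push(' ');
        header.push_str(rest);
    }
    Ok(header)
}
```

With `":"` as the delimiter, the read header `@A1 1:N:0` and UMI `ACGT` yield `@A1:ACGT 1:N:0`, while a UMI record with ID `B2` yields `Err(HeaderError::IdMismatch { .. })`. Returning a `Result` lets the caller terminate cleanly instead of silently writing a wrong header.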
Discontinued Previous Features:
- Support for inline UMIs: The previous inline functionality for transferring fixed-length UMIs was limited and supported neither offsets nor regular expressions. Since existing tools like `umitools` already serve this purpose, we decided to prioritize the development of novel functionality. However, the new subcommand structure of the code paves the way for future support of inline UMIs.
- Progress bar: The progress bar was a helpful visual aid, but it required counting the reads in one of the files to determine the total, so that file had to be read twice. For performance reasons, we decided to remove this feature, especially since most runs are expected to be non-interactive, within workflow systems like Nextflow.
- Multi-threading: In the previous version (0.1) of `umi-transfer`, the tool could run on two cores when processing paired FastQ files, with each file assigned to a separate thread. However, the tool's performance was primarily limited by output compression, and multi-threading caused significant overhead. A future version of `umi-transfer` will be designed to run fully asynchronously and to scale efficiently across multiple threads. In the meantime, we recommend using FIFOs and external compression with tools like `pigz`.
- Support for singletons: To simplify the code structure, we made the second FastQ file mandatory. To process singletons, you can provide the same input twice and use a FIFO to redirect one of the output files to `/dev/null`.