Release Version 1.0.0 - "Ardent adenine" · SciLifeLab/umi-transfer

This release is an essentially complete rewrite of umi-transfer by Johannes Alneberg (@alneberg) and me (@MatthiasZepper):

New and Improved Features:

Code organization: The code base has been split into separate files, with each file representing a subcommand and its associated CLI configuration. This improves clarity and makes it easy to add further subcommands and features in the future.
Enhanced CLI options: The CLI arguments have been revamped for improved usability. Previously, the output files could not be specified directly, which hindered the creation of an nf-core Nextflow module. Specifying an output is still optional, but if it is omitted, the output file names are now derived from the input file names rather than from a constant base name provided as a CLI argument. Furthermore, the delimiter used to join the UMIs to the read IDs can now be customized. The --edit_nr flag has been renamed to --correct_numbers and now applies to both files for better consistency.
Improved output file handling: The output file name automatically receives a .gz suffix if the -z/--compress flag is enabled. Conversely, any existing .gz suffix is removed if no compression was requested. Additionally, the tool checks whether the output file already exists and, if so, prompts for confirmation before overwriting it (unless -f/--force is specified).
Enhanced error handling: Functions have been rewritten to return Result and Option types, enabling proper error handling. Previously, many functions simply panicked and the program crashed, for example when a nonexistent input file was specified.
UMI ID validation: The tool now compares the ID of each UMI record to that of the corresponding read and terminates upon encountering a mismatch. This prevents incorrect UMIs from being added to the read IDs when the input files are sorted differently.
Automated tests: Several unit tests and extensive integration tests have been implemented to enhance the reliability of the tool.
Continuous integration pipelines: The CI pipelines have been refactored, and a new release pipeline builds the tool for seven common architectures.
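The output file handling rules listed above (add .gz when -z/--compress is set, strip it otherwise, and ask before overwriting unless -f/--force is given) can be illustrated with a minimal sketch. This is not umi-transfer's actual code: the helper names `normalize_gz_suffix` and `may_write_without_prompt` are hypothetical, and the interactive prompt itself is left out.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: make the output file name's ".gz" suffix agree with the
/// compression setting. Adds it when compressing, strips it otherwise, and never
/// doubles an existing ".gz".
fn normalize_gz_suffix(path: &Path, compress: bool) -> PathBuf {
    // Paths without a (UTF-8) file name are returned unchanged.
    let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
        return path.to_path_buf();
    };
    let base = name.strip_suffix(".gz").unwrap_or(name);
    let new_name = if compress {
        format!("{base}.gz")
    } else {
        base.to_string()
    };
    path.with_file_name(new_name)
}

/// Hypothetical helper: writing may proceed without asking if the file does not
/// exist yet or if `force` (as with -f/--force) is set. Otherwise the caller
/// should prompt the user for confirmation.
fn may_write_without_prompt(path: &Path, force: bool) -> bool {
    force || !path.exists()
}

fn main() {
    let cases = [
        ("out/s1_R1.fastq", true),
        ("out/s1_R1.fastq.gz", false),
        ("s1_R1.fastq.gz", true),
    ];
    for (input, compress) in cases {
        let output = normalize_gz_suffix(Path::new(input), compress);
        println!("{input} (compress={compress}) -> {}", output.display());
    }

    let target = Path::new("out/s1_R1.fastq.gz");
    if !may_write_without_prompt(target, false) {
        println!("{} exists; ask before overwriting", target.display());
    }
}
```

Stripping the suffix first and then re-adding it only when needed makes the rule idempotent, so a user-supplied name that already ends in .gz is left as-is under compression.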
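The switch from panicking to returning Result types can be sketched for the nonexistent-input case mentioned above. In this minimal illustration, the functions `open_input` and `count_records` are hypothetical and not part of umi-transfer. A missing file becomes an `Err` that carries the path, and the `?` operator passes it up to the caller instead of crashing the program.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper: open an input FastQ file. A missing file yields an `Err`
/// whose message includes the path, rather than a panic.
fn open_input(path: &Path) -> io::Result<BufReader<File>> {
    File::open(path)
        .map(BufReader::new)
        .map_err(|e| io::Error::new(e.kind(), format!("cannot open {}: {e}", path.display())))
}

/// Hypothetical helper: count FastQ records (four lines each), propagating
/// any I/O error to the caller with `?`.
fn count_records(path: &Path) -> io::Result<usize> {
    let mut line_count = 0;
    for line in open_input(path)?.lines() {
        line?;
        line_count += 1;
    }
    Ok(line_count / 4)
}

fn main() {
    match count_records(Path::new("missing_R1.fastq")) {
        Ok(n) => println!("{n} records"),
        // A real CLI would report this and exit with a non-zero status.
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Keeping the original `ErrorKind` while adding the path to the message lets callers still distinguish, for example, a missing file from a permission problem.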
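The UMI ID validation and the customizable delimiter described above can be illustrated together in a minimal sketch. It rests on several assumptions that are not taken from umi-transfer's code. The read ID is taken to be the header text between the leading `@` and the first whitespace. `transfer_umi` and `IdMismatch` are hypothetical names. `:` is only an example delimiter. On a mismatch, the function returns an error rather than attaching a UMI that belongs to a different read.

```rust
use std::fmt;

/// Hypothetical error for a read whose UMI record belongs to a different read.
#[derive(Debug, PartialEq)]
struct IdMismatch {
    read_id: String,
    umi_id: String,
}

impl fmt::Display for IdMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "read ID {} does not match UMI ID {}", self.read_id, self.umi_id)
    }
}

impl std::error::Error for IdMismatch {}

/// Split a FastQ header into (ID, optional comment), dropping the leading '@'.
/// Assumption: the ID ends at the first whitespace character.
fn split_header(header: &str) -> (&str, Option<&str>) {
    let header = header.strip_prefix('@').unwrap_or(header);
    match header.split_once(char::is_whitespace) {
        Some((id, comment)) => (id, Some(comment)),
        None => (header, None),
    }
}

/// Append `umi_seq` to the read ID using `delim`, but only if the read header
/// and the UMI header carry the same ID. The comment part is kept, rejoined
/// with a single space.
fn transfer_umi(read_header: &str, umi_header: &str, umi_seq: &str, delim: &str) -> Result<String, IdMismatch> {
    let (read_id, comment) = split_header(read_header);
    let (umi_id, _) = split_header(umi_header);
    if read_id != umi_id {
        return Err(IdMismatch {
            read_id: read_id.to_string(),
            umi_id: umi_id.to_string(),
        });
    }
    let mut out = format!("@{read_id}{delim}{umi_seq}");
    if let Some(c) = comment {
        out.push(' ');
        out.push_str(c);
    }
    Ok(out)
}

fn main() {
    let ok = transfer_umi("@read1 1:N:0:ACGT", "@read1 2:N:0:ACGT", "GATTACA", ":");
    println!("{ok:?}"); // Ok("@read1:GATTACA 1:N:0:ACGT")

    let bad = transfer_umi("@read1 1:N:0:ACGT", "@read2 2:N:0:ACGT", "GATTACA", ":");
    if let Err(e) = bad {
        // A real tool would stop processing here instead of continuing.
        eprintln!("error: {e}"); // error: read ID read1 does not match UMI ID read2
    }
}
```

Checking every record pair is cheap compared with parsing and compression. It catches differently sorted inputs at the first offending record rather than silently producing mislabeled reads.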

Discontinued Previous Features:

Support for inline UMIs: The previous inline functionality for transferring fixed-length UMIs was limited and supported neither offsets nor regular expressions. Since established tools such as UMI-tools already serve this purpose, we decided to prioritize the development of novel functionality. However, the new subcommand structure of the code paves the way for future support of inline UMIs.
Progress bar: The progress bar provided a helpful visual aid, but determining the total number of reads required counting the records in one of the files first, so that file had to be read twice. For performance reasons, we decided to remove this feature, especially since most runs are expected to be non-interactive, within workflow systems like Nextflow.
Multi-threading: In the previous version (0.1) of umi-transfer, the tool could run on two cores when processing paired FastQ files, with each file assigned to a separate thread. However, performance was primarily limited by output compression, and multi-threading caused significant overhead. A future version of umi-transfer will be designed to run fully asynchronously and scale efficiently over multiple threads. In the meantime, we recommend using FIFOs and external compression with tools like pigz.
Support for singletons: To simplify the code structure, we made the second FastQ file mandatory. To process singletons, you can provide the same input file twice and redirect one of the output files to /dev/null using a FIFO.
