Releases: nextgenusfs/amptk
amptk v0.9.1
- Add phiX filtering for Illumina data. As part of the PE merging function in amptk illumina and amptk illumina2, scripts will also now run phiX removal.
- Workaround for DADA2 error where samples that have only 1 read post-filtering trigger a derep$quals matrix error. amptk dada2 now has a -m, --min_reads option to drop samples that have fewer than -m reads. The default is 10; in practice this should probably be much higher, but the default is enough to avoid the above error.
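As a rough illustration of what a minimum-read cutoff like -m, --min_reads does, the sketch below drops samples from a per-sample read count table. The function name, the dictionary layout, and the sample names are hypothetical and are not taken from the amptk source.

```python
def drop_low_read_samples(read_counts, min_reads=10):
    """Keep only samples with at least `min_reads` reads.

    `read_counts` is a hypothetical mapping of sample name -> read count;
    the default of 10 mirrors the default stated in the release note.
    """
    return {name: n for name, n in read_counts.items() if n >= min_reads}


# Made-up counts: sampleB has the single read that would trigger the error.
counts = {"sampleA": 15230, "sampleB": 1, "sampleC": 9, "sampleD": 10}
kept = drop_low_read_samples(counts, min_reads=10)
print(sorted(kept))
```

With the default cutoff, a sample holding exactly 10 reads is kept, since only samples with fewer than the cutoff are dropped.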
amptk v0.9.0
- added better support for amptk SRA-submit
- added ability to normalize heat map
- added amptk SRA, which can be used to process reads downloaded from the SRA, where they are in a single FASTQ file, i.e. from Ion or 454 data that has been demultiplexed into samples and then submitted.
- created Dockerfile for using amptk with the scipy-notebook jupyter notebook server.
amptk v0.8.8
- unify the naming of output files from UNOISE2 and DADA2 "clustering" output.
amptk v0.8.7
- support for the new DADA2 algorithm allowing variable-length reads; requires DADA2 > v1.3.3.
amptk v0.8.6
- add amptk drop to remove OTUs from a dataset and then create an updated OTU table
- fix for amptk illumina where empty files would cause script to terminate
- fix for biom output to explicitly be json
- fix in amptk remove to allow fasta output
amptk v0.8.5
- bug fixes for pre-processing steps where short primer-dimers could make it through filtering, get padded with N's and get incorporated as OTUs in clustering
- update to amptk filter so that the final OTU table contains real read counts as opposed to "pseudo" counts from normalization. Filtering is still done on normalized data, but read counts are now restored to the original read numbers. This is important for downstream stats like beta diversity.
- improved read summary reporting in pre-processing steps
- update to amptk unoise2 to output both inferred or denoised sequences/tables as well as biological OTU sequences (clustered at 97%).
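The amptk filter change above (filter on normalized data, report original counts) can be pictured with the following minimal sketch. It assumes a simple within-sample fraction as the normalization and a made-up threshold; the actual normalization and thresholds used by amptk filter are not described in these notes, so every name and number here is hypothetical.

```python
def filter_otu_table(raw, threshold=0.005):
    """Filter on normalized values but return the original read counts.

    `raw` is a hypothetical nested dict: OTU -> sample -> raw read count.
    A count is zeroed when its fraction of the sample total is below
    `threshold` (an illustrative value); surviving cells keep raw counts.
    """
    samples = sorted({s for row in raw.values() for s in row})
    totals = {s: sum(row.get(s, 0) for row in raw.values()) for s in samples}
    filtered = {}
    for otu, row in raw.items():
        new_row = {}
        for s in samples:
            count = row.get(s, 0)
            fraction = count / totals[s] if totals[s] else 0.0
            # The decision uses the normalized value; the output keeps the raw count.
            new_row[s] = count if fraction >= threshold else 0
        if any(new_row.values()):
            filtered[otu] = new_row
    return filtered


table = {
    "OTU1": {"A": 990, "B": 500},
    "OTU2": {"A": 10, "B": 500},
    "OTU3": {"A": 0, "B": 0},
}
result = filter_otu_table(table, threshold=0.02)
print(result)
```

The point of the design is that the table handed to downstream statistics contains integers that were actually observed, not rescaled values.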
amptk v0.8.0
- package has undergone a name change to reflect changes in the scripts. Originally the project started as essentially a wrapper for UPARSE and thus relied heavily on USEARCH. Coupled with originally supporting fungal ITS sequences, it was named UFITS (usearch fungal ITS). However, the current implementation of AMPtk relies very little on USEARCH and can support any amplicon-based NGS dataset. Out of the box the following databases are packaged: fungal ITS, fungal LSU, 16S, insect/animal COI. Thus I feel that amptk is a better name that describes what the scripts do.
- option -p, --pad was added for amptk ion, amptk illumina, amptk illumina2, and amptk 454 to allow the user to turn off the padding with Ns to the --trim_len
- option -c, --calculate was added to amptk filter to control how the script calculates index-bleed. By default it calculates index-bleed into the mock community sample (-b) as well as out of the mock community into the rest of the samples. However, if members of the mock community are found in your samples, this calculated number is wrong, so if any members of your mock community are plausibly found in samples that you are sequencing, then you should use the --calculate in option.
- packaged databases had to be moved to a different sharing location (USDA now prevents use of dropbox), so they are now on Box; however, the download speed seems to be quite a bit slower. If anybody has recommendations for a free place to host these databases let me know. I need about 1 GB of space and need to be able to access them with a direct link from the command line.
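To make the -p, --pad option described above concrete, here is a small sketch of trimming a read to a target length and optionally padding shorter reads with Ns. The function name and the default length are hypothetical and only illustrate the idea; this is not the amptk implementation.

```python
def trim_or_pad(seq, trim_len=250, pad=True):
    """Trim `seq` to `trim_len`; optionally pad shorter reads with Ns.

    `trim_len` stands in for the --trim_len setting and `pad` for the
    -p, --pad switch; the default of 250 is only an illustrative value.
    """
    if len(seq) >= trim_len:
        return seq[:trim_len]
    if pad:
        return seq + "N" * (trim_len - len(seq))
    return seq


print(trim_or_pad("ACGT", trim_len=6))             # padded to length 6
print(trim_or_pad("ACGT", trim_len=6, pad=False))  # left at its own length
print(trim_or_pad("ACGTACGT", trim_len=6))         # trimmed to length 6
```

Turning padding off leaves short reads at their real length, which matters when variable-length reads are handled downstream.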
ufits v0.7.4
- move the mergereads function to general library
- better reporting for merging Illumina reads for both ufits illumina and ufits illumina2
- fix for ufits illumina to only require primer if amplicons are longer than the read length. This prevents amplicons that are shorter than the read length from being discarded, as they are automatically trimmed/merged via the usearch -fastq_mergepairs tool (and I can't change this). So the default behavior now is to require a forward primer via the --require_primer on setting only if the amplicon length is longer than the read length. Read length is calculated automatically by sampling the first 50 reads; the automatic detection is overruled by the --read_length option.
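The read-length detection described above might look something like the sketch below. It assumes an uncompressed 4-line FASTQ and takes the longest of the sampled reads as the read length; the notes do not say how the sampled lengths are summarized, so that choice, along with the function names and the amplicon length argument, is an assumption.

```python
import io
from itertools import islice


def sample_read_length(handle, n_reads=50):
    """Estimate read length from the first `n_reads` FASTQ records.

    Assumes 4 lines per record with the sequence on the 2nd line.
    Using the maximum sampled length is an assumption of this sketch.
    """
    lengths = []
    for i, line in enumerate(islice(handle, n_reads * 4)):
        if i % 4 == 1:  # sequence line of each record
            lengths.append(len(line.strip()))
    return max(lengths) if lengths else 0


def should_require_primer(amplicon_len, read_len):
    """Require the forward primer only when the amplicon exceeds the read length."""
    return amplicon_len > read_len


# Tiny in-memory FASTQ used purely for illustration.
fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n"
read_len = sample_read_length(io.StringIO(fastq))
print(read_len, should_require_primer(300, read_len))
```

A user-supplied value, as with --read_length, would simply replace the sampled estimate before the comparison is made.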
ufits v0.7.3
- fix critical bug in ufits illumina processing of reads where the read would be discarded if the reverse primer was not found
ufits v0.7.2
- update to ufits taxonomy to allow for taxonomy to be calculated elsewhere: pass the -t, --taxonomy option and a 2 column tsv file (OTU and Taxonomy)
- update to progress/multiprocessing steps
- re-write demultiplexing steps for faster processing
- support gzip files in ufits illumina2
- options in ufits filter for how the threshold is calculated for index-bleed filtering
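For the externally calculated taxonomy mentioned in the v0.7.2 notes, a 2 column tsv file could be read along the lines of the sketch below. The function name, the skipping of "#" comment lines, and the example taxonomy strings are all assumptions made for illustration; the exact format ufits taxonomy expects is not spelled out here.

```python
import csv
import io


def load_taxonomy(handle):
    """Read a 2 column, tab-separated OTU -> taxonomy mapping.

    Rows with fewer than two columns and rows starting with '#' are
    skipped; that handling is an assumption of this sketch.
    """
    taxonomy = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        taxonomy[row[0]] = row[1]
    return taxonomy


# Made-up taxonomy strings, only to show the two-column layout.
tsv = "#OTU\tTaxonomy\nOTU_1\tk:Fungi,p:Ascomycota\nOTU_2\tk:Fungi,p:Basidiomycota\n"
tax = load_taxonomy(io.StringIO(tsv))
print(tax["OTU_1"])
```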