Kmer count + Dimensionality reduction #40

Merged on Apr 4, 2024 (97 commits)
25d5d64
Add kmer count modules
weaglesBio Oct 16, 2023
f3bedf9
Add to main workflow
weaglesBio Oct 17, 2023
a311d1e
Add kmer count modules
weaglesBio Oct 16, 2023
af871fe
Add to main workflow
weaglesBio Oct 17, 2023
1b51964
Add parameters
weaglesBio Oct 25, 2023
e29612f
Merge changes
weaglesBio Oct 25, 2023
3e67c80
Handle one line csv
weaglesBio Oct 30, 2023
aacc15c
Add kmer count modules
weaglesBio Oct 16, 2023
ca54275
Combine embeddings csv
weaglesBio Nov 6, 2023
1e9852b
Fix combine
weaglesBio Nov 17, 2023
927b1a3
Adding organellar blast subworkflow
DLBPointon Oct 9, 2023
0e7850a
Updates
DLBPointon Oct 13, 2023
5c6b6db
Completing organelle blast, modified python script to accept arrayLis…
DLBPointon Oct 13, 2023
fb15428
Black linting
DLBPointon Oct 13, 2023
e5526fd
Uncommenting BLAST for testing
DLBPointon Oct 13, 2023
bac7913
Updates
DLBPointon Oct 18, 2023
b36c554
Updates for Organella Blast output checking
DLBPointon Oct 19, 2023
c7da079
Local update for diamond blast
DLBPointon Oct 19, 2023
b2e23a2
Updated to filter out empties
DLBPointon Oct 19, 2023
5bc2f95
Fixes for blast subworkflows
DLBPointon Oct 26, 2023
a761e62
Added a blast module that does not rely on makeblastdb
DLBPointon Oct 26, 2023
65614f3
Fixes to better allow running on github
DLBPointon Oct 26, 2023
c60721d
Updating modules
DLBPointon Nov 1, 2023
f9e65d2
Updating modules and patches
DLBPointon Nov 1, 2023
154b185
Updating pipeline to reflect changes
DLBPointon Nov 1, 2023
009f1e3
Generalising the Blast module for FULL databases as well as local mak…
DLBPointon Nov 2, 2023
8c57157
Prettier linting
DLBPointon Nov 2, 2023
193f480
Adding tracedir to schema
DLBPointon Nov 2, 2023
05e25b5
Update script based on recomendation
DLBPointon Nov 6, 2023
79305c5
Black Linting
DLBPointon Nov 6, 2023
99a7aa5
testing
DLBPointon Nov 9, 2023
1654931
Updating
DLBPointon Nov 9, 2023
8925cac
add coverage
yumisims Oct 30, 2023
e49331f
add samtools merge
yumisims Oct 30, 2023
b2ed42d
add merged
yumisims Oct 30, 2023
bbb05d9
put in condition for different read type
yumisims Nov 2, 2023
a784236
re-written samtools_depth_average_coverage.py
yumisims Nov 2, 2023
4e71157
re-written samtools_depth_average_coverage.py
yumisims Nov 2, 2023
fb7bf44
re-written samtools_depth_average_coverage.py
yumisims Nov 2, 2023
6f0c412
amended gc_content.py to comprehension form
yumisims Nov 2, 2023
5e2ad8e
added change to samtools_depth_average_coverage.nf
yumisims Nov 3, 2023
d39d99d
black
yumisims Nov 3, 2023
033043f
change main workflow
yumisims Nov 3, 2023
f1025fb
remove space
yumisims Nov 3, 2023
d62859e
changed github test yaml
yumisims Nov 3, 2023
5343545
add barcode to ci
yumisims Nov 4, 2023
b95e351
change in se mapping
yumisims Nov 4, 2023
42c49a0
change config
yumisims Nov 4, 2023
5208ab1
change config
yumisims Nov 4, 2023
c0a5a03
changed bedtool
yumisims Nov 6, 2023
6baec5c
changed grabfile wildcard
yumisims Nov 6, 2023
3b8d867
changed grabfile wildcard
yumisims Nov 6, 2023
84b50c3
changed grabfile wildcard
yumisims Nov 6, 2023
e754440
done
yumisims Nov 6, 2023
a386866
done
yumisims Nov 6, 2023
4600cdc
added ncbi id
yumisims Nov 6, 2023
af67f69
change software version'
yumisims Nov 6, 2023
029f1f6
refine bedtools and other scripts
yumisims Nov 7, 2023
2808cba
Replacing grep with awk, grep caused errors with empty products
DLBPointon Nov 10, 2023
036c014
Updating organellar blast based on discussion with Eerik and @yumisims
DLBPointon Nov 10, 2023
f05f05b
Updated container from @yumisims and tested
DLBPointon Nov 14, 2023
3b50c01
Black formatting
DLBPointon Nov 14, 2023
b61cadf
Black Formatting
DLBPointon Nov 14, 2023
4401fb6
Updates based on comments from @ea10
DLBPointon Nov 16, 2023
efdc828
Correction and removal of view statement
DLBPointon Nov 16, 2023
65e87c8
Black linting for python script
DLBPointon Nov 16, 2023
1a8b635
Add parameters
weaglesBio Oct 25, 2023
2958627
Add kmer count modules
weaglesBio Oct 16, 2023
0bee610
Add to main workflow
weaglesBio Oct 17, 2023
e40e4ce
Handle one line csv
weaglesBio Oct 30, 2023
40ce6a4
Merge branch 'dev' into kmer_count
weaglesBio Nov 17, 2023
34385bb
dp24 suggested changes
weaglesBio Nov 27, 2023
5cfe9ad
Updates to fix bugs, add params and get kmer analysis running
DLBPointon Feb 23, 2024
0fdc0f3
Merge branch 'dev' into kmer_count
DLBPointon Feb 23, 2024
891964e
linting fixes
DLBPointon Feb 27, 2024
336a6c8
fixes
DLBPointon Feb 27, 2024
aeeabb0
linting
DLBPointon Feb 27, 2024
63f3c34
linting
DLBPointon Feb 27, 2024
e37baa4
linting
DLBPointon Feb 27, 2024
5f5ccf0
barcode was wrong in test
DLBPointon Feb 27, 2024
ebc1d5a
Fixed container import, added conda recipe and corrected version output
DLBPointon Feb 29, 2024
87e8b8d
Updated container, added custom version information for the umap moddule
DLBPointon Feb 29, 2024
25bb22b
/tmp/ was being used, changed to custom cache dirs to script
DLBPointon Feb 29, 2024
e75d030
updating the main script, minor stuff
DLBPointon Feb 29, 2024
4978ec2
Changes for testing
DLBPointon Mar 1, 2024
affd380
Updating the vecscreen value, apparently changed from correct value i…
DLBPointon Mar 1, 2024
e6550bf
Updating data
DLBPointon Apr 4, 2024
4e8d53d
Updating test files, test data, formatting and logic change
DLBPointon Apr 4, 2024
f36ed3f
Add pre-fetch for the containers in ascc with nf-download
DLBPointon Apr 4, 2024
77e08e6
Fix CI and Black formatting
DLBPointon Apr 4, 2024
858fefb
Black Formatting
DLBPointon Apr 4, 2024
74a74f6
CI fix
DLBPointon Apr 4, 2024
b9745c7
CI fix add aptainer
DLBPointon Apr 4, 2024
d31813c
kill lint for file exist
DLBPointon Apr 4, 2024
a9fe2c8
Lint fix
DLBPointon Apr 4, 2024
f2c8e71
Update resources
DLBPointon Apr 4, 2024
b43f082
Merge branch 'kmer_count' of https://github.com/sanger-tol/ascc into …
DLBPointon Apr 4, 2024
Add to main workflow
weaglesBio committed Nov 17, 2023
commit 0bee610689c72d19ab5972826b7bf5a9d542bbe5
7 changes: 7 additions & 0 deletions modules/local/kmer_count_dim_reduction.nf
@@ -34,11 +34,18 @@ process KMER_COUNT_DIM_REDUCTION {
cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python --version | sed 's/Python //g')
<<<<<<< HEAD
pandas: \$(python3 -c 'import pandas; print(pandas.__version__)')
tensorflow: \$(python3 -c 'import tensorflow; print(tensorflow.__version__)')
scikit-learn: \$(python3 -c 'import sklearn; sklearn.show_versions()')
Review comment (Collaborator):

The line scikit-learn: \$(python3 -c 'import sklearn; sklearn.show_versions()') caused a crash when I ran the subworkflow on my local computer.
Instead of printing just a version number, sklearn.show_versions() printed the long report below, which then crashed Nextflow with the message "yaml.scanner.ScannerError: mapping values are not allowed here in "collated_versions.yml", line 22, column 11".

System:
    python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0]
executable: /usr/local/bin/python3
   machine: Linux-5.4.0-166-generic-x86_64-with-glibc2.36

Python dependencies:
      sklearn: 1.1.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.24.4
        scipy: 1.11.3
       Cython: None
       pandas: 2.1.1
   matplotlib: 3.6.0
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /usr/local/lib/libopenblasp-r0.3.24.so
        version: 0.3.24
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libgomp
       filepath: /usr/local/lib/libgomp.so.1.0.0
        version: None

Review comment (Collaborator):

Before crashing at the software versions step, it counted the kmers and wrote the CSV files of dimensionality reduction embeddings as it should, so at least that part works.
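
The crash described in the comments above follows from sklearn.show_versions() printing a multi-line report instead of one version string, so the value written into versions.yml is no longer a valid YAML scalar. The smallest change is probably to print sklearn.__version__, matching the pattern the neighbouring pandas and tensorflow lines already use. The following is a minimal sketch of a more defensive variant, not the fix applied in this PR: it looks versions up with the standard library's importlib.metadata and guarantees a single line per entry. The helper names single_line_version and versions_block, and the "unknown" fallback, are hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def single_line_version(dist_name: str) -> str:
    """Return the installed version of a distribution as a one-line string."""
    try:
        value = version(dist_name) or ""
    except PackageNotFoundError:
        # Hypothetical fallback: keeps the YAML valid when a package is missing.
        return "unknown"
    # Keep only the first line so nothing multi-line can reach versions.yml.
    lines = value.strip().splitlines()
    return lines[0] if lines else "unknown"


def versions_block(process_name: str, dist_names: list[str]) -> str:
    """Build a versions.yml fragment with one 'name: version' pair per line."""
    out = [f'"{process_name}":']
    for name in dist_names:
        out.append(f"    {name}: {single_line_version(name)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Distribution names as published on PyPI (not the import names).
    print(versions_block("KMER_COUNT_DIM_REDUCTION", ["scikit-learn", "umap-learn"]))
```

Note that importlib.metadata expects the distribution name (scikit-learn), not the import name (sklearn).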

umap-learn: \$(python3 -c 'import umap; print(umap.__version__)')
matplotlib: \$(python3 -c 'import matplotlib; print(matplotlib.__version__)')
=======
tensorflow: \$(tensorflow --version | sed 's/tensorflow //g')
scikit-learn: \$(scikit-learn --version | sed 's/scikit-learn //g')
umap-learn: \$(umap-learn --version | sed 's/umap-learn //g')
matplotlib: \$(matplotlib --version | sed 's/matplotlib //g')
>>>>>>> f3bedf9 (Add to main workflow)
kmer_count_dim_reduction.py: \$(kmer_count_dim_reduction.py --version | cut -d' ' -f2)
END_VERSIONS
"""
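
The hunk above still contains merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> f3bedf9) inside the heredoc, so they would likely be written into versions.yml verbatim and break YAML parsing in the same way as the multi-line scikit-learn output. A minimal sketch of a check that could catch this before a commit is shown below; the function name find_conflict_markers is hypothetical, and tolerating leading whitespace is an assumption made here so that indented copies of the markers are also reported.

```python
from __future__ import annotations

import re

# Git writes three kinds of marker lines when a merge conflicts:
# seven '<', seven '=', or seven '>' followed by a space or end of line.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for lines that look like conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: ignore leading whitespace so indented markers are caught too.
        if CONFLICT_MARKER.match(line.lstrip()):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "python: 3.10.8\n"
        "<<<<<<< HEAD\n"
        "pandas: 2.1.1\n"
        "=======\n"
        "tensorflow: unknown\n"
        ">>>>>>> f3bedf9 (Add to main workflow)\n"
    )
    for number, line in find_conflict_markers(sample):
        print(f"line {number}: {line}")
```

A line consisting of exactly seven equals signs is reported even when it is not a conflict marker, so any hit should be reviewed by hand.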
2 changes: 1 addition & 1 deletion workflows/ascc.nf
@@ -177,7 +177,7 @@ workflow ASCC {

emit:


software_ch = CUSTOM_DUMPSOFTWAREVERSIONS.out.yml
versions_ch = CUSTOM_DUMPSOFTWAREVERSIONS.out.versions
}