Enable background subtraction / file unzipping #118
Commits on Oct 13, 2023
Commit 58591e6
fix: glob all files without exclusion of bg. fix bg getting
changelog:
- All files are streamed to input files, rather than just files without associated background files.
- Background filenames (stripped of extensions) are no longer part of the input stream for `rules.score`, preventing odd errors
- Updated associated files to pull all input files as desired.
Commits on Nov 3, 2023
Commit 8c0f312
[!broken!] fix: enable unzip, redo file glob, add background
changelog:
- NOTE THAT WORKFLOW IS CURRENTLY BROKEN DUE TO SNAKEMAKE I/O REASONS AND I AM COMMITTING INTERIM CHANGES
- fix: redo file glob -- file globbing now proceeds through `glob_wildcards` to more cleanly grab input files
- fix: enable unzip -- unzipping has been overhauled (these are forward changes adapted from snekmer 2.0.0 / the biotite-kmers branch).
- fix: add background -- changes have been made to collate background files and use their kmer distribution to subtract a background from protein family kmer models.
These fixes work piece-by-piece locally but have not been fully tested and may not work ideally yet.
Commits on Nov 7, 2023
Commit be53b22
[!broken!] build: tweak workflow to attempt snakemake debug
- Note: changes did NOT work, hence the "broken" tag.
Commits on Nov 8, 2023
Commit 7809d10
[!broken!] fix: pipe background i/o, update filenames
changelog:
- snakemake now correctly builds DAG for background workflow, including file unzipping
- some files have been renamed for simplicity
- some instances of `skm.io.load_npz` have been replaced with `np.load` due to KeyError (perhaps due to numpy or pickle version?)
- `rules.combine_background` now uses kmer basis set for each family to reshape each background vector. should make files smaller and workflow more compact
- NOTE: WORKFLOW IS BROKEN AT `rules.score_with_background` due to file load / array shape issues that will be fixed in the next commit.
- addresses #37
Commits on Nov 28, 2023
Commit 3e7b10d
feat: update kmer probability scoring for background subtract
changelog:
- kmer probability scoring using background subtraction is now the default scoring method
- `snekmer.score.feature_class_probabilities` now performs either background subtraction based scoring, family label based scoring, or a combination thereof depending on user input
- TODO: integration with `snekmer.score.KmerScorer` object
Commits on Dec 12, 2023
Commit f73f17e
chore: update config, tick version, and do file cleanup
changelog:
- new config parameter config['score']['method'] added for compatibility with additional new(!) scoring methods
- uptick version from v1.1.1 -> v1.4.0
  - upticked +3 minor versions in anticipation of two pending PRs
- remove no longer needed files
Commit f394fee
feat: enable kmer scoring via background subtraction (fixes #37)
changelog:
- kmers can now be scored by probability score subtracting the observed kmers in a supplied background set, family set, or combining both background and family
- note: some column headers have changed, which may affect downstream analysis (e.g. integration with #115, #116)
- to handle user-supplied background files, new rules have been created to count background kmers and combine background kmer counts into a background matrix. The appropriate files for the new workflow have been created.
- extensive changes have been made to `snekmer.score` to accommodate the new changes, including:
  - `snekmer.score.score` now has 3 distinct formulae to compute probability scores according to the desired scoring method
  - `snekmer.score.feature_class_probabilities` now also integrates the scoring method
- the main scoring rule itself has been significantly altered as follows:
  - all references to the old and not-working "background subtraction" (e.g. separating sequences by "sample" or "background" labels) have been removed
  - extraneous kmer probability scores for every family are no longer calculated; only the family in question's kmer profile is scored
  - scoring method now integrated
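A rough sketch of how the three scoring methods described in commit f394fee could differ. It assumes a kmer's probability is the fraction of sequences in a set that contain it, and that each method subtracts a different reference probability. The function names, the `"family"` method string, and the formulae themselves are assumptions for illustration, not the `snekmer.score` implementation.

```python
from typing import Dict, List, Set


def kmer_probabilities(sequences: List[Set[str]], basis: List[str]) -> Dict[str, float]:
    """Fraction of sequences (each given as its set of kmers) containing each basis kmer."""
    n = len(sequences)
    if n == 0:
        return {kmer: 0.0 for kmer in basis}
    return {kmer: sum(kmer in seq for seq in sequences) / n for kmer in basis}


def score_kmers(
    p_family: Dict[str, float],
    p_other: Dict[str, float],
    p_background: Dict[str, float],
    method: str = "background",
) -> Dict[str, float]:
    """Score each family kmer by subtracting a reference probability (assumed formulae)."""
    scores = {}
    for kmer, p in p_family.items():
        if method == "background":
            # Subtract the probability observed in the supplied background set.
            scores[kmer] = p - p_background.get(kmer, 0.0)
        elif method == "family":
            # Subtract the probability observed across the other families.
            scores[kmer] = p - p_other.get(kmer, 0.0)
        elif method == "combined":
            # Subtract both reference probabilities.
            scores[kmer] = p - p_other.get(kmer, 0.0) - p_background.get(kmer, 0.0)
        else:
            raise ValueError(f"unknown scoring method: {method!r}")
    return scores
```

Only the family in question is scored here, mirroring the note that scores for every other family are no longer computed.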
Commits on Dec 21, 2023
Commit 4ee5d4a
Commits on Dec 22, 2023
Commit 06e21d9
fix: rework default filename parsing
changelog:
- fix: `snekmer.utils.get_family` now accepts `regex=None` by default so as not to erroneously truncate filenames.
- fix: small change to `snekmer.utils.get_family` to correctly identify directories.
- refactor: overhaul `snekmer.utils.split_file_ext` to split at the point of an .faa, .fa, .fna, or .fasta extension instead of assuming at most 2 potential extensions
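The extension-splitting change in commit 06e21d9 can be pictured with a small regex-based helper. This is a minimal sketch, not the actual `snekmer.utils.split_file_ext`: the name `split_at_fasta_ext`, the return shape, and the case-insensitive matching are all assumptions.

```python
import re
from typing import Tuple

# Assumed pattern: a FASTA-style extension, optionally followed by further
# suffixes such as ".gz", anchored at the end of the filename.
FASTA_EXT = re.compile(r"\.(faa|fasta|fna|fa)(\..+)?$", re.IGNORECASE)


def split_at_fasta_ext(filename: str) -> Tuple[str, str]:
    """Split a filename into (stem, extension) at the FASTA extension.

    Everything from the FASTA extension onward counts as the extension,
    so dots earlier in the name are left in the stem.
    """
    match = FASTA_EXT.search(filename)
    if match is None:
        # No recognized FASTA extension: leave the name untouched.
        return filename, ""
    return filename[: match.start()], filename[match.start() + 1 :]
```

Splitting at the recognized extension, rather than peeling off a fixed number of suffixes, keeps names such as `family.v2.fasta.gz` from losing part of their stem.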
Commit 1344d93
Commits on Jan 5, 2024
Commit 35e0b49
Commit 2625da7
refactor: deprecate `process.smk` for file unzipping
changelog:
- file unzipping is now handled by top-level unzip code in each snakefile; thus, `process.smk` is outdated and has been deleted as it is no longer needed.
Commit 3309880
refactor: apply file wildcard globbing changes to cluster,search
changelog:
- file wildcard globbing previously proceeded through `glob.glob`, but had been updated in the model workflow to use snakemake's `glob_wildcards` utility. This method has the added benefit of preventing recursion errors with wildcard retrieval from gzipped files. The changes have now been applied to cluster and search workflows.
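To illustrate the difference in approach: `glob.glob` returns whole paths that then need parsing, whereas `glob_wildcards` returns the wildcard values directly. The standard-library sketch below only emulates that idea with a regex so it can run without snakemake. `match_wildcards` and the example paths are hypothetical, and this is not how snakemake implements `glob_wildcards`.

```python
import re
from typing import Dict, Iterable, List


def match_wildcards(pattern: str, paths: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the value of each {name} placeholder from paths that fit the pattern.

    Assumes each placeholder name appears only once in the pattern.
    """
    names = re.findall(r"\{(\w+)\}", pattern)
    # Build a regex: escape literal text, turn each {name} into a named group.
    regex = ""
    pos = 0
    for placeholder in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos : placeholder.start()])
        regex += f"(?P<{placeholder.group(1)}>.+)"
        pos = placeholder.end()
    regex += re.escape(pattern[pos:]) + "$"
    compiled = re.compile(regex)

    found: Dict[str, List[str]] = {name: [] for name in names}
    for path in paths:
        hit = compiled.match(path)
        if hit:
            for name in names:
                found[name].append(hit.group(name))
    return found
```

For example, `match_wildcards("input/{family}.faa", paths)` yields the family names alone, with non-matching files simply skipped.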
Commit 85a009f
Commits on Jan 8, 2024
Commit 4c9a71d
fix,refactor: repair cluster mode unzip and file globbing
changelog:
- refactor: move `cluster_cluster.py` -> `cluster.py`
- refactor: move cluster report generation to separate script directive
- fix: change cluster mode file globbing to mirror model mode changes, i.e. uses snakemake `glob_wildcards` instead of python `glob.glob`. This should also fix unzipping issues and recursion errors related to unzipping.
Commits on Jan 9, 2024
Commit 60fb8d2
fix,style: update search file glob. apply snakefmt
changelog:
- fix: search file globbing updated to use snakemake's `glob_wildcards` rather than python's `glob.glob` in search mode. Should also resolve issues with file detection for files requiring unzipping and avoid recursion errors. Tested locally with a small subset of small families.
- style: applied snakefmt to `cluster.smk` and `search.smk`
Commits on Jan 30, 2024
Commit 6a8c042
Commits on Jan 31, 2024
Commit abaac01
Commit 15421ad
Commits on Feb 1, 2024
Commit 5ed97ad
Commit 270f1d9
Commit 43a8ccf
Commits on Feb 7, 2024
Commit 2d72548
feat,refactor: add `--resources` flag to CLI and streamline
changelog:
- feat: Snakemake `--resources` flag has been added to Snekmer CLI for all modes and tested locally.
- refactor: Wrapped all snakemake command line arguments into dictionary which is now passed to all snekmer subcommands. Removes the redundancy in specifying the same command line arguments every time a subcommand is called.
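The refactor in commit 2d72548 (shared snakemake arguments collected once, then handed to every subcommand as a dictionary) might look roughly like the `argparse` sketch below. The parser layout, the function names, the `--cores` option, and the program name `snekmer-sketch` are assumptions for illustration. Only the `--resources` flag and the model, cluster, and search modes come from the commit history.

```python
import argparse
from typing import Dict, List


def parse_resources(pairs: List[str]) -> Dict[str, int]:
    """Turn NAME=INT strings (the form snakemake's --resources takes) into a dict."""
    resources = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        if not name or not value:
            raise ValueError(f"expected NAME=INT, got {pair!r}")
        resources[name] = int(value)
    return resources


def build_parser() -> argparse.ArgumentParser:
    # Options shared by every subcommand are declared once on a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--resources", nargs="*", default=[], metavar="NAME=INT")
    common.add_argument("--cores", type=int, default=1)

    parser = argparse.ArgumentParser(prog="snekmer-sketch")
    subparsers = parser.add_subparsers(dest="mode", required=True)
    for mode in ("model", "cluster", "search"):
        subparsers.add_parser(mode, parents=[common])
    return parser


def snakemake_options(argv: List[str]) -> Dict[str, object]:
    """Collect the shared options into one dictionary for whichever mode runs."""
    args = build_parser().parse_args(argv)
    return {
        "mode": args.mode,
        "cores": args.cores,
        "resources": parse_resources(args.resources),
    }
```

Declaring the shared options on one parent parser is what removes the repetition: each mode inherits them instead of redefining them.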
Commit 5e39a7f
Commits on Feb 20, 2024
Commit 20023fd
fix,refactor: resolve array shapes. streamline code
changelog:
- fix: resolve error with array shapes due to matrix dimensions (transpose matrix required)
- refactor: renamed variables to streamline code
Commits on Mar 13, 2024
Commit ba6df73
Commits on Apr 16, 2024
Commit 710d2ba
Commit 679af3d
Commits on May 7, 2024
Commit 31d3707
fix: resolve array shape mismatches
changelog:
- basis harmonization now accounts for either 1D or 2D array cases
- 1D arrays are explicitly handled to match expected shape parameters set by the assumption that input arrays are 2D
- `utils.check_n_seqs` now uses boolean input arg to handle gz files rather than inferring from filename
Commit c40e09f
Commits on May 8, 2024
Commit 916806e
fix: verify bg file presence for all modes. bypass unicode error
changelog:
- Workflow now accounts for cases where no background files are included when either "combined" or "background" mode is selected. (TODO: raise warning in this case)
- Bypass UnicodeDecodeError for `utils.check_n_seqs`
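The two `utils.check_n_seqs` changes (an explicit boolean for gzipped input in commit 31d3707, and tolerance of undecodable bytes in commit 916806e) could be combined along these lines. This is a sketch under assumptions: the name `count_fasta_records`, its signature, and the use of `errors="replace"` are illustrative, and the actual function may bypass the error differently.

```python
import gzip


def count_fasta_records(path: str, gz: bool = False) -> int:
    """Count FASTA records (lines starting with '>') in a plain or gzipped file.

    The caller states whether the file is gzipped instead of the function
    guessing from the filename.
    """
    opener = gzip.open if gz else open
    # errors="replace" substitutes undecodable bytes rather than raising
    # UnicodeDecodeError, so stray non-UTF-8 characters do not abort the count.
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Passing the flag explicitly avoids misclassifying files whose names do not reflect their compression.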
Commit 5d3ffac
Commit 22dd1c5
Commits on May 10, 2024
Commit 163239c
Commits on May 28, 2024
Commit 0de98d7