Major data model refactor #177

Merged on Nov 12, 2024 (471 commits)
pavlovicmilena
Member

Major updates:

  • Data model refactored and simplified: it now uses bionumpy and is AIRR-compatible internally
  • Improved usability: reports and docs
  • Two new workflows [to be developed further]: clustering and generative models

pavlovicmilena and others added 29 commits November 4, 2024 11:31
- removed obsolete build_Dataset_yaml to avoid confusion
- dataset files are always named datasetname.yaml (avoids the confusing 'dataset_dataset.yaml')
- in build_dataset_overview_yaml: removed the option to give datasets multiple names; the dataset is always called 'dataset'
- central dataset discovery in Galaxy Util
- renamed the ambiguous build_yaml_from_arguments to build_train_ml_model_yaml.py
- training split size for clustering instruction
- made some tests faster by using smaller dummy datasets
- discover dataset type from dataset.yaml file if provided
- add explicit parameter 'label_columns' to import params; if specified, only those labels are imported (avoids clutter on Galaxy)
- update importer docs
…unt = 2, does the following:

- if count > 2, the top chain pair is selected based on duplicate_count; if duplicate_count is not available, the pair is selected randomly
- if count < 2, the receptor is removed; this removal now also covers receptors with 2 identical chains (e.g., TRB+TRB)
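
The filtering rule in this commit message can be illustrated with a minimal Python sketch. It is not the code from this PR: the function names `select_chain_pair` and `filter_receptors` are hypothetical, the rows are plain dicts keyed by the AIRR-style field names `cell_id`, `locus` and `duplicate_count`, and "top chain pair" is assumed to mean the chain with the highest `duplicate_count` for each locus. A group in which any chain lacks `duplicate_count` is treated as having no counts available.

```python
import random
from collections import defaultdict


def select_chain_pair(chains, rng):
    """Reduce one cell's chains to a two-chain receptor, or return None to drop it.

    Hypothetical helper, not the PR's implementation.
    """
    if len(chains) < 2:
        return None  # fewer than two chains: the receptor is removed

    if len(chains) > 2:
        # Assumption: the "top pair" is the best chain for each locus.
        by_locus = defaultdict(list)
        for chain in chains:
            by_locus[chain["locus"]].append(chain)

        # Assumption: counts are "available" only if every chain has one.
        have_counts = all(c.get("duplicate_count") is not None for c in chains)

        picked = []
        for locus in sorted(by_locus):
            candidates = by_locus[locus]
            if have_counts:
                picked.append(max(candidates, key=lambda c: c["duplicate_count"]))
            else:
                picked.append(rng.choice(candidates))  # random fallback
        chains = picked

    # Drop anything that is not exactly two chains from different loci,
    # which also removes pairs of identical chains such as TRB+TRB.
    if len(chains) != 2 or chains[0]["locus"] == chains[1]["locus"]:
        return None
    return tuple(chains)


def filter_receptors(rows, seed=0):
    """Group chain rows by cell_id and keep only valid two-chain receptors."""
    rng = random.Random(seed)  # seeded so the random fallback is reproducible
    by_cell = defaultdict(list)
    for row in rows:
        by_cell[row["cell_id"]].append(row)

    receptors = {}
    for cell_id, chains in by_cell.items():
        pair = select_chain_pair(chains, rng)
        if pair is not None:
            receptors[cell_id] = pair
    return receptors
```

Picking per locus rather than taking the two highest counts overall keeps a receptor such as TRA+TRB+TRB from collapsing into TRB+TRB and then being discarded; whether the merged code makes the same choice is not stated in the commit message.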
pavlovicmilena merged commit 7bbf443 into master on Nov 12, 2024
1 check failed
5 participants