Major data model refactor #177

Merged
merged 471 commits into from
Nov 12, 2024
Commits
c05f22a
Merge remote-tracking branch 'origin/master' into nar
pavlovicmilena Oct 29, 2023
feed634
add device to vae
pavlovicmilena Oct 29, 2023
0afcf7a
add vae summary report
pavlovicmilena Oct 30, 2023
0591198
add pypi publish dev setup
pavlovicmilena Oct 30, 2023
3e49981
update dev version
pavlovicmilena Oct 30, 2023
1aafbf7
remove condition from pypi setup
pavlovicmilena Oct 30, 2023
2bfb2db
update readme to reflect that this is dev version
pavlovicmilena Oct 30, 2023
064b851
add conda package workflow
pavlovicmilena Oct 31, 2023
99f58bd
update PyPI action
pavlovicmilena Oct 31, 2023
f604f25
revert to trusted publisher for PyPI
pavlovicmilena Oct 31, 2023
56300bf
add install conda-build
pavlovicmilena Oct 31, 2023
a261485
fix github action
pavlovicmilena Oct 31, 2023
06eea30
add publish conda as separate action
pavlovicmilena Oct 31, 2023
3badbe2
fix paths in github action
pavlovicmilena Oct 31, 2023
679b483
update conda in github action
pavlovicmilena Oct 31, 2023
d1e87c1
update conda in github action
pavlovicmilena Oct 31, 2023
080850f
tmp remove unused action
pavlovicmilena Oct 31, 2023
8b7b8fe
remove uses key in github action
pavlovicmilena Oct 31, 2023
7205d47
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
a90fb96
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
08548a0
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
64ad364
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
58fb89e
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
2be7254
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
28d5d7f
update publish-dev-to-conda.yml
pavlovicmilena Oct 31, 2023
3e2f234
Updated AminoAcidFrequencyDistribution report: plot in addition 'freq…
LonnekeScheffer Nov 1, 2023
6610263
keep only pypi-publish action
pavlovicmilena Nov 2, 2023
ebd3ee3
integrate ligo with immuneML
pavlovicmilena Nov 2, 2023
0a24ad9
update README.md
pavlovicmilena Nov 2, 2023
a8694d7
add IMGT position fix
pavlovicmilena Nov 2, 2023
b088c06
add dataset summary after import
pavlovicmilena Nov 2, 2023
976d142
run data reports on both simulated and original data in TrainGenModel
pavlovicmilena Nov 2, 2023
a07a491
update docs in TrainGenModelInstruction
pavlovicmilena Nov 3, 2023
0a26117
Dimensionality reduction using umap as part of exploratory analysis
ivargr Nov 3, 2023
d70eaae
Some minor fixes to dimensionality reduction
ivargr Nov 3, 2023
2f4d1f6
Merge branch 'nar' of github.com:uio-bmi/immuneML into nar
ivargr Nov 3, 2023
4a21318
Added umap-learn dependency
ivargr Nov 3, 2023
beb9468
fix integration with ligo and bionumpy
pavlovicmilena Nov 4, 2023
95726ab
Merge remote-tracking branch 'origin/nar' into nar
pavlovicmilena Nov 4, 2023
935f4cc
remove unused code
pavlovicmilena Nov 4, 2023
9f789d9
Add pipeline for applying generative models (#162)
mmamica Nov 4, 2023
860e9b9
fix stochasticity in app test
pavlovicmilena Nov 4, 2023
67fb3ac
update docs
pavlovicmilena Nov 5, 2023
8a23948
fix TCRdist test
pavlovicmilena Nov 5, 2023
229b64e
add clustering instruction draft
pavlovicmilena Nov 6, 2023
9bf3b52
update ApplyGenModel for loading the method
pavlovicmilena Nov 6, 2023
0101f2d
Added VJ gene distribution report
LonnekeScheffer Nov 6, 2023
4a51a66
small bugfix
LonnekeScheffer Nov 6, 2023
c0a514c
fix IO for gen models
pavlovicmilena Nov 6, 2023
4957b93
add more dim red methods
pavlovicmilena Nov 6, 2023
de05f14
small update SimpleDatasetOverview
LonnekeScheffer Nov 7, 2023
2ac307c
initial implementation of clustering instruction
pavlovicmilena Nov 7, 2023
6c2986c
refactoring
LonnekeScheffer Nov 8, 2023
9d0193e
new build_dataset_overview script for updated create dataset tool
LonnekeScheffer Nov 8, 2023
95add0c
fix with new bionumpy, update docs, clustering updates
pavlovicmilena Nov 8, 2023
b90888c
Added DatasetGenerationOverviewTool for Galaxy
LonnekeScheffer Nov 8, 2023
9189040
fix docs
pavlovicmilena Nov 8, 2023
8a2c1c9
removed unused line, fix tests
LonnekeScheffer Nov 8, 2023
f078371
updated HTML output of DatasetExport and ExploratoryAnalysis
LonnekeScheffer Nov 8, 2023
6e90ce1
updates and bugfixes build_dataset_overview_yaml.py
LonnekeScheffer Nov 8, 2023
ab27ec5
small update: only include 'empty' report when no other reports are s…
LonnekeScheffer Nov 8, 2023
6a821aa
fix docs
pavlovicmilena Nov 9, 2023
f6ac87d
update exp analysis to work with any dim red
pavlovicmilena Nov 9, 2023
071c7e1
update docs
pavlovicmilena Nov 9, 2023
51afe04
removed output typing "dict | None" as it is not compatible with Pyth…
LonnekeScheffer Nov 9, 2023
50a9ba1
update in CompAIRR test to make test pass
LonnekeScheffer Nov 9, 2023
6df953b
Merge branch 'nar_summaryreports' into nar
LonnekeScheffer Nov 9, 2023
0fb0ed3
bugfix: store training and test data at different paths so it does no…
LonnekeScheffer Nov 9, 2023
d9e7bb2
bugfix: store training and test data at different paths so it does no…
LonnekeScheffer Nov 9, 2023
bfaf4e0
preventative assertion statement i RepertoireBuilder, do not allow re…
LonnekeScheffer Nov 9, 2023
8ce32c8
minor fixes in tests and report titles
LonnekeScheffer Nov 9, 2023
fe17294
set version
pavlovicmilena Nov 10, 2023
62fed22
Merge remote-tracking branch 'origin/development' into nar
pavlovicmilena Nov 10, 2023
1a7daca
update github actions to run only on .py or .yaml file change
pavlovicmilena Nov 10, 2023
9ea30e8
update python version
pavlovicmilena Nov 10, 2023
3b1100b
fix tests
pavlovicmilena Nov 13, 2023
bd2d06d
fix tests
pavlovicmilena Nov 13, 2023
d6b2bd8
Merge remote-tracking branch 'origin/master' into nar
pavlovicmilena Nov 17, 2023
de544b2
add TrainGenModel reports and export combined dataset
pavlovicmilena Nov 17, 2023
fd032bd
Change storing of binary files for Sonnia model into storing weights …
mmamica Nov 23, 2023
98f0f17
update clustering html output
pavlovicmilena Nov 23, 2023
e4ed9b5
add pwm summary report
pavlovicmilena Nov 23, 2023
37ffd65
add ApplyGenModelTool for Galaxy
pavlovicmilena Nov 23, 2023
edcb7ab
fix paths in clustering html
pavlovicmilena Nov 23, 2023
c725baf
add train/test split to TrainGenModelInstruction
pavlovicmilena Nov 24, 2023
36a3ded
KL report (#166)
knutdrand Nov 27, 2023
7371fe9
add default params to KLGenModelReport
pavlovicmilena Nov 27, 2023
dc13844
fix typo in build_apply_gen_model_specs
pavlovicmilena Nov 28, 2023
6e7573c
Updated docs for the feature reports. Improved test for featuredistri…
LonnekeScheffer Nov 30, 2023
37b6509
improved docs; added figure examples of some reports
LonnekeScheffer Nov 30, 2023
0f7ea6f
Added support for ReceptorDataset in SequenceLengthDistribution
LonnekeScheffer Nov 30, 2023
00d7537
update galaxy sim tool to work with ligo
pavlovicmilena Dec 2, 2023
fd2ab02
- updated overview figure
LonnekeScheffer Dec 5, 2023
2b308ed
make airrexporter not depend on OLGA
LonnekeScheffer Dec 5, 2023
f0392ea
added yamlbuilder for "train gen models" tool, + other galaxy tool up…
LonnekeScheffer Dec 6, 2023
61d0a09
import sonia/sonnia internally so galaxy doesnt fail
LonnekeScheffer Dec 6, 2023
ed17a6d
added default params for galaxy generative models (except chain)
LonnekeScheffer Dec 6, 2023
1bd1583
removed option (gen model overviews are always run)
LonnekeScheffer Dec 6, 2023
31f000e
Merge branch 'master' into nar
LonnekeScheffer May 28, 2024
03abd00
Merge branch 'bugfix_element_dataset_paths' into nar
LonnekeScheffer May 28, 2024
3d9c20e
safer tests (try/finally around blocks where the current working dir …
LonnekeScheffer May 28, 2024
ed98358
bugfix TrainGenModelInstruction: use batchfiles_path param in sequenc…
LonnekeScheffer May 28, 2024
0171f66
added 'export_generated_dataset' param to gen model instruction. Upda…
LonnekeScheffer May 30, 2024
703bb3b
added link to log file in HTML output (useful for galaxy users), stan…
LonnekeScheffer May 30, 2024
ce00e98
galaxy bugfix: output dataset file should always be called dataset.yaml
LonnekeScheffer May 30, 2024
b9c5a5f
added default params for vaesummary
LonnekeScheffer Jun 12, 2024
37b5269
put type of generative method in the instruction name for clarity
LonnekeScheffer Jun 12, 2024
1669c9b
fix exporting label info in ProbabilisticBinaryClassifier
pavlovicmilena Jun 21, 2024
900b02d
fix docs
pavlovicmilena Jun 21, 2024
f0ff329
Merge remote-tracking branch 'origin/nar' into nar
pavlovicmilena Jun 21, 2024
9e4a48a
add 'gen_model_name' to exported datasets
pavlovicmilena Jun 21, 2024
6b49d43
format updates before pushing SequenceCountDistribution report to master
LonnekeScheffer Jul 2, 2024
b6b2bfd
Merge branch 'master' into nar
LonnekeScheffer Jul 2, 2024
915efd6
Merge remote-tracking branch 'origin/master' into nar
pavlovicmilena Aug 29, 2024
d3dcc02
rename chain to locus
pavlovicmilena Sep 10, 2024
63f9d86
refactor data model halfway through
pavlovicmilena Sep 15, 2024
d8ddb30
refactor data model
pavlovicmilena Sep 19, 2024
e1d1ffa
update tests with new data model
pavlovicmilena Sep 24, 2024
bb70f90
fix feature reports and error messages
pavlovicmilena Sep 25, 2024
0ec1e1a
Merge remote-tracking branch 'origin/nar' into nar_io_refactor
pavlovicmilena Sep 25, 2024
5958b9e
fix data model tests
pavlovicmilena Sep 25, 2024
9a77bc6
imports update
pavlovicmilena Oct 3, 2024
04605ff
Merge branch 'master' into nar
LonnekeScheffer Oct 8, 2024
7810fe2
report more metrics when doing default galaxy simplified interface cl…
LonnekeScheffer Oct 8, 2024
aeb7267
added tool for building YAML for random dummy dataset simulation
LonnekeScheffer Oct 9, 2024
e8f13ec
refactor all imports to convert to airr format
pavlovicmilena Oct 11, 2024
62fc0e5
start fixing other tests
pavlovicmilena Oct 11, 2024
74b8b59
tool for building ML application yaml for galaxy
LonnekeScheffer Oct 12, 2024
da1b9a0
minor fix ApplyGenModelTool.py
LonnekeScheffer Oct 12, 2024
0607729
make definition parser throw warning instead of error when symboltabl…
LonnekeScheffer Oct 12, 2024
b853815
fix build ML application test
LonnekeScheffer Oct 12, 2024
12a095c
bugfix export data from gen model instruction in immuneML format in G…
LonnekeScheffer Oct 12, 2024
c0ddd5b
test better error handling in galaxy
LonnekeScheffer Oct 12, 2024
384987f
add HTML page for failed galaxy runs
LonnekeScheffer Oct 12, 2024
35fc729
Galaxy: always make immuneML zip as output with 'finally' statement
LonnekeScheffer Oct 12, 2024
4f75671
galaxy debug print statements
LonnekeScheffer Oct 12, 2024
84c069f
galaxy test print statements
LonnekeScheffer Oct 12, 2024
10c7a69
galaxy test prints
LonnekeScheffer Oct 12, 2024
08660ad
galaxy test print statements
LonnekeScheffer Oct 12, 2024
536d439
galaxy test print statements
LonnekeScheffer Oct 12, 2024
9c75d47
fix galaxy file location
LonnekeScheffer Oct 12, 2024
f2ffa79
test without raising exception
LonnekeScheffer Oct 12, 2024
1104c3b
fix failed galaxy html
LonnekeScheffer Oct 12, 2024
035bade
fix galaxy html builder paths
LonnekeScheffer Oct 12, 2024
ac6c73f
test attempt to show traceback in html
LonnekeScheffer Oct 12, 2024
55ea247
galaxy update
LonnekeScheffer Oct 12, 2024
4d4094a
fix galaxytool
LonnekeScheffer Oct 12, 2024
a026d8d
show exception in failed galaxy
LonnekeScheffer Oct 12, 2024
fb767af
updated GalaxyTOol
LonnekeScheffer Oct 12, 2024
1aef90b
attempt to fix path discovery FailedGalaxyHTMLBuilder
LonnekeScheffer Oct 12, 2024
1f5160f
test print statements for galaxy
LonnekeScheffer Oct 12, 2024
1e8acf0
fix FailedGalaxyHTMLBuilder paths
LonnekeScheffer Oct 12, 2024
ab820e1
add more meaningful info to failed galaxy html
LonnekeScheffer Oct 12, 2024
e3282aa
view log file and exception in failed galaxy html
LonnekeScheffer Oct 12, 2024
700842b
user-friendly Galaxy error message
LonnekeScheffer Oct 12, 2024
9aa8cd4
simple YAML builder for Ligo galaxy tool
LonnekeScheffer Oct 12, 2024
4d25db3
update naming for better display in galaxy
LonnekeScheffer Oct 13, 2024
225105e
update default params build_ligo_yaml.py
LonnekeScheffer Oct 13, 2024
671e270
updated galaxy sim tool to export in immuneML format
LonnekeScheffer Oct 13, 2024
43778dc
user-friendly Galaxy error
LonnekeScheffer Oct 13, 2024
ef01d3c
do not require repertoire size if sequence dataset is used
LonnekeScheffer Oct 13, 2024
18dbe84
bugfix galaxy html for failed runs
LonnekeScheffer Oct 16, 2024
ab4c706
minor changes
LonnekeScheffer Oct 16, 2024
37aa2e2
import & export labels for Receptor/Sequence dataset
LonnekeScheffer Oct 16, 2024
c6e9ca7
version bump; update element datasets
LonnekeScheffer Oct 16, 2024
a561c9a
start fixing other tests
pavlovicmilena Oct 17, 2024
0d8b5c0
start fixing other tests
pavlovicmilena Oct 17, 2024
c1e1465
fix conversion to seq objs
pavlovicmilena Oct 17, 2024
40f2d56
fix gen models
pavlovicmilena Oct 18, 2024
ac9d5b1
fix gen models
pavlovicmilena Oct 18, 2024
b649d3a
Merge branch 'master' into nar
LonnekeScheffer Oct 18, 2024
9adaa5d
fix galaxy dataset gen overview tool
pavlovicmilena Oct 21, 2024
7e503e4
fix galaxy tools
pavlovicmilena Oct 21, 2024
2757e64
fix ligo for repertoires
pavlovicmilena Oct 21, 2024
9cf67d2
fix ligo for sequences; update ligo html
pavlovicmilena Oct 22, 2024
b8e1a83
fix some integration tests
pavlovicmilena Oct 22, 2024
f71f8cf
fix abundance encoding
pavlovicmilena Oct 23, 2024
5f416dc
fix preproc, start fixing reports
pavlovicmilena Oct 23, 2024
c7c4b3d
fix seq len dist report
pavlovicmilena Oct 25, 2024
0f040c5
fix some reports
pavlovicmilena Oct 25, 2024
0814f28
fix some reports
pavlovicmilena Oct 25, 2024
f7c47a0
fix some reports
pavlovicmilena Oct 25, 2024
a6ea1b1
fix some reports
pavlovicmilena Oct 25, 2024
9f8a940
add region type to aa dist plot
pavlovicmilena Oct 25, 2024
9f61d35
fix quickstart
pavlovicmilena Oct 26, 2024
1b6e21d
fix tests
pavlovicmilena Oct 26, 2024
d37a740
fix seq ab rep builder
pavlovicmilena Oct 26, 2024
a151823
fix tests
pavlovicmilena Oct 26, 2024
8e2e238
fix report tests
pavlovicmilena Oct 28, 2024
ad5d2f7
fix significant kmer/features tests
pavlovicmilena Oct 28, 2024
1d7ea9f
add region and sequence type info to encodings
pavlovicmilena Oct 28, 2024
e83f311
add region and sequence type info to encodings
pavlovicmilena Oct 28, 2024
4e19fbf
fix one hot encoding
pavlovicmilena Oct 28, 2024
8a7c705
fix aa freq report
pavlovicmilena Oct 29, 2024
b835406
fix ids, add pt to the dataset
pavlovicmilena Oct 29, 2024
57bf1ce
Merge remote-tracking branch 'origin/nar' into nar_io_refactor
pavlovicmilena Oct 29, 2024
a03e028
Added clustering galaxy support (build yaml for clustering)
LonnekeScheffer Oct 29, 2024
84d38e2
update argument name
LonnekeScheffer Oct 29, 2024
e363585
update argument name
LonnekeScheffer Oct 29, 2024
190e89b
add reports list to yaml
LonnekeScheffer Oct 29, 2024
e20be46
handle case with no supplied labels for clustering galaxy yaml
LonnekeScheffer Oct 29, 2024
41c4840
fix clustering case without labels
LonnekeScheffer Oct 29, 2024
f9b9f07
bugfix: clustering with no label
LonnekeScheffer Oct 29, 2024
6944fcf
merge nar and nar io refactor
pavlovicmilena Oct 30, 2024
7e617d7
fix multiprocessing for CompAIRR abundance encoder + AIRRSequenceSet …
pavlovicmilena Oct 31, 2024
0eebc31
add default params to ProbabilisticBinaryClassifier
pavlovicmilena Oct 31, 2024
00f4607
fix some tests
pavlovicmilena Oct 31, 2024
ec246f6
Merge branch 'nar' into nar_io_refactor
LonnekeScheffer Oct 31, 2024
6cd0571
updates to galaxy tools (use AIRRExporter instead of ImmuneMLExporter…
LonnekeScheffer Oct 31, 2024
44fb06c
remove old files
pavlovicmilena Nov 1, 2024
3156446
Merge remote-tracking branch 'origin/nar_io_refactor' into nar_io_ref…
pavlovicmilena Nov 1, 2024
41526a1
add init.py
pavlovicmilena Nov 1, 2024
4d947a6
fix installation stuff
pavlovicmilena Nov 1, 2024
bddd8d9
fix installation stuff
pavlovicmilena Nov 1, 2024
3b5a49f
fix installation stuff
pavlovicmilena Nov 1, 2024
07a3bc5
fix installation stuff
pavlovicmilena Nov 1, 2024
26c44e9
update requirements.txt
pavlovicmilena Nov 1, 2024
497c4df
fix label parsing in import
pavlovicmilena Nov 1, 2024
7f2b9f6
fix label parsing in import; update docs
pavlovicmilena Nov 1, 2024
ec6453b
fix VAE precision error
pavlovicmilena Nov 1, 2024
3d449b9
minor fixes for clustering and gen models
pavlovicmilena Nov 4, 2024
e5b5e7a
minor galaxy train gen model updates
pavlovicmilena Nov 4, 2024
ffe9619
limit scipy version because it doesn't work with gensim
pavlovicmilena Nov 4, 2024
35c3afa
fix css for index.html
pavlovicmilena Nov 4, 2024
c672794
add to airr import test: test for label import sequence/receptor dataset
LonnekeScheffer Nov 4, 2024
74e2c9a
add to airr import test: test for label import sequence/receptor dataset
LonnekeScheffer Nov 4, 2024
36bafe2
remove metadata_column_mapping from DatasetImportParams.py
LonnekeScheffer Nov 4, 2024
8646733
galaxy test print statement
LonnekeScheffer Nov 4, 2024
1b04714
do not add dataset_ prefix to .yaml filename (messes up galaxy)
LonnekeScheffer Nov 4, 2024
7e422cc
fixes for galaxy datasets:
LonnekeScheffer Nov 5, 2024
bcfb482
fix for dataset simulation sequences: dataset name, added test
LonnekeScheffer Nov 5, 2024
14f0c0c
- Added postprocessing for galaxy datasets, removing file paths
LonnekeScheffer Nov 5, 2024
68300ae
- standardized writing of dataset.yaml file
LonnekeScheffer Nov 5, 2024
30a4e5b
version bump for galaxy testing
LonnekeScheffer Nov 5, 2024
d082c5c
fix dataset io for galaxy
pavlovicmilena Nov 5, 2024
e95f63d
update version
pavlovicmilena Nov 5, 2024
b96fbe1
label update:
LonnekeScheffer Nov 5, 2024
c1dc4ec
Merge remote-tracking branch 'origin/nar' into nar
LonnekeScheffer Nov 5, 2024
432aa06
attempt start fixing data import for generative models
LonnekeScheffer Nov 5, 2024
8124868
minor bugfix SequenceCountDistribution.py
LonnekeScheffer Nov 6, 2024
c57a03c
minor fix: improved error message for failed reports
LonnekeScheffer Nov 7, 2024
0e0abc6
minor: add random dup. counts for random generated data
LonnekeScheffer Nov 7, 2024
8bf165b
minor: improved display SequenceCountDistribution (x axis sorted)
LonnekeScheffer Nov 7, 2024
297480b
improved filter_illegal_receptors, instead of removing any without co…
LonnekeScheffer Nov 8, 2024
7a7a6ab
improved receptor filter for receptors with >2 chains (first check ba…
LonnekeScheffer Nov 8, 2024
84765a4
added missing default param region_type for filters
LonnekeScheffer Nov 8, 2024
dbc2dfa
pass number_of_processes to preprocessing in MLProcess & Expl analysis
LonnekeScheffer Nov 8, 2024
42aea52
update error message when file not found on import
pavlovicmilena Nov 12, 2024
5782589
Merge remote-tracking branch 'origin/nar' into nar
pavlovicmilena Nov 12, 2024
600da81
update docs; fix vdjdb test
pavlovicmilena Nov 12, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ Useful links:


We recommend installing immuneML inside a virtual environment.
immuneML uses Python 3.8 or later. If using immuneML simulation, Python 3.11 is recommended.
immuneML uses Python 3.8 or later. If using immuneML simulation, Python 3.11 or later is recommended.

immuneML can be [installed directly using a package manager](<https://docs.immuneml.uio.no/latest/installation/install_with_package_manager.html#>) such as pip or conda,
or [set up via docker](<https://docs.immuneml.uio.no/latest/installation/installation_docker.html>).
50 changes: 29 additions & 21 deletions docs/source/installation/install_with_package_manager.rst
@@ -17,7 +17,11 @@ Install immuneML with pip

0. To install immuneML with pip, make sure to have Python version 3.7 or later installed.

1. Create a virtual environment where immuneML will be installed. It is possible to install immuneML as a global package, but it is not recommended as there might be conflicting versions of different packages. For more details, see `the official documentation on creating virtual environments with Python <https://docs.python.org/3/library/venv.html>`_. To create an environment, run the following in the terminal (for Windows-specific commands, see the virtual environment documentation linked above):
1. Create a virtual environment where immuneML will be installed. It is possible to install immuneML as a global
package, but it is not recommended as there might be conflicting versions of different packages. For more details,
see `the official documentation on creating virtual environments with Python <https://docs.python.org/3/library/venv.html>`_.
To create an environment, run the following in the terminal (for Windows-specific commands, see the virtual
environment documentation linked above):

.. code-block:: console

@@ -29,7 +33,10 @@ Install immuneML with pip

source ./immuneml_venv/bin/activate

Note: when creating a python virtual environment, it will automatically use the same Python version as the environment it was created in. To ensure that the preferred Python version (3.8) is used, it is possible to instead make a conda environment (see :ref:`Install immuneML with conda` steps 0-3) and proceed to install immuneML with pip inside the conda environment.
Note: when creating a python virtual environment, it will automatically use the same Python version as the environment
it was created in. To ensure that the preferred Python version (3.8) is used, it is possible to instead make a conda
environment (see :ref:`Install immuneML with conda` steps 0-3) and proceed to install immuneML with pip inside the
conda environment.
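
The note above says that a virtual environment inherits the Python version of the interpreter that created it. A minimal sketch of how one might check the active interpreter before installing, assuming the thresholds from the README text in this PR (3.8 as the minimum, 3.11 as the recommended version); the function name `describe_python` is hypothetical and not part of immuneML:

```python
import sys

# Thresholds taken from the README wording in this PR ("Python 3.8 or later",
# "Python 3.11 or later is recommended"); adjust if the docs change.
MINIMUM = (3, 8)
RECOMMENDED = (3, 11)


def describe_python(version_info=sys.version_info) -> str:
    """Classify an interpreter version against the documented thresholds."""
    current = tuple(version_info[:2])  # compare on (major, minor) only
    if current < MINIMUM:
        return "too old"
    if current < RECOMMENDED:
        return "supported"
    return "recommended"


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {describe_python()}")
```

Running this inside the activated environment shows which interpreter the environment actually picked up.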


3. If not already up-to-date, update pip:
@@ -38,13 +45,8 @@ Note: when creating a python virtual environment, it will automatically use the

python3 -m pip install --upgrade pip

4. If not already installed, install the wheel package. If it is not installed, the installation of some of the dependencies might default to legacy 'setup.py install'.

.. code-block:: console

pip install wheel

5. To install `immuneML from PyPI <https://pypi.org/project/immuneML/>`_ in this virtual environment, run the following:
4. To install `immuneML from PyPI <https://pypi.org/project/immuneML/>`_ in this virtual environment, run the following:

.. code-block:: console

@@ -64,12 +66,12 @@ Install immuneML with conda
mkdir immuneML/
cd immuneML/

2. Create a virtual environment using conda. immuneML has been tested extensively with Python versions 3.7, 3.8 and 3.11.
To create a conda virtual environment with Python version 3.8, use:
2. Create a virtual environment using conda. immuneML has been tested extensively with Python version 3.11.
To create a conda virtual environment with Python version 3.11, use:

.. code-block:: console

conda create --prefix immuneml_env/ python=3.8
conda create --prefix immuneml_env/ python=3.11

3. Activate the created environment:

@@ -118,27 +120,33 @@ To install the DeepRC dependencies, run:
See also this question under 'Troubleshooting': :ref:`I get an error when installing PyTorch (could not find a version that satisfies the requirement torch)`


Keras-based sequence CNN
Deep learning methods
************************

In order to use the :ref:`KerasSequenceCNN`, optional dependencies :code:`keras` and :code:`tensorflow` need to be installed.
By default, version 2.11.0 of both dependencies are used.
Other versions may work as well, as long as the used versions of :code:`keras` and :code:`tensorflow` are compatible with eachother.

To install the default versions of these packages, you can include the optional extra :code:`KerasSequenceCNN`:
In order to use any of the supported deep learning models (KerasSequenceCNN or others), install DL optional dependencies:

.. code-block:: console

pip install immuneML[KerasSequenceCNN]
pip install immuneML[DL]

Or install the dependencies manually using the :download:`requirements_KerasSequenceCNN.txt <https://raw.githubusercontent.com/uio-bmi/immuneML/master/requirements_KerasSequenceCNN.txt>` file:
Fisher's exact test
**********************

For using ProbabilisticBinaryClassifier or any of the abundance encoders (following Emerson et al. 2017 publication),
please install 'fisher' optional dependencies:

.. code-block:: console

pip install -r requirements_KerasSequenceCNN.txt
pip install immuneML[fisher]

Full immuneML installation
******************************

To install all optional dependencies and have access to the full set of immuneML features, use the following installation command:

.. code-block:: console

The :ref:`KerasSequenceCNN` uses CPU, it does *not* rely on GPU.
pip install immuneML[all]
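
After installing one of the extras above, it can be useful to confirm that the optional packages are importable. The following is only a sketch: the mapping from extras to module names (`torch`, `keras`, `tensorflow`, `fisher`) is an assumption made for illustration and is not taken from immuneML's packaging files, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Hypothetical mapping from the extras named in the docs to importable module
# names. These module names are assumptions, not read from immuneML's setup.
EXTRA_MODULES = {
    "DL": ["torch", "keras", "tensorflow"],
    "fisher": ["fisher"],
}


def missing_modules(extra: str, modules_by_extra=EXTRA_MODULES) -> list:
    """Return the modules of an extra that cannot be found in this environment."""
    return [name for name in modules_by_extra[extra] if find_spec(name) is None]


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"immuneML[{extra}]: {status}")
```

`find_spec` only looks the module up without importing it, so the check stays cheap even for heavy deep learning packages.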

CompAIRR
********
6 changes: 6 additions & 0 deletions docs/source/installation/installation_docker.rst
@@ -40,6 +40,12 @@ To exit the Docker container, use the following command:

exit

.. note:: Available data

Please note that the Docker container only has access to the data that was explicitly mounted to the container. This
means that if you followed the example above, immuneML running the Docker container will only have access to files in
and under the current working directory and will see it under /data path.

Using the Docker container for longer immuneML runs
----------------------------------------------------
232 changes: 20 additions & 212 deletions immuneML/IO/dataset_export/AIRRExporter.py
@@ -1,5 +1,6 @@
import logging
import math
import shutil
from dataclasses import fields
from enum import Enum
from multiprocessing import Pool
@@ -8,18 +9,11 @@

import airr
import pandas as pd
from olga.utils import nt2aa

from immuneML.IO.dataset_export.DataExporter import DataExporter
from immuneML.data_model.dataset import Dataset
from immuneML.data_model.dataset.ReceptorDataset import ReceptorDataset
from immuneML.data_model.dataset.RepertoireDataset import RepertoireDataset
from immuneML.data_model.receptor.Receptor import Receptor
from immuneML.data_model.receptor.RegionType import RegionType
from immuneML.data_model.receptor.receptor_sequence.Chain import Chain
from immuneML.data_model.receptor.receptor_sequence.ReceptorSequence import ReceptorSequence
from immuneML.data_model.receptor.receptor_sequence.SequenceFrameType import SequenceFrameType
from immuneML.data_model.repertoire.Repertoire import Repertoire
from immuneML.data_model.datasets.Dataset import Dataset
from immuneML.data_model.datasets.ElementDataset import ElementDataset
from immuneML.data_model.datasets.RepertoireDataset import RepertoireDataset
from immuneML.environment.Constants import Constants
from immuneML.util.NumpyHelper import NumpyHelper
from immuneML.util.PathBuilder import PathBuilder
@@ -41,211 +35,25 @@ class AIRRExporter(DataExporter):
def export(dataset: Dataset, path: Path, number_of_processes: int = 1, omit_columns: list = None):
PathBuilder.build(path)

if isinstance(dataset, RepertoireDataset):
repertoire_folder = "repertoires/"
repertoire_path = PathBuilder.build(path / repertoire_folder)
try:

with Pool(processes=number_of_processes) as pool:
arguments = [(repertoire, repertoire_path, dataset.labels, omit_columns)
for repertoire in dataset.repertoires]
pool.starmap(AIRRExporter.export_repertoire, arguments)
if isinstance(dataset, RepertoireDataset):
repertoire_folder = "repertoires/"
repertoire_path = PathBuilder.build(path / repertoire_folder)

AIRRExporter.export_updated_metadata(dataset, path, repertoire_folder)
else:
for repertoire in dataset.repertoires:
shutil.copyfile(repertoire.data_filename, repertoire_path / repertoire.data_filename.name)
shutil.copyfile(repertoire.metadata_filename, repertoire_path / repertoire.metadata_filename.name)

index = 1
file_count = math.ceil(dataset.get_example_count() / dataset.file_size)
shutil.copyfile(dataset.metadata_file, path / dataset.metadata_file.name)
if dataset.dataset_file and dataset.dataset_file.is_file():
shutil.copyfile(dataset.dataset_file, path / dataset.dataset_file.name)

for batch in dataset.get_batch():
filename = path / f"batch{''.join(['0' for i in range(1, len(str(file_count)) - len(str(index)) + 1)])}{index}.tsv"
elif isinstance(dataset, ElementDataset):
shutil.copyfile(dataset.filename, path / dataset.filename.name)
shutil.copyfile(dataset.dataset_file, path / dataset.dataset_file.name)

if isinstance(dataset, ReceptorDataset):
df = AIRRExporter._receptors_to_dataframe(batch)
else:
df = AIRRExporter._sequences_to_dataframe(batch)
except shutil.SameFileError as e:
logging.warning(f"AIRRExporter: target and input path are the same. Skipping the copy operation...")

df = AIRRExporter._postprocess_dataframe(df, dataset.labels, omit_columns)
airr.dump_rearrangement(df, str(filename))

index += 1
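
The new `export` body in this hunk copies existing dataset files with `shutil.copyfile` and logs a warning on `shutil.SameFileError` instead of rebuilding dataframes. A standalone sketch of that copy-and-skip pattern, using only the standard library; `copy_into` and the file names are hypothetical and are not part of `AIRRExporter`:

```python
import logging
import shutil
import tempfile
from pathlib import Path


def copy_into(source: Path, target_dir: Path) -> bool:
    """Copy source into target_dir; return False if it is already that file."""
    target_dir.mkdir(parents=True, exist_ok=True)
    try:
        shutil.copyfile(source, target_dir / source.name)
        return True
    except shutil.SameFileError:
        # Raised when source and destination resolve to the same file.
        logging.warning("target and input path are the same, skipping copy")
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in" / "dataset.yaml"
        src.parent.mkdir()
        src.write_text("name: demo\n")
        print(copy_into(src, Path(tmp) / "out"))  # copy into a new folder
        print(copy_into(src, src.parent))         # export onto the input folder
```

Catching `SameFileError` per file, as in this sketch, lets the remaining files still be copied when only some targets coincide with their sources.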

@staticmethod
def export_repertoire(repertoire: Repertoire, repertoire_path: Path, dataset_labels: dict, omit_columns: list = None):
df = AIRRExporter._repertoire_to_dataframe(repertoire)
df = AIRRExporter._postprocess_dataframe(df, dataset_labels, omit_columns)
output_file = repertoire_path / f"{repertoire.data_filename.stem if 'subject_id' not in repertoire.metadata else repertoire.metadata['subject_id']}.tsv"
airr.dump_rearrangement(df, str(output_file))

@staticmethod
def get_sequence_field(region_type):
if region_type == RegionType.IMGT_CDR3:
return "cdr3"
elif region_type == RegionType.IMGT_JUNCTION:
return "junction"
else:
return "sequence"

@staticmethod
def get_sequence_aa_field(region_type):
return f"{AIRRExporter.get_sequence_field(region_type)}_aa"

@staticmethod
def export_updated_metadata(dataset: RepertoireDataset, result_path: Path, repertoire_folder: str):
df = pd.read_csv(dataset.metadata_file, comment=Constants.COMMENT_SIGN)
identifiers = df["identifier"].values.tolist() if "identifier" in df.columns else dataset.get_example_ids()
df["filename"] = [f"{repertoire.data_filename.stem if 'subject_id' not in repertoire.metadata else repertoire.metadata['subject_id']}.tsv"
for repertoire in dataset.get_data()]
df['identifier'] = identifiers
df.to_csv(result_path / "metadata.csv", index=False)

@staticmethod
def _repertoire_to_dataframe(repertoire: Repertoire):
rep_data = repertoire.load_bnp_data()
df = pd.DataFrame({field.name: getattr(rep_data, field.name).tolist() for field in fields(rep_data)})

region_type = repertoire.get_region_type()

# rename mandatory fields for airr-compliance
mapper = {"chain": "locus", "sequence": AIRRExporter.get_sequence_field(region_type),
"sequence_aa": AIRRExporter.get_sequence_aa_field(region_type)}

df = df.rename(mapper=mapper, axis="columns")
df.drop(columns=['region_type'], inplace=True)

return df

@staticmethod
def add_full_length_seq(df, species, unique_chains):
if unique_chains is not None and len(unique_chains) <= 2 and all(chain in [Chain.ALPHA.value, Chain.BETA.value] for chain in unique_chains):
try:
from Stitchr import stitchr as st
from Stitchr import stitchrfunctions as fxn

tcr_dat, functionality, partial = {}, {}, {}

for chain in unique_chains:
tcr_dat[chain], functionality[chain], partial[chain] = fxn.get_imgt_data(chain, st.gene_types, species.upper())

codons = fxn.get_optimal_codons('', species)

df['full_sequence'] = df.apply(lambda row: stitch_wrapper(row, st, fxn, species, tcr_dat, functionality, partial, codons), axis=1)

df['full_sequence_aa'] = df.apply(lambda row: nt2aa(row['full_sequence']), axis=1)

except Exception as e:
logging.warning(f"An error occurred while exporting full length sequence. Only CDR3/JUNCTION region "
f"is exported instead.\nFull error: {e}")

@staticmethod
def _receptors_to_dataframe(receptors: List[Receptor]):
sequences = [(receptor.get_chain(receptor.get_chains()[0]), receptor.get_chain(receptor.get_chains()[1])) for receptor in receptors]
sequences = [item for sublist in sequences for item in sublist]
receptor_ids = [(receptor.identifier, receptor.identifier) for receptor in receptors]
receptor_ids = [item for sublist in receptor_ids for item in sublist]

df = AIRRExporter._sequences_to_dataframe(sequences)
df["cell_id"] = receptor_ids
return df

@staticmethod
def _get_sequence_list_region_type(sequences: List[ReceptorSequence]):
region_types = {sequence.get_attribute("region_type") for sequence in sequences}

assert len(region_types) == 1, f"AIRRExporter: expected one region_type, found: {region_types}"

return RegionType(region_types.pop())

@staticmethod
def _sequences_to_dataframe(sequences: List[ReceptorSequence]):
region_type = AIRRExporter._get_sequence_list_region_type(sequences)
sequence_field = AIRRExporter.get_sequence_field(region_type)
sequence_aa_field = AIRRExporter.get_sequence_aa_field(region_type)

main_data_dict = {"sequence_id": [], sequence_field: [], sequence_aa_field: []}
attributes_dict = {"chain": [], "v_call": [], "j_call": [], "duplicate_count": [], "cell_id": [], "frame_type": []}

for i, sequence in enumerate(sequences):
main_data_dict["sequence_id"].append(sequence.sequence_id)
main_data_dict[sequence_field].append(sequence.sequence)
main_data_dict[sequence_aa_field].append(sequence.sequence_aa)

# add custom params of this receptor sequence to attributes dict
if sequence.metadata is not None and sequence.metadata.custom_params is not None:
for custom_param in sequence.metadata.custom_params:
if custom_param not in attributes_dict:
# backfill empty values for the sequences processed before this param first appeared
attributes_dict[custom_param] = ['' for _ in range(i)]

for attribute in attributes_dict.keys():
try:
attr_value = sequence.get_attribute(attribute)
if isinstance(attr_value, Enum):
attr_value = attr_value.value
attributes_dict[attribute].append(attr_value)
except KeyError:
attributes_dict[attribute].append('')

df = pd.DataFrame({**attributes_dict, **main_data_dict})

df.rename(columns={"chain": "locus"}, inplace=True)

return df

@staticmethod
def update_gene_columns(df, allele_name, gene_name):
for index, row in df.iterrows():
for gene in ['v', 'j']:
if NumpyHelper.is_nan_or_empty(row[f"{gene}_{allele_name}"]) and not NumpyHelper.is_nan_or_empty(row[f"{gene}_{gene_name}"]):
df.at[index, f"{gene}_{allele_name}"] = row[f"{gene}_{gene_name}"]

@staticmethod
def _postprocess_dataframe(df, dataset_labels: dict, omit_columns: list = None):
if "locus" in df.columns:
df["locus"] = [Chain.get_chain(chain).value if chain and Chain.get_chain(chain) else '' for chain in df["locus"]]
else:
df['locus'] = df.apply(lambda row: Chain.get_chain(row['v_call'][:3]).value, axis=1)

if "frame_type" in df.columns:
AIRRExporter._enums_to_strings(df, "frame_type")

df["productive"] = df["frame_type"] == SequenceFrameType.IN.value
df.loc[df["frame_type"].isnull(), "productive"] = ""
df.loc[df["frame_type"] == "", "productive"] = ""
df.loc[df["frame_type"] == SequenceFrameType.UNDEFINED.value, "productive"] = ""

df["vj_in_frame"] = df["productive"]

df["stop_codon"] = df["frame_type"] == SequenceFrameType.STOP.value
df.loc[df["frame_type"].isnull(), "stop_codon"] = ''

df.drop(columns=["frame_type"], inplace=True)

if "region_type" in df.columns:
df.drop(columns=["region_type"], inplace=True)

if omit_columns is not None:
df.drop(columns=omit_columns, inplace=True)

AIRRExporter.add_full_length_seq(df, dataset_labels.get('species', None) if dataset_labels else None, list(set(df['locus'].values.tolist())))

return df

@staticmethod
def _enums_to_strings(df, field):
df.loc[:, field] = [field_value.value if isinstance(field_value, Enum) else field_value for field_value in df.loc[:, field]]


def stitch_wrapper(row, st, fxn, species, tcr_dat, functionality, partial, codons):
full_sequence = ""

try:
full_sequence = st.stitch({'v': row['v_call'], 'j': row['j_call'], 'cdr3': row['junction_aa'],
'skip_c_checks': False, '5_prime_seq': '', '3_prime_seq': '', 'name': '',
'c': fxn.autofill_input({'c': None, 'species': species.upper(), 'j': row['j_call'],
'l': row['v_call']}, row['locus'])['c'],
'species': species.upper(), 'l': row['v_call']},
tcr_dat[row['locus']], functionality[row['locus']], partial[row['locus']], codons, 3, '')[1]

except Exception as e:
logging.warning(f"An error occurred while constructing full sequence from row: \n{row}. Error log: \n{e}")

return full_sequence
# TODO: add here export of full sequence if possible
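
Note on `_sequences_to_dataframe` above: a custom parameter may first appear partway through the sequence list, so its column has to be padded with empty strings for the rows already processed. Below is a minimal, self-contained sketch of that padding idea. It uses plain dictionaries instead of `ReceptorSequence` objects. The function name `records_to_columns`, its `base_columns` parameter, and the sample records are hypothetical and not part of immuneML.

```python
def records_to_columns(records, base_columns=("v_call", "j_call")):
    """Collect a list of dict records into column lists of equal length.

    Keys that first show up in a later record are backfilled with ''
    for all earlier rows, so every column ends up with len(records) entries.
    """
    columns = {name: [] for name in base_columns}

    for row_index, record in enumerate(records):
        # register any new key and pad it for the rows seen so far
        for key in record:
            if key not in columns:
                columns[key] = [""] * row_index

        # append this record's value (or '' if missing) to every column
        for name, values in columns.items():
            values.append(record.get(name, ""))

    return columns


# hypothetical sample data: the second record introduces an extra key
example = records_to_columns([
    {"v_call": "TRBV1", "j_call": "TRBJ1"},
    {"v_call": "TRBV2", "j_call": "TRBJ2", "epitope": "ABC"},
])
```

Because every column list has the same length, the resulting dictionary can be passed directly to `pd.DataFrame`, which is what the exporter relies on when it merges `attributes_dict` and `main_data_dict`.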
3 changes: 1 addition & 2 deletions immuneML/IO/dataset_export/DataExporter.py
@@ -3,8 +3,7 @@
import abc
from pathlib import Path

-from immuneML.data_model.dataset.Dataset import Dataset
-from immuneML.data_model.receptor.RegionType import RegionType
+from immuneML.data_model.datasets.Dataset import Dataset


class DataExporter(metaclass=abc.ABCMeta):