Add SCimilarity #3

Merged
merged 13 commits into from
Oct 22, 2024
11 changes: 5 additions & 6 deletions README.md
@@ -91,8 +91,7 @@ flowchart TB

A subset of the common dataset.

Example file:
`resources_test/common/cxg_mouse_pancreas_atlas/dataset.h5ad`
Example file: `resources_test/common/cxg_immune_cell_atlas/dataset.h5ad`

Format:

@@ -158,7 +157,7 @@ Arguments:
Unintegrated AnnData HDF5 file.

Example file:
`resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad`
`resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad`

Format:

@@ -202,7 +201,7 @@ Data structure:
Uncensored dataset containing the true labels.

Example file:
`resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/solution.h5ad`
`resources_test/task_batch_integration/cxg_immune_cell_atlas/solution.h5ad`

Format:

@@ -317,7 +316,7 @@ Arguments:
An integrated AnnData dataset.

Example file:
`resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/integrated.h5ad`
`resources_test/task_batch_integration/cxg_immune_cell_atlas/integrated.h5ad`

Description:

@@ -362,7 +361,7 @@ Data structure:
An integrated AnnData dataset with additional outputs.

Example file:
`resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/integrated_full.h5ad`
`resources_test/task_batch_integration/cxg_immune_cell_atlas/integrated_full.h5ad`

Description:

12 changes: 6 additions & 6 deletions _viash.yaml
@@ -31,21 +31,21 @@ description: |

references:
doi:
# Luecken, M.D., Büttner, M., Chaichoompu, K. et al.
# Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).
# Luecken, M.D., Büttner, M., Chaichoompu, K. et al.
# Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).
- 10.1038/s41592-021-01336-8

info:
image: thumbnail.svg
test_resources:
- type: s3
path: s3://openproblems-data/resources_test/common/cxg_mouse_pancreas_atlas/
dest: resources_test/common/cxg_mouse_pancreas_atlas
path: s3://openproblems-data/resources_test/common/cxg_immune_cell_atlas/
dest: resources_test/common/cxg_immune_cell_atlas
- type: s3
path: s3://openproblems-data/resources_test/task_batch_integration/
dest: resources_test/task_batch_integration

authors:
authors:
- name: Michaela Mueller
roles: [ maintainer, author ]
info:
28 changes: 14 additions & 14 deletions scripts/create_resources/test_resources.sh
@@ -15,36 +15,36 @@ mkdir -p $DATASET_DIR

# process dataset
viash run src/data_processors/process_dataset/config.vsh.yaml -- \
--input "$RAW_DATA/cxg_mouse_pancreas_atlas/dataset.h5ad" \
--output_dataset "$DATASET_DIR/cxg_mouse_pancreas_atlas/dataset.h5ad" \
--output_solution "$DATASET_DIR/cxg_mouse_pancreas_atlas/solution.h5ad"
--input "$RAW_DATA/cxg_immune_cell_atlas/dataset.h5ad" \
--output_dataset "$DATASET_DIR/cxg_immune_cell_atlas/dataset.h5ad" \
--output_solution "$DATASET_DIR/cxg_immune_cell_atlas/solution.h5ad"

# run one method
viash run src/methods/combat/config.vsh.yaml -- \
--input $DATASET_DIR/cxg_mouse_pancreas_atlas/dataset.h5ad \
--output $DATASET_DIR/cxg_mouse_pancreas_atlas/integrated.h5ad
--input $DATASET_DIR/cxg_immune_cell_atlas/dataset.h5ad \
--output $DATASET_DIR/cxg_immune_cell_atlas/integrated.h5ad

# run transformer
viash run src/data_processors/transform/config.vsh.yaml -- \
--input_integrated $DATASET_DIR/cxg_mouse_pancreas_atlas/integrated.h5ad \
--input_dataset $DATASET_DIR/cxg_mouse_pancreas_atlas/dataset.h5ad \
--input_integrated $DATASET_DIR/cxg_immune_cell_atlas/integrated.h5ad \
--input_dataset $DATASET_DIR/cxg_immune_cell_atlas/dataset.h5ad \
--expected_method_types feature \
--output $DATASET_DIR/cxg_mouse_pancreas_atlas/integrated_full.h5ad
--output $DATASET_DIR/cxg_immune_cell_atlas/integrated_full.h5ad

# run one metric
viash run src/metrics/graph_connectivity/config.vsh.yaml -- \
--input_integrated $DATASET_DIR/cxg_mouse_pancreas_atlas/integrated_full.h5ad \
--input_solution $DATASET_DIR/cxg_mouse_pancreas_atlas/solution.h5ad \
--output $DATASET_DIR/cxg_mouse_pancreas_atlas/score.h5ad
--input_integrated $DATASET_DIR/cxg_immune_cell_atlas/integrated_full.h5ad \
--input_solution $DATASET_DIR/cxg_immune_cell_atlas/solution.h5ad \
--output $DATASET_DIR/cxg_immune_cell_atlas/score.h5ad

# write the state file
cat > $DATASET_DIR/state.yaml << HERE
id: cxg_mouse_pancreas_atlas
cat > $DATASET_DIR/cxg_immune_cell_atlas/state.yaml << HERE
id: cxg_immune_cell_atlas
output_dataset: !file dataset.h5ad
output_solution: !file solution.h5ad
output_integrated: !file integrated.h5ad
output_integrated_full: !file integrated_full.h5ad
output_score: !file score.h5ad
output_score: !file score_mod1.h5ad
HERE

# only run this if you have access to the openproblems-data bucket
20 changes: 20 additions & 0 deletions src/api/base_method.yaml
@@ -0,0 +1,20 @@
namespace: methods
info:
type: method
type_info:
label: Method
summary: A method for the batch integration task.
description: |
A batch integration method which integrates multiple datasets.
arguments:
- name: --input
__merge__: file_dataset.yaml
direction: input
required: true
- name: --output
__merge__: file_integrated.yaml
direction: output
required: true
test_resources:
- type: python_script
path: /common/component_tests/check_config.py
4 changes: 2 additions & 2 deletions src/api/comp_control_method.yaml
@@ -24,5 +24,5 @@ test_resources:
path: /common/component_tests/check_config.py
- type: python_script
path: /common/component_tests/run_and_check_output.py
- path: /resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
dest: resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
- path: /resources_test/task_batch_integration/cxg_immune_cell_atlas
dest: resources_test/task_batch_integration/cxg_immune_cell_atlas
24 changes: 3 additions & 21 deletions src/api/comp_method.yaml
@@ -1,24 +1,6 @@
namespace: methods
info:
type: method
type_info:
label: Method
summary: A method for the batch integration task.
description: |
A batch integration method which integrates multiple datasets.
arguments:
- name: --input
__merge__: file_dataset.yaml
direction: input
required: true
- name: --output
__merge__: file_integrated.yaml
direction: output
required: true
__merge__: base_method.yaml
test_resources:
- type: python_script
path: /common/component_tests/check_config.py
- type: python_script
path: /common/component_tests/run_and_check_output.py
- path: /resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
dest: resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
- path: /resources_test/task_batch_integration/cxg_immune_cell_atlas
dest: resources_test/task_batch_integration/cxg_immune_cell_atlas
4 changes: 2 additions & 2 deletions src/api/comp_metric.yaml
@@ -24,5 +24,5 @@ test_resources:
path: /common/component_tests/check_config.py
- type: python_script
path: /common/component_tests/run_and_check_output.py
- path: /resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
dest: resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
- path: /resources_test/task_batch_integration/cxg_immune_cell_atlas
dest: resources_test/task_batch_integration/cxg_immune_cell_atlas
6 changes: 3 additions & 3 deletions src/api/comp_process_dataset.yaml
@@ -25,7 +25,7 @@ arguments:
default: 2000
required: false
test_resources:
- path: /resources_test/common/cxg_mouse_pancreas_atlas/
dest: resources_test/common/cxg_mouse_pancreas_atlas/
- path: /resources_test/common/cxg_immune_cell_atlas/
dest: resources_test/common/cxg_immune_cell_atlas/
- type: python_script
path: /common/component_tests/run_and_check_output.py
path: /common/component_tests/run_and_check_output.py
8 changes: 4 additions & 4 deletions src/api/comp_transformer.yaml
@@ -6,7 +6,7 @@ info:
summary: Check the output and transform to create additional output types
description: |
This component will:

- Assert whether the input dataset and integrated dataset have the same shape.
- Reorder the integrated dataset to match the input dataset if needed.
- Transform the corrected feature output to an embedding.
@@ -26,7 +26,7 @@ arguments:
required: true
multiple: true
description: |
The expected output types of the batch integration method.
The expected output types of the batch integration method.
choices: [ feature, embedding, graph ]
- name: --output
__merge__: file_integrated_full.yaml
@@ -35,5 +35,5 @@ arguments:
test_resources:
- type: python_script
path: /common/component_tests/run_and_check_output.py
- path: /resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
dest: resources_test/task_batch_integration/cxg_mouse_pancreas_atlas
- path: /resources_test/task_batch_integration/cxg_immune_cell_atlas
dest: resources_test/task_batch_integration/cxg_immune_cell_atlas
2 changes: 1 addition & 1 deletion src/api/file_common_dataset.yaml
@@ -2,7 +2,7 @@
# `src/datasets/api/file_common_dataset.yaml`. However, some fields
# such as obs.cell_type and obs.batch are now required
type: file
example: "resources_test/common/cxg_mouse_pancreas_atlas/dataset.h5ad"
example: "resources_test/common/cxg_immune_cell_atlas/dataset.h5ad"
label: "Common Dataset"
summary: A subset of the common dataset.
info:
2 changes: 1 addition & 1 deletion src/api/file_dataset.yaml
@@ -1,5 +1,5 @@
type: file
example: "resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad"
example: "resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad"
label: "Dataset"
summary: Unintegrated AnnData HDF5 file.
info:
2 changes: 1 addition & 1 deletion src/api/file_integrated.yaml
@@ -1,5 +1,5 @@
type: file
example: "resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/integrated.h5ad"
example: "resources_test/task_batch_integration/cxg_immune_cell_atlas/integrated.h5ad"
label: Integration
summary: An integrated AnnData dataset.
description: |
4 changes: 2 additions & 2 deletions src/api/file_integrated_full.yaml
@@ -1,5 +1,5 @@
type: file
example: "resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/integrated_full.h5ad"
example: "resources_test/task_batch_integration/cxg_immune_cell_atlas/integrated_full.h5ad"
label: Transformed integration
summary: An integrated AnnData dataset with additional outputs.
description: |
@@ -8,7 +8,7 @@ description: |
- Feature: the corrected_counts layer
- Embedding: the X_emb obsm
- Graph: the connectivities and distances obsp

The Graph should always be present, but the Feature and Embedding are optional.
info:
format:
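
The description in this file says which slots each output type occupies and that only the graph is mandatory. Below is a minimal sketch of that rule, assuming the AnnData object is summarised as plain sets of slot names. The `REQUIRED_SLOTS` table and the `present_output_types` helper are hypothetical illustrations, not the repository's API.

```python
# Slot names come from the description; the dict-of-sets layout is an assumption.
REQUIRED_SLOTS = {
    "feature": {"layers": {"corrected_counts"}},
    "embedding": {"obsm": {"X_emb"}},
    "graph": {"obsp": {"connectivities", "distances"}},
}


def present_output_types(slots):
    """Return the output types whose slots are all present.

    `slots` maps a container name ("layers", "obsm", "obsp") to the set of
    keys found in it. The graph is required; feature and embedding are optional.
    """
    present = [
        output_type
        for output_type, required in REQUIRED_SLOTS.items()
        if all(names <= slots.get(container, set())
               for container, names in required.items())
    ]
    if "graph" not in present:
        raise ValueError("graph output (connectivities and distances) is missing")
    return present
```

With real data, the sets could be built from the keys of `adata.layers`, `adata.obsm`, and `adata.obsp`.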
2 changes: 1 addition & 1 deletion src/api/file_solution.yaml
@@ -1,5 +1,5 @@
type: file
example: "resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/solution.h5ad"
example: "resources_test/task_batch_integration/cxg_immune_cell_atlas/solution.h5ad"
label: "Solution"
summary: Uncensored dataset containing the true labels.
info:
6 changes: 3 additions & 3 deletions src/control_methods/embed_cell_types/script.py
@@ -2,11 +2,11 @@

## VIASH START
par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_solution': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/solution.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'input_solution': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/solution.h5ad',
'output': 'output.h5ad',
}
meta = {
meta = {
'functionality': 'foo',
'config': 'bar'
}
6 changes: 3 additions & 3 deletions src/control_methods/embed_cell_types_jittered/script.py
@@ -4,13 +4,13 @@
## VIASH START

par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_solution': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/solution.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'input_solution': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/solution.h5ad',
'output': 'output.h5ad',
'jitter': 0.01,
}

meta = {
meta = {
'functionality': 'foo',
'config': 'bar'
}
2 changes: 1 addition & 1 deletion src/control_methods/no_integration/script.py
@@ -2,7 +2,7 @@

## VIASH START
par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'output': 'output.h5ad',
}
## VIASH END
6 changes: 3 additions & 3 deletions src/control_methods/no_integration_batch/script.py
@@ -5,11 +5,11 @@
## VIASH START

par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'output': 'output.h5ad',
}

meta = {
meta = {
'functionality': 'foo',
'config': 'bar'
}
@@ -46,4 +46,4 @@

print("Store outputs", flush=True)
adata.uns['method_id'] = meta['name']
adata.write_h5ad(par['output'], compression='gzip')
adata.write_h5ad(par['output'], compression='gzip')
4 changes: 2 additions & 2 deletions src/control_methods/shuffle_integration/script.py
@@ -3,10 +3,10 @@

## VIASH START
par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'output': 'output.h5ad',
}
meta = {
meta = {
"resources_dir": "src/tasks/batch_integration/control_methods/"
}
## VIASH END
4 changes: 2 additions & 2 deletions src/control_methods/shuffle_integration_by_batch/script.py
@@ -3,10 +3,10 @@

## VIASH START
par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'output': 'output.h5ad',
}
meta = {
meta = {
"resources_dir": "src/tasks/batch_integration/control_methods/"
}
## VIASH END
@@ -3,10 +3,10 @@

## VIASH START
par = {
'input_dataset': 'resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad',
'input_dataset': 'resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad',
'output': 'output.h5ad',
}
meta = {
meta = {
"resources_dir": "src/tasks/batch_integration/control_methods/"
}
## VIASH END
6 changes: 3 additions & 3 deletions src/data_processors/transform/script.py
@@ -3,8 +3,8 @@

## VIASH START
par = {
"input_integrated": "resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/integrated.h5ad",
"input_dataset": "resources_test/task_batch_integration/cxg_mouse_pancreas_atlas/dataset.h5ad",
"input_integrated": "resources_test/task_batch_integration/cxg_immune_cell_atlas/integrated.h5ad",
"input_dataset": "resources_test/task_batch_integration/cxg_immune_cell_atlas/dataset.h5ad",
"expected_method_types": ["feature"],
"output": "output.h5ad"
}
@@ -28,7 +28,7 @@

if "corrected_counts" in integrated.layers.keys():
assert integrated.shape[1] == dataset.shape[1], "Number of genes do not match"

if not integrated.var.index.equals(dataset.var.index):
assert integrated.var.index.sort_values().equals(dataset.var.index.sort_values()), "Gene names do not match"
print("Reordering genes", flush=True)
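
The hunk above checks that the integrated and input datasets carry the same genes and then reorders them. A minimal sketch of that check-then-reorder logic follows, written with plain lists instead of AnnData so it runs without extra dependencies. The function name `reorder_columns` is hypothetical, and unique gene names are assumed.

```python
def reorder_columns(integrated_genes, dataset_genes, rows):
    """Reorder each row's columns from integrated_genes order to dataset_genes order.

    Hypothetical stand-in for the AnnData-based logic; assumes gene names
    are unique within each list.
    """
    if integrated_genes == dataset_genes:
        return rows  # already aligned, nothing to do
    # Same check as the diff: the two gene sets must agree once sorted.
    if sorted(integrated_genes) != sorted(dataset_genes):
        raise ValueError("Gene names do not match")
    # Map each gene to its current column, then read columns in target order.
    position = {gene: i for i, gene in enumerate(integrated_genes)}
    order = [position[gene] for gene in dataset_genes]
    return [[row[i] for i in order] for row in rows]
```

Comparing the sorted names first means a pure ordering difference is repaired silently, while a genuinely different gene set fails loudly.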