Update README
mariya committed Sep 19, 2024
1 parent d35ad7e commit af32295
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions README.md
@@ -41,5 +41,21 @@ Run `python test_client.py` to run GEX and DNAm prediction on test datasets.
## Tests
Run `pytest`.

## Preprocessing GEX data
To prepare gene expression (GEX) data for prediction with ALLIUM, you will need a CSV file of raw gene transcript counts, with genes as rows and samples as columns. The gene identifiers can be in any recognizable format, such as HGNC symbols or Ensembl IDs.

| | Sample_1 | Sample_2 | ... |
| --------| -------- | -------- | --- |
| ETV6 | 10 | 10 | ... |
| SARS1 | 20 | 10 | ... |
| DOC2B | 5 | 10 | ... |
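
As a quick sanity check of this layout, the sketch below reads a counts table and rejects ragged rows, duplicate gene identifiers, and negative or non-integer counts. It is a minimal illustration using only the standard library; `read_counts` is a hypothetical helper and is not part of ALLIUM.

```python
import csv
import io


def read_counts(handle):
    """Read a genes-by-samples count table into (samples, {gene: [counts]}).

    Hypothetical helper for illustration; not part of the ALLIUM package.
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # the first column holds the gene identifiers
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}"
            )
        if gene in counts:
            raise ValueError(f"duplicate gene identifier: {gene}")
        parsed = [int(v) for v in values]  # int() raises on non-integer input
        if any(v < 0 for v in parsed):
            raise ValueError(f"{gene}: negative count")
        counts[gene] = parsed
    return samples, counts


# The example table above, written as CSV text.
example = io.StringIO(
    ",Sample_1,Sample_2\n"
    "ETV6,10,10\n"
    "SARS1,20,10\n"
    "DOC2B,5,10\n"
)
samples, counts = read_counts(example)
print(samples)
print(counts["SARS1"])
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open("counts.csv", newline="")` (the file name is a placeholder).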

This file will need to undergo:
- gene identifier standardization
- batch identification and processing, if necessary
- normalization
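
Until the preprocessing script is documented, the sketch below illustrates what two of these steps could look like. It is an assumption-laden example, not ALLIUM's actual pipeline: the alias table is illustrative, and counts per million followed by `log2(x + 1)` is a common normalization chosen here only as a stand-in, since this README does not say which method ALLIUM expects. Batch processing is omitted.

```python
import math

# Illustrative alias table (assumption): maps an older symbol to a current one.
ALIASES = {"SARS": "SARS1"}


def standardize_ids(counts, aliases=ALIASES):
    """Rename gene identifiers via an alias table, leaving unknown ones as-is.

    If an alias and its target are both present, the later row overwrites
    the earlier one; a real pipeline would need to resolve such collisions.
    """
    return {aliases.get(gene, gene): values for gene, values in counts.items()}


def log_cpm(counts):
    """Scale each sample to counts per million, then apply log2(x + 1).

    Stand-in normalization (assumption), not necessarily what ALLIUM uses.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(values[i] for values in counts.values()) for i in range(n_samples)]
    if any(total == 0 for total in totals):
        raise ValueError("a sample has zero total counts")
    return {
        gene: [math.log2(v / totals[i] * 1e6 + 1) for i, v in enumerate(values)]
        for gene, values in counts.items()
    }


# Raw counts keyed by gene, one value per sample (Sample_1, Sample_2).
raw = {"ETV6": [10, 10], "SARS": [20, 10], "DOC2B": [5, 10]}
normalized = log_cpm(standardize_ids(raw))
print(sorted(normalized))
```

Library sizes are computed per sample (per column), so the values for one sample do not depend on which other samples are in the file.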

TODO: Add example file to repository, and describe preprocessing script usage.

## Limitations
The models were trained with an older version of scikit-learn because of legacy dependency issues. As a result, the current version of the prediction client does not work on macOS. Both scikit-learn and the Python version should preferably be upgraded when the models are retrained.
