SODA

This project automatically identifies Subtype-Oriented Disease Axes (SODA) from a set of features and multiple labels. It generates projections of the given features (i.e., the disease axes) that optimally separate each pairwise comparison between the given labels. The code is in disease_axis.py.
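This README does not describe how disease_axis.py computes its axes, so the following is not the SODA algorithm. It is only a minimal sketch of what "a projection that separates a pair of labels" can mean, using a deliberately simple stand-in: the difference between the two class means. The function names (pairwise_axes, project) and the four-sample data set are hypothetical.

```python
from itertools import combinations


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pairwise_axes(features, labels):
    """Return one direction per pair of label values.

    Stand-in technique (an assumption, not SODA itself): the axis for the
    pair (a, b) is mean(b) - mean(a).
    """
    groups = {}
    for row, label in zip(features, labels):
        groups.setdefault(label, []).append(row)
    axes = {}
    for a, b in combinations(sorted(groups), 2):
        mean_a = mean_vector(groups[a])
        mean_b = mean_vector(groups[b])
        axes[(a, b)] = [y - x for x, y in zip(mean_a, mean_b)]
    return axes


def project(row, axis):
    """Project one feature row onto an axis (dot product)."""
    return sum(x * w for x, w in zip(row, axis))


# Hypothetical data: two features, two label values.
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]

axes = pairwise_axes(features, labels)
scores = [project(row, axes[(0, 1)]) for row in features]
print(axes[(0, 1)], scores)
```

With three or more label values, combinations yields one axis per pair, which mirrors the "pairwise comparison" wording above.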

Toy Data

We provide some toy data to demonstrate the results of the code. The toy data is generated by generate_data.py.

The toy data consists of two files: data.csv and cluster.csv.

data.csv contains the patient data. The file is comma-separated and has the following format:

id,     feature_name1,  feature_name2,  feature_name3,     ... # header
XX0001,     -0.94,          -0.14,          -0.91,         ...
XX0002,     0.77,           0.306,          0.86,          ...

cluster.csv contains the patient labels. The file is comma-separated. Labels are represented as integers, and "NA" indicates that a label is not available. The file has the following format:

id,         label_name1,    label_name2,    label_name3,    ... # header
XX0001,         1,              2,              NA,         ...
XX0002,         0,              2,              0,          ...
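As an illustration of the label format, here is a minimal sketch that reads a cluster.csv-style file with Python's standard csv module and maps "NA" to None. The helper name read_labels is hypothetical, and this is not how disease_axis.py necessarily parses the file.

```python
import csv
import io


def read_labels(handle):
    """Read a cluster.csv-style file into {patient_id: {label_name: int or None}}."""
    reader = csv.reader(handle, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    label_names = header[1:]  # first column is the patient id
    labels = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        patient_id = row[0].strip()
        values = [cell.strip() for cell in row[1:]]
        labels[patient_id] = {
            name: None if value == "NA" else int(value)
            for name, value in zip(label_names, values)
        }
    return labels


# In-memory sample mirroring the two rows shown above.
sample = io.StringIO(
    "id,label_name1,label_name2,label_name3\n"
    "XX0001,1,2,NA\n"
    "XX0002,0,2,0\n"
)
parsed = read_labels(sample)
print(parsed["XX0001"])
```

To read a real file, pass an open file handle instead of the StringIO object, for example `open("cluster.csv", newline="")`.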

The two files do not need to contain the same set of patients. The package automatically matches the data and the labels by patient id.
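The matching step can be pictured as an inner join on the patient id. The sketch below assumes that only patients present in both files are kept, which the README does not state explicitly. The helper match_by_id and the ids XX0003 and XX0004 are hypothetical.

```python
def match_by_id(features, labels):
    """Keep only patients that appear in both mappings (assumed inner join)."""
    shared = sorted(features.keys() & labels.keys())
    return [(pid, features[pid], labels[pid]) for pid in shared]


# Hypothetical inputs: XX0003 has features only, XX0004 has labels only.
features = {
    "XX0001": [-0.94, -0.14, -0.91],
    "XX0002": [0.77, 0.306, 0.86],
    "XX0003": [0.10, 0.20, 0.30],
}
labels = {
    "XX0001": {"label_name1": 1},
    "XX0002": {"label_name1": 0},
    "XX0004": {"label_name1": 1},
}

matched = match_by_id(features, labels)
matched_ids = [pid for pid, _, _ in matched]
print(matched_ids)
```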

Usage

An example that analyzes the toy data with the SODA code:

import disease_axis

# Generate the disease axes and save the figures
M = disease_axis.disease_axis("data.csv", "cluster.csv", savefig=True)

M.output_projection("projection.csv") # Output the projection matrix.
M.output_axes("axes.csv") # Output the disease axes.

Example

Some example plots generated by the code are included in Demo.ipynb.