diff --git a/README.md b/README.md
index da53e0c..25d7750 100644
--- a/README.md
+++ b/README.md
@@ -6,17 +6,22 @@
 [![CircleCI](https://circleci.com/gh/Habush/moses-service.svg?style=svg)](https://circleci.com/gh/Habush/mozi_snet_service)
 [![Coverage Status](https://coveralls.io/repos/github/Habush/mozi_snet_service/badge.svg?branch=master)](https://coveralls.io/github/Habush/mozi_snet_service?branch=master)
 [![BCH compliance](https://bettercodehub.com/edge/badge/Habush/mozi_snet_service?branch=master)](https://bettercodehub.com/)
-The MOSES service for SingularityNET
+### The MOSES service for SingularityNET

-The purpose of this service is to use [MOSES](https://github.com/opencog/moses) for supervised classification of high dimensional data sets with many more features than samples, such as whole genome sequencing data or gene expression data. See the OpenCog [wiki page](https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search) about MOSES or this [Quick Guide](https://github.com/opencog/moses/blob/master/doc/moses/QuickGuide.pdf) for more detailed information.
+The purpose of this service is to use [MOSES](https://github.com/opencog/moses) for supervised classification of high dimensional data sets with many more features than samples, such as whole genome sequencing data or gene expression data. See the OpenCog [wiki page](https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search) about MOSES or this [Quick Guide](https://github.com/opencog/moses/blob/master/doc/moses/QuickGuide.pdf) for more detailed information on MOSES.
+The user supplies a csv file of sample data with samples/observations as rows and binary valued features (`1` for "true" and `0` for "false") as columns with binary sample labels (`1` for "case" and `0` for "control") in the first column, along with a yaml file of MOSES program options, cross-validation parameters, and score thresholds for filtering the evolved boolean models.
-#### Running the Service
+The service provides a URL link for downloading the set of output files, including copies of the input files, the MOSES log file, tables of output models from each cross-validation fold with their out-of-training-sample scores, a table of the filtered models with their scores on the complete input dataset and the scores for their majority-vote ensemble, and a table of feature counts from the ensemble model.
+
+You can find a detailed description of using the service [here](https://mozi-ai.github.io/moses-service/users_guide/moses-service.html).
+
+#### Building and Running the Service
 1. Clone the project:
-    ``$ git clone --recursive https://github.com/Habush/moses-service.git``
+    ``$ git clone --recursive https://github.com/MOZI-AI/moses-service.git``
 2. Go to the project folder to start the docker containers to run the gRPC server and its dependencies (redis, mongo, etc)
@@ -77,7 +82,4 @@ cross_val_opts:
 target_feature: "case"
 ```
-
-#### Calling the Service
-
-You can find details on how to call the service on the github page [here](https://mozi-ai.github.io/moses-service/users_guide/moses-service.html)
+See [here](https://wiki.opencog.org/w/MOSES_man_page) for a complete description of MOSES options.
diff --git a/docs/users_guide/moses-service.md b/docs/users_guide/moses-service.md
index b984b16..cdfa5d7 100644
--- a/docs/users_guide/moses-service.md
+++ b/docs/users_guide/moses-service.md
@@ -5,22 +5,20 @@
 # Mozi Moses Service
-
 This service uses the OpenCog Meta-Optimising Semantic Evolutionary Search Algorithm [MOSES](https://github.com/opencog/moses) to generate boolean classification models of genomic or other high dimensional binary feature data sets using a multi-population evolutionary search strategy with a normalization procedure to simplify the evolving models.
-The user passes the data file and a file with cross-validation parameters and MOSES configuration flags to the service and receives a URL for retreiving the analysis results. The compressed results file contains the original input file, a csv file with the classifier models and their test set scores and ensemble model scores on the complete data set, and a csv file with feature counts from the models in the ensemble, and the MOSES log file.
+The user passes the data file and a file with cross-validation parameters, MOSES configuration options, and score thresholds to filter models (if the default "better than chance" threshold is not stringent enough) to the service. She receives a URL link to a page showing analysis progress and eventually a link for retrieving the analysis results. The compressed results file contains the original input files, csv files with the boolean classifier models from each cross-validation fold and their scores on the out-of-sample test set, the filtered models with their scores on the complete data set and their majority vote ensemble scores, a csv file with feature counts from the models in the ensemble, and the MOSES log file.
 It is part of a set of SingularityNET demonstration bio AI agents adapted from [Mozi.AI](https://mozi.ai) suite of OpenCog based bioinformatics tools.
-### Data
-
+### Input Data
 Data set with binary values, observations in rows, features in columns, observation labels assumed in first column unless specified with parameter flag.
-For genomic data, variants are naturally represented with a "one" value when present in a sample. For diploid samples a "dominant" model can be used with a "one" for heterozygous or homozygous for the variant, a "recessive" model with a "one" for homozygous variant only, or by using two features for each variant, one for each possiblility.
+For genomic data, variants are naturally represented with a **true/1** value when present in a sample.
For diploid samples a "dominant" model can be used with a **1** for heterozygous or homozygous for the alternate variant, a "recessive" model with a **1** for alternate homozygous variant only, or by using two features for each variant, one for heterozygous and one for alternate homozygous.

-For numerically valued features such as gene transcript or protein levels, the median norm can be used where an observation is coded "one" if it is greater than the median value for the feature across all samples.
+For numerically valued features such as gene transcript or protein levels, the median norm can be used where an observation is coded **1** if it is greater than the median value for the feature across all samples.
 #### Example dataset
 Here is a sample dataset to use with the moses-service. One of the columns in the dataset should be set as **target feature**. In this dataset, it is the first column named as **‘case’.**
@@ -34,10 +32,40 @@ Here is a sample dataset to use with the moses-service. One of the columns in th
 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
+#### Example Options file
+This example YAML file is from [tests/data/options.yaml](https://github.com/MOZI-AI/moses-service/blob/master/tests/data/options.yaml).
+```
+moses_opts: "-j8 --balance 1 \
+    -m 10000 -W1 \
+    --output-cscore 1 --result-count 100 \
+# feature selection parameters
+    --enable-fs 1 --fs-algo simple --fs-target-size 4 \
+    --fs-focus all --fs-seed init \
+# hill climbing parameters
+    --hc-widen-search 1 --hc-crossover-min-neighbors 5000 \
+    --hc-fraction-of-nn .3 --hc-crossover-pop-size 1000 \
+    --reduct-knob-building-effort 1 --complexity-ratio 3"
+
+cross_val_opts:
+  folds: 3
+  random_seed: 2
+  test_size: 0.3
+
+target_feature: "case"
+
+filter:
+  score: "accuracy"
+  value: 0.4
+```
+- **moses_opts:** See [here](https://wiki.opencog.org/w/MOSES_man_page) for a complete description of MOSES options.
+- **cross_val_opts:** "Monte Carlo" cross-validation is used, where **folds * random_seed = n** training folds are constructed from a balanced random partition of fraction 1 - **test_size** of the original data set.
+- **filter:** Possible scores are **precision**, **recall**, **accuracy**, **f1**, and **p-value** for the null hypothesis that the model scores no better than the null model returning **false** for all inputs. All values are in the range 0 to 1. The default filter is p-value < 0.05.
+
+### Output files
+**to be added**
 ## Getting Started
-
 ### Requirements
 - [Python 3.6.5](https://www.python.org/downloads/release/python-365/)