holecm edited this page Dec 31, 2013 · 13 revisions

Preliminary proposal of plugin specifications.

Machine learning

NOTE: legacy implementation: ML_* at http://ida.felk.cvut.cz/svn/research/inspol05-09/pathways/blb/plugins/

All ML methods in the old Xgene app are implemented with Weka.

Is it possible to define a general way to pass arguments to the ML methods? Xgene was based on Weka, so arguments could be passed as plain text. This feature is meant to allow passing an arbitrary argument directly to an ML method. Maybe we should limit our ML methods to the Orange library only...
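One possible way to pass arguments as plain text is sketched below. The `key=value` format and the function name `parse_options` are assumptions made for illustration; they are not Weka's option syntax and not part of any existing plugin.

```python
import ast


def parse_options(text):
    """Parse whitespace-separated key=value pairs into a dict (hypothetical format)."""
    options = {}
    for token in text.split():
        key, sep, raw = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        try:
            # Numbers, booleans, None and quoted strings become Python values.
            options[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else is kept as a plain string.
            options[key] = raw
    return options


opts = parse_options("kernel=poly degree=3 C=0.5 shrinking=True")
```

The resulting dict could then be forwarded to an ML method as keyword arguments, leaving validation of unknown keys to that method.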

SVM

Current implementation: R-miXGENE (no kernel choice yet)

options:

  • kernel
    • polynomial
      • degree
      • c
    • RBF

results:

  • model, (acc)

Random forest

(Alternatively, use the version designed for gene expression data. There is a binary package for an obsolete version of R.)

Existing implementation: scikit-learn

options:

  • m number of features used for a tree learning

results:

  • proximities
  • out-of-bag error estimate
  • variable importance
  • model, (acc)
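A minimal sketch of how the listed results might be obtained with scikit-learn, assuming the option m maps to `max_features`. The synthetic data and all variable names are illustrative. scikit-learn has no built-in proximity matrix, so the proximities here are derived from shared leaf membership, which is an assumption about what the plugin should return.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an expression matrix (samples x features).
X, y = make_classification(n_samples=60, n_features=20, n_informative=5, random_state=0)

m = 4  # number of features considered at each split (assumed meaning of option m)
forest = RandomForestClassifier(n_estimators=50, max_features=m, oob_score=True, random_state=0)
forest.fit(X, y)

oob_error = 1.0 - forest.oob_score_       # out-of-bag error estimate
importance = forest.feature_importances_  # variable importance, one value per feature

# Proximity of two samples: fraction of trees in which they fall into the same leaf.
leaves = forest.apply(X)  # shape (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
```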
Decision trees

Existing implementation: http://scikit-learn.org/stable/modules/tree.html or http://scaron.info/pydtl/

options:

results: model, visualized tree, (acc)

k-NN

Existing implementation: http://scikit-learn.org

options:

  • k
  • metrics
  • distance weight

results:

  • model, (acc)
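To make the three options concrete, here is a small dependency-free sketch of a k-NN classifier. The function names and the inverse-distance weighting scheme are assumptions for illustration; the plugin itself would presumably delegate to scikit-learn.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, metric=euclidean, weighted=False):
    """Classify query by a (optionally distance-weighted) vote of its k nearest neighbours."""
    pairs = [(metric(x, query), label) for x, label in zip(train_X, train_y)]
    neighbours = sorted(pairs, key=lambda p: p[0])[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        # Inverse-distance weight; the small constant avoids division by zero.
        votes[label] += 1.0 / (dist + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


train_X = [(0, 0), (3, 0), (3, 1), (6, 0)]
train_y = ["a", "b", "b", "b"]
plain = knn_predict(train_X, train_y, (0.5, 0), k=3)
by_distance = knn_predict(train_X, train_y, (0.5, 0), k=3, weighted=True)
```

In this toy data the query sits next to the single "a" sample, so the unweighted vote and the distance-weighted vote disagree, which is the effect the distance weight option controls.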

Statistical tests

One-way (parametric and non-parametric) ANOVA

Current implementation: R-miXGENE

Options:

  • statistical correction method

Results: -
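Assuming the "statistical correction method" option refers to multiple-testing correction of per-feature p-values, two common choices can be sketched as follows. The function names are illustrative, and the choice of Bonferroni and Benjamini-Hochberg is an assumption, not something the proposal fixes.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
p_bonf = bonferroni(p)
p_bh = benjamini_hochberg(p)
```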

Global test

R Example: here

Options:

  • statistical correction method

Results: -

Meta plugins

Ref: A. M. Molinaro, R. Simon, and R. M. Pfeiffer, “Prediction error estimation: a comparison of resampling methods,” Bioinformatics, vol. 21, no. 15, pp. 3301–3307, 2005.

Train/test split

Implementation:

options:

  • s: split percentage

results: EStr, EStt
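A minimal sketch of the split, assuming s is the fraction of samples that goes to the training set and that samples are shuffled first. The function name and the `seed` parameter are illustrative additions.

```python
import random


def train_test_split(samples, s=0.7, seed=0):
    """Shuffle samples and split them into a training part (fraction s) and a test part."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = round(s * len(samples))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


samples = list(range(10))
es_train, es_test = train_test_split(samples, s=0.7)
```

Here `es_train` and `es_test` play the role of EStr and EStt. A stratified split that preserves class proportions would need the class labels as an extra input.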

FIXME: Bootstrap

Implementation:

options:

results:
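Since this plugin is still open, the following is only one possible reading: draw a training set with replacement and test on the samples that were never drawn (out-of-bag). The function name and the choice of this variant are assumptions.

```python
import random


def bootstrap_split(n_samples, seed=0):
    """Return (train, test) index lists for one bootstrap replicate."""
    rng = random.Random(seed)
    # Training set: n_samples draws with replacement, so duplicates are expected.
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train)
    # Test set: samples that were never drawn.
    test = [i for i in range(n_samples) if i not in chosen]
    return train, test


train_idx, test_idx = bootstrap_split(20)
```

Repeating the split with different seeds and averaging the test error gives a bootstrap error estimate; the number of replicates would be a natural option for the plugin.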

Cross validation

Implementation: http://scikit-learn.org/

options:

  • k: number of folds

FIXME: results:
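The fold structure behind the k option can be sketched without any library. The generator name is illustrative, and contiguous unshuffled folds are an assumption; the plugin would presumably rely on scikit-learn instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

One candidate for the open results field would be a pair of sample sets per fold, analogous to the EStr and EStt outputs of the train/test split.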

Visualization

PCA

Implementation: R

options: scale, center

results: —
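A rough NumPy sketch of what the scale and center options control, assuming rows are samples and columns are features. The function name, the `n_components` parameter and the random data are illustrative; the actual plugin is planned in R.

```python
import numpy as np


def pca(X, center=True, scale=False, n_components=2):
    """Return sample scores and explained-variance ratios for the leading components."""
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)          # subtract each feature's mean
    if scale:
        X = X / X.std(axis=0, ddof=1)   # divide by each feature's standard deviation
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))
scores, explained = pca(data, center=True, scale=True)
```

Scaling fails for features with zero variance, so such features would have to be removed beforehand.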

Box plot and sample statistics

Implementation:

options:

results:

  • boxplot
  • statistics
    • features
    • samples
    • missing or NaN values
    • min, mean, median, max
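The statistics part could look roughly like the sketch below, computed for one sample (or one feature) at a time. The function name and the dictionary layout are assumptions, and missing values are taken to be either `None` or NaN.

```python
import math
import statistics


def summarize(values):
    """Summary statistics of one sample, ignoring missing (None or NaN) values."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "max": max(present),
    }


summary = summarize([1.0, None, 3.0, float("nan"), 2.0])
```

A sample with no present values would raise an error here; how to report such samples is left open.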

Feature selection

Ref: Saeys et al., A review of feature selection techniques in bioinformatics. Bioinformatics, 2007

T-Test

FIXME: How to deal with the zero-variance problem?
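The problem can be made explicit with a small sketch of Welch's t statistic: when both groups have zero variance, the denominator is zero and the statistic is undefined. Returning `None` in that case is just one hypothetical policy (such features could also be filtered out beforehand); the function name is illustrative.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two groups, or None when it is undefined."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0.0:
        # Both groups are constant: the statistic cannot be computed.
        return None
    return (statistics.mean(a) - statistics.mean(b)) / se


t_regular = welch_t([1, 2, 3], [4, 5, 6])
t_constant = welch_t([1, 1, 1], [1, 1, 1])
```

Note that two constant groups with different means also yield `None` here, although such a feature separates the classes perfectly, so the policy needs a deliberate decision.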

SVMRFE

legacy code: e134 experiment

Implementation: R

options:

  • kernel
    • polynomial
      • degree
      • c
  • Reduction of computational cost:
    • remove features in batches (more than one feature per iteration)
      • Remembering the order of removed features.
        • Returns a ranking of all features.
      • Without remembering the order of removed features.
        • Returns a ranking of the top-n features.

results: ES, ranking
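The batch elimination with remembered order can be sketched independently of the SVM itself. In the sketch below `weight_fn` is a hypothetical callback that would retrain the SVM on the remaining features and return one importance value per feature (for a linear SVM, typically the squared weights); here it is replaced by a fixed lookup table so that the example runs on its own.

```python
def rfe_ranking(features, weight_fn, batch=1):
    """Recursive feature elimination; returns all features, most important first."""
    remaining = list(features)
    eliminated = []  # filled from the least important feature upwards
    while remaining:
        weights = weight_fn(remaining)  # one value per remaining feature
        order = sorted(range(len(remaining)), key=lambda i: abs(weights[i]))
        drop = [remaining[i] for i in order[:batch]]
        eliminated.extend(drop)  # weakest first, so order within a batch is kept
        remaining = [f for f in remaining if f not in drop]
    return eliminated[::-1]


# Stand-in for retraining: fixed importances instead of SVM weights.
fixed = {"g1": 0.9, "g2": 0.1, "g3": 0.5, "g4": 0.3}
ranking = rfe_ranking(list(fixed), lambda feats: [fixed[f] for f in feats], batch=2)
```

With `batch=2` only two retraining rounds are needed for four features, which is the cost reduction the option is after. The price is that features inside one batch are ranked by a single model rather than by successive retraining.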

Data acquisition

GEO downloader

Implementation:

Options:

Results: ES

Normalization

Quantile normalization

Legacy implementation:

Implementation: R

options: -

results: ES
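For reference, a dependency-free sketch of quantile normalization: every sample is forced onto the same distribution, namely the mean of the sorted values across samples. The function name and the list-of-columns layout are assumptions, and ties are broken by position rather than averaged, which may differ from the R implementation.

```python
def quantile_normalize(columns):
    """columns: one equal-length list of values per sample; returns normalized copies."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # replace each value by the mean of its rank
        result.append(out)
    return result


normalized = quantile_normalize([[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]])
```

After normalization every sample contains exactly the same set of values, only in a different feature order.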

Missing and outlier management

Remove features due to missing values

Implementation:

options: -

results: ES
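A minimal sketch of the filter, assuming the data is a mapping from feature name to its values across samples and that missing values are `None` or NaN. The `max_missing` threshold is a hypothetical extension; with its default of 0 the function drops every feature that has any missing value, which matches a plugin with no options.

```python
import math


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_features_with_missing(features, max_missing=0.0):
    """Keep only features whose fraction of missing values is at most max_missing."""
    kept = {}
    for name, values in features.items():
        fraction = sum(is_missing(v) for v in values) / len(values)
        if fraction <= max_missing:
            kept[name] = values
    return kept


data = {"g1": [1.0, 2.0], "g2": [None, 2.0], "g3": [float("nan"), 1.0]}
complete = drop_features_with_missing(data)
```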

Feature filtering