Machine Learning Libraries

lisahua edited this page Jul 31, 2014 · 7 revisions

Machine learning libraries and models

|Framework|Neural Network|Logistic Regression|Decision Tree|Support Vector Machine|
|----|----|----|----|----|
|Encog|Yes, with multiple optimizations|Yes|Yes in C#; coming soon for the Java version|Yes, uses libsvm|
|WEKA|Yes|Yes|Yes, with several variants|Yes|
|Mahout|Yes|Yes|Random Forest|Yes|
|Spark|No|Yes|Yes|Yes|
|H2o|No|Yes|Random Forest|No|
|Hama|Yes|Yes|No|No|

|Framework|Model|Class name|Parent class/interface|Training method|Retrieve training result|Evaluation method|Basic data structure|Notes|
|----|----|----|----|----|----|----|----|----|
|Encog|Neural Network|`BasicNetwork`|`NeuralNetwork`|`propagation.iteration()`|`getWeights():double[]`|`compute()`|`MLData: Double[]`, `MLDataSet: Set<Double[]>`|Various training and propagation methods|
|Encog|Logistic Regression|`BasicNetwork`|`NeuralNetwork`|`propagation.iteration()`|`getWeights():double[]`|`compute()`|||
|Spark|Logistic Regression|`LogisticRegressionModel`|`GeneralizedLinearModel`, `ClassificationModel`|`train(RDD data)`|`weights():double[]`|`predict(RDD):RDD`|RDD: Resilient Distributed Dataset|weights: intercept followed by weight list|
|Mahout|Neural Network|`MultilayerPerceptron`|`NeuralNetwork`|`trainOnline(Vector instance)`|`getWeightMatrices():Matrix`|`getOutput(Vector):Vector`|Vector, Matrix: List|1. inputVector: inputFields followed by actualValues; 2. bias=1 comes as the first neuron in each non-final layer|
|Mahout|Logistic Regression|`OnlineLogisticRegression`|`AbstractOnlineLogisticRegression`|`train(Vector actual, Vector instance)`|`getBeta(): Matrix`|`classifyScalar(Vector instance):double`|||
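
The Notes column mentions two layout conventions: a weight array that stores the intercept first, and a constant bias input of 1 placed as the first neuron of a layer. The sketch below illustrates both in plain Python. It is not code from Spark or Mahout; the function names `predict_logistic` and `layer_output`, the sigmoid activation, and the one-row-per-output-neuron weight layout are assumptions made for illustration.

```python
import math


def predict_logistic(weights, features):
    """Return P(class = 1), assuming weights[0] is the intercept."""
    intercept, coefs = weights[0], weights[1:]
    if len(coefs) != len(features):
        raise ValueError("expected one coefficient per feature")
    z = intercept + sum(w * x for w, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(weight_rows, inputs):
    """Evaluate one sigmoid layer, prepending a constant bias input of 1."""
    with_bias = [1.0] + list(inputs)      # bias comes first, as in the Notes column
    outputs = []
    for row in weight_rows:               # assumed layout: one row per output neuron
        z = sum(w * x for w, x in zip(row, with_bias))
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


if __name__ == "__main__":
    # Intercept 0.0, then one coefficient per feature: z = 0 + 2 - 2 = 0.
    print(predict_logistic([0.0, 1.0, -1.0], [2.0, 2.0]))
    # Each weight row has len(inputs) + 1 entries because of the bias input.
    print(layer_output([[0.0, 0.0, 0.0]], [3.0, 4.0]))
```

When reading weights back from a library, check its documentation for the actual ordering before slicing the array this way.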

Machine learning libraries

Encog supports multiple neural network variants and SVM. It focuses mainly on classification with neural networks.

Encog provides various "patterns" (such as ART and RBF) with different training and propagation techniques, to support a variety of advanced algorithms.
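
The comparison table lists `propagation.iteration()` as the Encog training method and `compute()` as the evaluation method, which suggests an iterate-until-the-error-is-small loop. The sketch below mimics that pattern in plain Python. `TinyNetwork`, `TinyTrainer`, and `train_or_gate` are hypothetical stand-ins, not Encog classes, and the per-sample logistic update is only one of many possible training rules.

```python
import math


class TinyNetwork:
    """Hypothetical single-neuron model; a stand-in, not an Encog class."""

    def __init__(self, n_inputs):
        self.weights = [0.0] * (n_inputs + 1)  # weights[0] is the bias weight

    def compute(self, inputs):
        z = self.weights[0] + sum(w * x for w, x in zip(self.weights[1:], inputs))
        return 1.0 / (1.0 + math.exp(-z))


class TinyTrainer:
    """Hypothetical trainer exposing iteration() and an error value."""

    def __init__(self, network, data, learning_rate=0.5):
        self.network = network
        self.data = data
        self.learning_rate = learning_rate
        self.error = float("inf")

    def iteration(self):
        # One epoch: adjust the weights after each sample (log-loss gradient step).
        for inputs, target in self.data:
            delta = self.learning_rate * (target - self.network.compute(inputs))
            self.network.weights[0] += delta
            for i, x in enumerate(inputs):
                self.network.weights[i + 1] += delta * x
        # Record the mean squared error over the data set after the updates.
        self.error = sum(
            (t - self.network.compute(x)) ** 2 for x, t in self.data
        ) / len(self.data)


def train_or_gate(max_epochs=10000, target_error=0.01):
    """Train on the logical OR truth table until the error is small enough."""
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    network = TinyNetwork(2)
    trainer = TinyTrainer(network, data)
    epochs = 0
    while trainer.error > target_error and epochs < max_epochs:
        trainer.iteration()
        epochs += 1
    return network, trainer.error, epochs


if __name__ == "__main__":
    net, err, n = train_or_gate()
    print(n, err, [round(net.compute(x)) for x in ([0.0, 0.0], [1.0, 1.0])])
```

The `max_epochs` cap is a design choice: an error threshold alone can loop forever on data the model cannot fit.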

WEKA is a collection of machine learning algorithms for data mining tasks. It covers almost all common algorithms, such as decision trees, regression, and clustering.

WEKA currently supports PMML (Predictive Model Markup Language).

Mahout focuses on scalability. It contains a set of algorithms for clustering, classification, and collaborative filtering, and it is particularly strong in recommendation.

Spark is a fast, memory-optimized execution engine with Python, Java, and Scala APIs. MLlib is Spark's implementation of some common machine learning algorithms, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as the underlying optimization primitives.

It supports logistic regression, SVM, and decision trees; advanced neural network algorithms are coming soon.

MLlib is developed mainly in Scala.

H2o is an engine for running machine learning at scale, with a RESTful API and a user interface. It’s an in-memory data engine designed specifically for running various types of statistical computations.

It supports Random Forest, regression and classification algorithms such as Gradient Boosted Trees and generalized linear models, K-means clustering, and deep learning.

Apache Hama is another project that does advanced analytics beyond MapReduce. Hama uses the Bulk Synchronous Parallel (BSP) model on top of Hadoop, which can be more efficient than "plain" MapReduce.

To run iterative data analysis applications more efficiently, Hama offers a pure BSP computing engine. Hama currently has logistic regression and neural network models.
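
In the BSP model, a computation proceeds in supersteps: each peer computes locally, sends messages, and then waits at a barrier until every peer has finished. The sketch below imitates that schedule with Python threads to sum numbers spread across peers. It does not use the Hama API; `bsp_sum`, the shared `inbox` lists, and the choice of peer 0 as the combiner are assumptions made for illustration.

```python
import threading


def bsp_sum(chunks):
    """Sum numbers BSP-style: compute locally, exchange messages, synchronize."""
    n = len(chunks)
    barrier = threading.Barrier(n)
    inbox = [[] for _ in range(n)]   # inbox[i] holds the messages sent to peer i
    lock = threading.Lock()
    results = [None] * n

    def peer(rank):
        # Superstep 1: local computation, then send the partial sum to peer 0.
        partial = sum(chunks[rank])
        with lock:
            inbox[0].append(partial)
        barrier.wait()               # all messages are delivered past this point

        # Superstep 2: peer 0 combines the partial sums and broadcasts the total.
        if rank == 0:
            total = sum(inbox[0])
            with lock:
                for i in range(n):
                    inbox[i] = [total]
        barrier.wait()

        # Superstep 3: every peer reads the broadcast value.
        results[rank] = inbox[rank][0]

    threads = [threading.Thread(target=peer, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(bsp_sum([[1, 2], [3, 4], [5]]))  # every peer ends with the same total
```

The barriers are what make the result deterministic: no peer reads its inbox until every peer has finished writing in the previous superstep.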

Useful Links

Machine Learning Tutorials

Unsupervised Feature Learning and Deep Learning

Input Datasets

UCI Machine Learning Dataset