Home

Welcome to the Shifu wiki!

Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. Shifu is designed for data scientists and simplifies the life-cycle of building machine learning models. Although it was originally built for fraud modeling, Shifu has since been generalized to many other modeling domains.

Shifu provides a simple command-line interface for each step of the model-building process, including:

Statistics calculation & variable selection to determine the most predictive variables in your data
Variable normalization
Distributed variable selection based on sensitivity analysis
Distributed neural network model training
Distributed tree ensemble model training
Post-training analysis & model evaluation
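Two of the steps listed above, variable normalization and sensitivity-based variable selection, can be illustrated with a small single-machine sketch. This is not Shifu's implementation or API. It assumes z-score normalization clipped at a fixed number of standard deviations (the cutoff of 4.0 is an illustrative assumption) and a generic sensitivity measure: the RMS change in model output when one variable is replaced by its column mean. The function names `zscore_normalize` and `sensitivity_ranking` are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_normalize(values, cutoff=4.0):
    """Z-score normalize one numeric column, clipping to +/- cutoff std devs.

    The default cutoff is an illustrative assumption, not a documented Shifu default.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [max(-cutoff, min(cutoff, (v - mu) / sigma)) for v in values]


def sensitivity_ranking(model, rows):
    """Rank variable indices by how much the model output moves when each
    variable is replaced by its column mean (a generic sensitivity analysis)."""
    n_vars = len(rows[0])
    col_means = [mean(r[j] for r in rows) for j in range(n_vars)]
    base = [model(r) for r in rows]
    scores = {}
    for j in range(n_vars):
        squared_diffs = []
        for row, base_score in zip(rows, base):
            perturbed = list(row)
            perturbed[j] = col_means[j]
            squared_diffs.append((model(perturbed) - base_score) ** 2)
        scores[j] = math.sqrt(mean(squared_diffs))  # RMS change in output
    # Most sensitive (most predictive) variables first.
    return sorted(scores, key=scores.get, reverse=True)


def toy_model(x):
    # Hypothetical "trained" model that depends mostly on variable 0.
    return 3 * x[0] + 0.1 * x[1]


rows = [[1, 5], [2, 3], [3, 9], [4, 1]]
print(sensitivity_ranking(toy_model, rows))  # → [0, 1]
```

Clipping keeps a few extreme values from dominating the normalized range, and ranking by output sensitivity lets variables be dropped from the least sensitive end. Shifu performs these steps in a distributed fashion, which this sketch does not attempt.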

Shifu’s fast, Hadoop-based distributed training of neural network, logistic regression and gradient boosted tree models can reduce model training time from days to hours on terabyte-scale data sets. Shifu integrates with Pig/MapReduce workflows on Hadoop, and Shifu-trained models can be integrated into production code either in the standard PMML format or in a native format through a simple Java API. Shifu leverages Hadoop, Pig, Akka, Encog and other open-source projects.
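The general idea behind distributed (data-parallel) training of a model such as logistic regression can be sketched as a master/worker loop: each worker computes a gradient on its own shard of the data, and the master sums the partial gradients and updates the shared weights. The sketch below is a generic, single-process illustration of that pattern under these assumptions; it does not use or reproduce Shifu's actual Hadoop/Akka-based implementation, and `worker_gradient` and `train_distributed` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def worker_gradient(weights, shard):
    """Summed log-loss gradient over one data shard (the work of one 'worker')."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        err = pred - label
        for i, x in enumerate(features):
            grad[i] += err * x
    return grad


def train_distributed(shards, n_features, lr=0.5, iterations=200):
    """'Master' loop: gather per-shard gradients, average them, update weights."""
    weights = [0.0] * n_features
    total = sum(len(s) for s in shards)
    for _ in range(iterations):
        # On a real cluster each shard would be processed in parallel by its own worker.
        partials = [worker_gradient(weights, s) for s in shards]
        summed = [sum(g[i] for g in partials) for i in range(n_features)]
        weights = [w - lr * g / total for w, g in zip(weights, summed)]
    return weights


# Two shards of (features, label) pairs; feature 0 is a constant bias term.
shards = [
    [([1.0, -2.0], 0), ([1.0, 1.5], 1)],
    [([1.0, -1.0], 0), ([1.0, 2.0], 1)],
]
print(train_distributed(shards, n_features=2))
```

Because the full-data gradient is just the sum of the per-shard gradients, this loop produces the same weights (up to floating-point rounding) as training on all the data in one place; only the gradient vectors, not the data, need to travel between workers and the master.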

Documents

Shifu: A Distributed Model Training Framework on Hadoop

DOWNLOAD
