Home
Welcome to the Shifu wiki!
Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. Shifu is designed for data scientists and simplifies the life cycle of building machine learning models. Although originally built for fraud modeling, Shifu has been generalized to many other modeling domains.
Shifu provides a simple command-line interface for each step of the model building process, including:
- Statistics calculation & variable selection to determine the most predictive variables in your data
- Variable normalization
- Distributed variable selection based on sensitivity analysis
- Distributed neural network model training
- Distributed tree ensemble model training
- Post training analysis & model evaluation
Shifu’s Hadoop-based distributed training of neural networks, logistic regression, and gradient boosted trees can reduce model training time from days to hours on TB-scale data sets. Shifu integrates with Pig/MapReduce workflows on Hadoop, and Shifu-trained models can be integrated into production code either in the standard PMML format or in Shifu's native format through a simple Java API. Shifu leverages Hadoop, Pig, Akka, Encog, and other open-source projects.
- Tutorial - Build Your First ML Model
- Machine Learning Pipeline in Shifu
- Shifu Best Practices - FAQ
- Distributed Neural Network Training in Shifu
- Distributed Tree Ensemble Model Training in Shifu
- Distributed TensorFlow Support on Shifu
- Grid/Random Search Support in Shifu
- Variable Selection in Shifu
- Variable Binning in Shifu
- Variable Transform in Shifu
- Multiple Classification Support in Shifu
- Native Bagging Modeling Framework: How to Run Bagging in Shifu
- Assembling Models - Shifu combo
- CSV Format Support in Shifu
- Correlation Computing in Shifu
- Sampling & Filtering in Shifu Training Step
- Filter Expressions Testing for Train Dataset or Eval Dataset
- Dropout Support in NN/GBDT
- How to Deploy a Shifu Model to Production
- System Optimization to Accelerate Distributed Model Training
- How to Tune a Good GBT Model
- Distributed Logistic Regression in Shifu
- Segment Expansion for New Feature Generation
- Transfer Learning in Shifu
- Wide & Deep Learning on Shifu
- Train Regression Model in Shifu
- Shifu How-To
- Shifu 0.2.X (old documents)