Let’s Write a Pipeline

Please credit the author when reposting: 梦里风林

Google Machine Learning Recipes 4

Review and reinforce the concepts

Write a basic Pipeline for supervised learning

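The basic recipe for supervised learning
- Example: a spam classifier
The key task is to label each newly received email as spam or not spam
- Train vs Test: in an experiment, split the data into a training set and a test set; the training set is used to fit the model, and the test set is used to check how accurate the trained model is on data it has not seen
- Features X and labels Y: a classifier can be viewed as a function f(x) = y
feature: x, label: y; a classifier is simply a function from features to labels
- Various classifiers can be imported from sklearn (tree, KNeighborsClassifier) and trained; they share a similar interface (fit, predict)
These different classifiers can all solve similar problems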
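
Because the classifiers share the same fit/predict interface, swapping one for another changes only a single line. The following is a minimal sketch of that idea, not the original author's code; the `classifiers` dictionary, the `scores` dictionary, and `random_state=0` are assumptions added here for illustration and reproducibility.

```python
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()

# random_state=0 is an assumption, added only to make the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

# hypothetical helper dictionary: two classifiers with the same interface
classifiers = {
    "decision tree": tree.DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)            # learn f from (features, labels)
    predictions = clf.predict(X_test)    # apply f to unseen features
    scores[name] = accuracy_score(y_test, predictions)
    print(name, scores[name])
```

The loop body never needs to know which classifier it is handling, which is the practical benefit of the shared interface.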

Code(a Pipeline)

# import a dataset
from sklearn import datasets
iris = datasets.load_iris()

X = iris.data
y = iris.target

from sklearn.model_selection import train_test_split

# Note: sklearn.cross_validation was deprecated in version 0.18 in favor of sklearn.model_selection and removed in 0.20, so train_test_split is imported from model_selection here.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)

# option 1: a decision tree classifier (commented out here)

# from sklearn import tree
# my_classifier = tree.DecisionTreeClassifier()

# use another classifier
from sklearn.neighbors import KNeighborsClassifier
my_classifier = KNeighborsClassifier()

my_classifier.fit(X_train, y_train)

predictions = my_classifier.predict(X_test)
# print(predictions)

# compare the predicted labels to the true labels
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, predictions))

In scikit-learn 0.20 and later, use model_selection instead of cross_validation, which was deprecated in 0.18.
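
To make the split and the accuracy metric in the pipeline above more concrete, here is a small sketch that checks the shapes produced by `train_test_split` and recomputes the accuracy by hand as the fraction of predictions that match the true labels. The names `manual_accuracy` and `library_accuracy`, and the use of `random_state=0`, are assumptions added for this illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state=0 is an assumption, added only to make the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# iris has 150 samples with 4 features, so each half holds 75 rows
print(X_train.shape, X_test.shape)

my_classifier = KNeighborsClassifier()
my_classifier.fit(X_train, y_train)
predictions = my_classifier.predict(X_test)

# accuracy is the fraction of test samples whose predicted label is correct
manual_accuracy = np.mean(predictions == y_test)
library_accuracy = accuracy_score(y_test, predictions)
print(manual_accuracy, library_accuracy)
```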

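What it really means for an algorithm to learn from data

Avoid hand-writing classification rules in code
In essence, learning means learning a function from features to labels, from input to output
Start with a model, which uses rules to define the form of the function
Adjust the function's parameters based on the training data
The model comes from the method we use to find patterns
For example, a line that separates two classes of points is a classifier model; adjusting its parameters yields the classifier we want: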
TensorFlow PlayGround

Example of Neural Network
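
The idea of a line as a model whose parameters are adjusted from training data can be sketched in a few lines of NumPy. This is an illustrative sketch only: it uses the classic perceptron update rule as one possible way to adjust the parameters, and the toy data, the `predict` helper, and the epoch count are all assumptions made up for this example rather than anything from the original video or notes.

```python
import numpy as np

# Toy, made-up data: two well-separated clusters of 2-D points
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
class1 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 20 + [1] * 20)


def predict(X, w, b):
    """The model: a line w.x + b = 0; points on the positive side get label 1."""
    return (X @ w + b > 0).astype(int)


# The parameters of the line start at zero and are adjusted from the data
w = np.zeros(2)
b = 0.0

for epoch in range(20):
    for xi, yi in zip(X, y):
        error = yi - predict(xi, w, b)  # 0 if correct, +1 or -1 if wrong
        w += error * xi                 # nudge the line toward the mistake
        b += error

accuracy = np.mean(predict(X, w, b) == y)
print("learned w:", w, "b:", b, "training accuracy:", accuracy)
```

The form of the function (a line) is fixed in advance; only `w` and `b` are learned, which mirrors what the sliders and training button do in TensorFlow PlayGround.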

sklearn notes

If you find the original author's (ahangchen) articles helpful, please give the repository a star.
