Python stacking: a detailed, easy-to-follow Stacking implementation in Python

全栈程序员-用户IM • April 8, 2022, 12:00 PM • Uncategorized


1. What is stacking

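Stacking first trains several base learners on the initial training data, then uses the predictions of those learners as a new training set on which a new learner is trained.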
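As a rough illustration of the definition above, scikit-learn (0.22 and later) ships a StackingClassifier that wires base learners and a second-level learner together. The sketch below is not the hand-written code this post walks through; the synthetic dataset, the choice of base learners, and every parameter value are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption, not from the post)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_dev, X_test, Y_dev, Y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Base (first-level) learners, given as (name, estimator) pairs
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=10, random_state=0)),
]

# The final estimator is trained on cross-validated predictions of the base learners
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_dev, Y_dev)
stack_accuracy = stack.score(X_test, Y_test)
print("Accuracy = %s" % stack_accuracy)
```

By default StackingClassifier feeds class probabilities rather than hard labels to the final estimator, so its numbers will generally differ from a label-based approach.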


2. Code:

For example, we use RandomForestClassifier, ExtraTreesClassifier, and GradientBoostingClassifier as the first-level learners:

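from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier, RandomForestClassifier

# Our level 0 classifiers (n_trees, the number of trees, must be set beforehand)
clfs = [
    RandomForestClassifier(n_estimators=n_trees, criterion='gini'),
    ExtraTreesClassifier(n_estimators=n_trees * 2, criterion='gini'),
    GradientBoostingClassifier(n_estimators=n_trees),
]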

Next we train the first-level learners and produce the data needed by the second-level learner. This uses k-fold cross-validation.
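The code in this post relies on names it never defines: X_dev, Y_dev, X_test, Y_test, skf, blend_train and blend_test. A minimal setup sketch follows, assuming a recent scikit-learn in which StratifiedKFold.split yields (train_index, cv_index) pairs. The synthetic data, n_folds = 5, and n_learners = 3 are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

n_folds = 5     # assumed number of folds
n_learners = 3  # stands in for len(clfs)

# Synthetic data (an assumption): 150 development rows and 50 test rows
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_dev, X_test, Y_dev, Y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Materialise the folds as a list so that len(skf) and repeated iteration both work
splitter = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
skf = list(splitter.split(X_dev, Y_dev))

# One column per first-level learner
blend_train = np.zeros((X_dev.shape[0], n_learners))
blend_test = np.zeros((X_test.shape[0], n_learners))
```

Storing the splits in a list matters because the loop iterates over skf once per learner and also calls len(skf).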

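1. First train clf on the initial training set to obtain the second-level training data blend_train:

The j-th learner goes through nfolds rounds of cross-validation. Each round yields predictions at the indices of the current validation fold, so after nfolds rounds the predictions cover a set the same size as the initial training set: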

blend_train[cv_index, j] = clf.predict(X_cv)
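To see why the folds together fill the whole column, here is a small sketch with a stand-in "learner" that simply predicts the majority label of its training fold. The toy labels, the fold construction with np.array_split, and the helper majority_label are hypothetical, chosen only to keep the example dependency-light.

```python
import numpy as np

Y_dev = np.array([0, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # toy labels (made up)
n_samples, nfolds, j = len(Y_dev), 5, 0


def majority_label(labels):
    """Stand-in for clf.fit + clf.predict: the most frequent training label."""
    return np.bincount(labels).argmax()


blend_train = np.full((n_samples, 1), np.nan)  # NaN marks "not yet predicted"
fill_count = np.zeros(n_samples, dtype=int)    # how often each row was written

for cv_index in np.array_split(np.arange(n_samples), nfolds):
    # Everything outside the validation fold is used for training
    train_index = np.setdiff1d(np.arange(n_samples), cv_index)
    blend_train[cv_index, j] = majority_label(Y_dev[train_index])
    fill_count[cv_index] += 1
```

After the loop every row has been written exactly once, and each value comes from a model that did not see that row during training.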


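2. Then use clf to predict on the test set to obtain the second-level test data blend_test:

In each fold, every first-level learner also predicts on the initial test set. After all nfolds rounds, the results are averaged to give the j-th learner's prediction on the test set: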

blend_test_j[:, i] = clf.predict(X_test)

blend_test[:, j] = blend_test_j.mean(1)
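A tiny worked example of that averaging step, with made-up predictions for 4 test rows over 3 folds (the numbers are hypothetical). Note that averaging hard class labels can produce fractional values, which the second-level learner then receives as a numeric feature.

```python
import numpy as np

# Rows are test samples, columns are folds; entries are predicted class labels
blend_test_j = np.array(
    [
        [1, 1, 1],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
    ],
    dtype=float,
)

blend_test = np.zeros((4, 1))
blend_test[:, 0] = blend_test_j.mean(1)  # mean over axis 1, i.e. over the folds
print(blend_test[:, 0])
```

The second and fourth rows end up at 1/3 and 2/3 because the folds disagreed on them.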


In this way, every first-level learner contributes one column of training data and one column of test data for the second-level learner.

# For each classifier, we train the number of fold times (=len(skf))
for j, clf in enumerate(clfs):
    print('Training classifier [%s]' % j)
    # Number of testing data x Number of folds; we will take the mean of the predictions later
    blend_test_j = np.zeros((X_test.shape[0], len(skf)))
    for i, (train_index, cv_index) in enumerate(skf):
        print('Fold [%s]' % i)

        # This is the training and validation set
        X_train = X_dev[train_index]
        Y_train = Y_dev[train_index]
        X_cv = X_dev[cv_index]
        Y_cv = Y_dev[cv_index]

        clf.fit(X_train, Y_train)

        # This output will be the basis for our blended classifier to train against,
        # which is also the output of our classifiers
        blend_train[cv_index, j] = clf.predict(X_cv)
        blend_test_j[:, i] = clf.predict(X_test)
    # Take the mean of the test-set predictions over the folds
    blend_test[:, j] = blend_test_j.mean(1)
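For the blend_train half of this loop, scikit-learn's cross_val_predict returns the same kind of out-of-fold predictions in a single call. This is a sketch of an alternative, not the post's code; the data and parameters are assumptions. It does not produce blend_test, because cross_val_predict does not return the models fitted on each fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic development set (an assumption)
X_dev, Y_dev = make_classification(n_samples=150, n_features=8, random_state=0)

clfs = [
    RandomForestClassifier(n_estimators=10, random_state=0),
    ExtraTreesClassifier(n_estimators=10, random_state=0),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

blend_train = np.zeros((X_dev.shape[0], len(clfs)))
for j, clf in enumerate(clfs):
    # Each row is predicted by a model that never saw that row during fitting
    blend_train[:, j] = cross_val_predict(clf, X_dev, Y_dev, cv=cv)
```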

3. Next, train the second-level learner, LogisticRegression, on blend_train and Y_dev:

# Start blending!
bclf = LogisticRegression()
bclf.fit(blend_train, Y_dev)

blend_train = np.zeros((X_dev.shape[0], len(clfs))), so this array has as many columns as there are learners.

4. Then use bclf to predict on the test data blend_test and compute the score:

# Predict now
Y_test_predict = bclf.predict(blend_test)
score = metrics.accuracy_score(Y_test, Y_test_predict)
print('Accuracy = %s' % score)

blend_test = np.zeros((X_test.shape[0], len(clfs))) likewise has as many columns as there are learners.
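The four steps can be assembled into one script. The version below is a sketch that assumes Python 3 and a recent scikit-learn; the synthetic dataset, n_trees = 10, n_folds = 5, and the random_state values are assumptions added so that it runs on its own.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

n_trees, n_folds = 10, 5  # assumed values

# Synthetic data (an assumption): 300 development rows and 100 test rows
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_dev, X_test, Y_dev, Y_test = train_test_split(X, y, test_size=0.25, random_state=0)
splitter = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
skf = list(splitter.split(X_dev, Y_dev))

# Level 0 classifiers
clfs = [
    RandomForestClassifier(n_estimators=n_trees, criterion="gini", random_state=0),
    ExtraTreesClassifier(n_estimators=n_trees * 2, criterion="gini", random_state=0),
    GradientBoostingClassifier(n_estimators=n_trees, random_state=0),
]

blend_train = np.zeros((X_dev.shape[0], len(clfs)))
blend_test = np.zeros((X_test.shape[0], len(clfs)))

for j, clf in enumerate(clfs):
    blend_test_j = np.zeros((X_test.shape[0], len(skf)))
    for i, (train_index, cv_index) in enumerate(skf):
        clf.fit(X_dev[train_index], Y_dev[train_index])
        # Out-of-fold predictions become column j of the level 1 training data
        blend_train[cv_index, j] = clf.predict(X_dev[cv_index])
        blend_test_j[:, i] = clf.predict(X_test)
    # Average the per-fold test predictions into column j of the level 1 test data
    blend_test[:, j] = blend_test_j.mean(1)

# Level 1 classifier trained on the blended features
bclf = LogisticRegression()
bclf.fit(blend_train, Y_dev)
Y_test_predict = bclf.predict(blend_test)
score = metrics.accuracy_score(Y_test, Y_test_predict)
print("Accuracy = %s" % score)
```

The accuracy depends on the generated data and library version, so no particular value should be expected.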

[Figure: simplified diagram of the overall workflow]


Published by 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/126552.html Original link: https://javaforall.cn
