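Introduction
K-fold cross-validation: split the samples into K parts (folds). In each round, hold out one fold as the test set and use the remaining K-1 folds as the training set. Fit a model (or hypothesis function) on the training set, then evaluate it on the test set to get its classification accuracy. The average of the K accuracies is used as the estimate of the true classification accuracy of that model or hypothesis function.
sklearn.model_selection provides several K-fold cross-validation splitters.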
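The procedure described above can be sketched as an explicit loop. This is a minimal illustration, not part of the original post: the LogisticRegression classifier, the 100-sample data set, and the fixed random_state are assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Assumed data set: 100 samples, fixed seed so the run is repeatable.
X, y = make_classification(n_samples=100, random_state=0)

kf = KFold(n_splits=5)
scores = []
for train_index, test_index in kf.split(X):
    # Fit on the K-1 training folds, score on the held-out fold.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_index], y[train_index])
    scores.append(model.score(X[test_index], y[test_index]))

# The cross-validated estimate is the mean of the per-fold accuracies.
mean_accuracy = float(np.mean(scores))
print("fold accuracies:", scores)
print("mean accuracy:", mean_accuracy)

# cross_val_score runs the same loop in a single call.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print("cross_val_score:", cv_scores)
```

In practice cross_val_score is usually the more convenient form; the explicit loop is shown only to make each step visible.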
Generating samples
>>> from sklearn.datasets import make_classification
>>> data,target=make_classification(n_samples=10)
>>> print(target)
[1 1 0 1 1 0 0 1 0 0]
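Because make_classification is called without a seed above, the labels will differ from run to run. A small sketch of how to make the samples repeatable, assuming an arbitrary seed of 42 (not used in the original post):

```python
from sklearn.datasets import make_classification

# Same random_state (an arbitrary choice here) gives identical samples.
data_a, target_a = make_classification(n_samples=10, random_state=42)
data_b, target_b = make_classification(n_samples=10, random_state=42)

print(data_a.shape)                  # (10, 20): 20 features by default
print((data_a == data_b).all())      # True
print((target_a == target_b).all())  # True
```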
sklearn.model_selection.KFold
KFold splits the data in its original order. Set shuffle=True to shuffle the samples before splitting.
>>> from sklearn.model_selection import KFold
>>> kfold = KFold(n_splits=5, random_state=None)
>>> for train_index, test_index in kfold.split(data, target):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     print("TRAIN_target:", target[train_index].mean(), "TEST_target:", target[test_index].mean())
TRAIN: [2 3 4 5 6 7 8 9] TEST: [0 1]
TRAIN_target: 0.375 TEST_target: 1.0
TRAIN: [0 1 4 5 6 7 8 9] TEST: [2 3]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 6 7 8 9] TEST: [4 5]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 4 5 8 9] TEST: [6 7]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN_target: 0.625 TEST_target: 0.0
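To illustrate the effect of shuffle, the sketch below compares the test folds with and without it. The 10-row placeholder array and the seed 0 are assumptions for illustration. Note that random_state only matters when shuffle=True; recent scikit-learn versions reject a non-None random_state when shuffle is False.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # placeholder data with 10 samples

ordered = KFold(n_splits=5)
shuffled = KFold(n_splits=5, shuffle=True, random_state=0)

ordered_tests = [test.tolist() for _, test in ordered.split(X)]
shuffled_tests = [test.tolist() for _, test in shuffled.split(X)]

print(ordered_tests)   # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(shuffled_tests)  # same sizes, but indices drawn in shuffled order
```

Either way, every sample appears in exactly one test fold.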
sklearn.model_selection.StratifiedKFold
StratifiedKFold is a variant of KFold that keeps the proportion of each class label in every fold approximately the same as in the original sample.
>>> from sklearn.model_selection import StratifiedKFold
>>> stkfold = StratifiedKFold(n_splits=5, random_state=None)
>>> for train_index, test_index in stkfold.split(data, target):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     print("TRAIN_target:", target[train_index].mean(), "TEST_target:", target[test_index].mean())
TRAIN: [1 3 4 5 6 7 8 9] TEST: [0 2]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 2 3 4 6 7 8 9] TEST: [1 5]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 4 5 7 8 9] TEST: [3 6]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 5 6 7 9] TEST: [4 8]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 4 5 6 8] TEST: [7 9]
TRAIN_target: 0.5 TEST_target: 0.5
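The output above can be checked programmatically. This sketch reuses the ten labels printed earlier (five of each class) with a placeholder feature matrix, which is an assumption since the splitter only needs the labels to stratify:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0])  # labels from the sample above
X = np.zeros((len(y), 1))                      # placeholder features

skf = StratifiedKFold(n_splits=5)
test_means = [float(y[test].mean()) for _, test in skf.split(X, y)]
print(test_means)  # [0.5, 0.5, 0.5, 0.5, 0.5]
```

With five samples per class and five folds, each test fold receives exactly one sample of each class, so the positive rate is 0.5 everywhere, unlike the plain KFold run where it ranged from 0.0 to 1.0.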
sklearn.model_selection.RepeatedKFold
Repeats K-Fold n times, with a different randomization in each repetition.
>>> from sklearn.model_selection import RepeatedKFold
>>> rpkfold = RepeatedKFold(n_splits=5, n_repeats=2, random_state=2652124)
>>> for train_index, test_index in rpkfold.split(data, target):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     print("TRAIN_target:", target[train_index].mean(), "TEST_target:", target[test_index].mean())
TRAIN: [0 1 3 4 5 6 7 9] TEST: [2 8]
TRAIN_target: 0.625 TEST_target: 0.0
TRAIN: [0 2 3 4 5 6 8 9] TEST: [1 7]
TRAIN_target: 0.375 TEST_target: 1.0
TRAIN: [0 1 2 4 5 6 7 8] TEST: [3 9]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [1 2 3 4 5 7 8 9] TEST: [0 6]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 6 7 8 9] TEST: [4 5]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 3 4 6 7 8 9] TEST: [2 5]
TRAIN_target: 0.625 TEST_target: 0.0
TRAIN: [0 1 2 3 4 5 8 9] TEST: [6 7]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 5 6 7 8 9] TEST: [3 4]
TRAIN_target: 0.375 TEST_target: 1.0
TRAIN: [0 2 3 4 5 6 7 9] TEST: [1 8]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [1 2 3 4 5 6 7 8] TEST: [0 9]
TRAIN_target: 0.5 TEST_target: 0.5
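As a quick check of the structure of that output, the sketch below confirms that RepeatedKFold yields n_splits * n_repeats splits and that each repetition on its own covers every sample once. The placeholder data and the seed 0 are assumptions.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(10).reshape(-1, 1)  # placeholder data with 10 samples
rkf = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)

splits = list(rkf.split(X))
print(len(splits))         # 10 = n_splits * n_repeats
print(rkf.get_n_splits())  # 10

# Splits 0-4 belong to the first repetition, 5-9 to the second.
coverage = []
for r in range(2):
    tests = [test for _, test in splits[r * 5:(r + 1) * 5]]
    coverage.append(np.sort(np.concatenate(tests)).tolist())
print(coverage)  # each repetition covers indices 0..9 exactly once
```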
sklearn.model_selection.GroupKFold
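Splits the samples into folds by group.
The same group never appears in two different folds (the number of distinct groups must be at least the number of folds).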
>>> import numpy as np
>>> from sklearn.model_selection import GroupKFold
>>> gpkfold = GroupKFold(n_splits=5)
>>> groups = np.array([0, 0, 1, 1, 3, 4, 1, 1, 2, 2])
>>> for train_index, test_index in gpkfold.split(data, target, groups):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     print("TRAIN_target:", target[train_index].mean(), "TEST_target:", target[test_index].mean())
TRAIN: [0 1 4 5 8 9] TEST: [2 3 6 7]
TRAIN_target: 0.5 TEST_target: 0.5
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN_target: 0.625 TEST_target: 0.0
TRAIN: [2 3 4 5 6 7 8 9] TEST: [0 1]
TRAIN_target: 0.375 TEST_target: 1.0
TRAIN: [0 1 2 3 4 6 7 8 9] TEST: [5]
TRAIN_target: 0.555555555556 TEST_target: 0.0
TRAIN: [0 1 2 3 5 6 7 8 9] TEST: [4]
TRAIN_target: 0.444444444444 TEST_target: 1.0
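The group constraint can be verified directly. This sketch reuses the groups array from the example with a placeholder feature matrix (an assumption, since GroupKFold only needs the groups to build the folds):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

groups = np.array([0, 0, 1, 1, 3, 4, 1, 1, 2, 2])  # same groups as above
X = np.zeros((len(groups), 1))                      # placeholder features

gkf = GroupKFold(n_splits=5)
overlaps = []
for train, test in gkf.split(X, groups=groups):
    # Groups present on both sides of the split; should always be empty.
    overlaps.append(set(groups[train].tolist()) & set(groups[test].tolist()))
print(overlaps)  # five empty sets
```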
Conclusion
For modeling, KFold and StratifiedKFold are the usual choices. When a special grouping is required, for example splitting the data by month, use GroupKFold.
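A minimal sketch of the month-based case, assuming a hypothetical array of month labels (1 to 6) for 12 samples. With fewer folds than months, several months share a test fold, but no month is ever split between training and test data.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical month label for each sample; not from the original post.
months = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6])
X = np.zeros((len(months), 1))  # placeholder features

gkf = GroupKFold(n_splits=3)
test_months = [
    sorted(set(months[test].tolist()))
    for _, test in gkf.split(X, groups=months)
]
print(test_months)  # three folds, each holding whole months only
```

Note that GroupKFold does not respect time order; it only keeps each month intact.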
Published by: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/191520.html Original link: https://javaforall.cn