StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

data-mining machine-learning neural-network scikit-learn cross-validation

2021-09-17 01:00:29

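Using sklearn's StratifiedKFold split, I get an error (see below) when I try to split with multiclass targets. When I split with binary targets, it works without problems.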

num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf=StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

# Split the data into different folds

for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
   x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
   y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

Is there a way to use KFold with multiclass targets?

2 Answers

I had the same problem; you can find my detailed answer here.

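Basically, StratifiedKFold does not recognize your target as multiclass, because it relies on the following definitions. A one-hot encoded target, such as the output of keras.utils.to_categorical, is a 2-D array of 0s and 1s, so it matches the last definition rather than 'multiclass':

'binary': y contains <= 2 discrete values and is 1-D or a column vector.

'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1-D or a column vector.

'multiclass-multioutput': y is a 2-D array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.

'multilabel-indicator': y is a label indicator matrix, a 2-D array with at least two columns and at most 2 unique values.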
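To make the definitions above concrete, here is a minimal sketch of how the target type can be checked and how the split might be fixed. The toy arrays (`x`, `y_labels`, `y_one_hot`) are made up for illustration and are not from the question; the idea is simply to stratify on the 1-D integer labels (or recover them with `np.argmax`) and keep the one-hot matrix only for training.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy data: 12 samples, 3 classes, 4 samples per class.
y_labels = np.repeat([0, 1, 2], 4)
x = np.arange(24).reshape(12, 2)
# One-hot matrix with the same layout that to_categorical would produce.
y_one_hot = np.eye(3, dtype=int)[y_labels]

label_type = type_of_target(y_labels)      # 1-D integer labels
one_hot_type = type_of_target(y_one_hot)   # 2-D matrix of 0s and 1s
print(label_type, one_hot_type)

# If only the one-hot matrix is available, recover the 1-D labels first.
recovered = np.argmax(y_one_hot, axis=1)

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=999)
fold_class_counts = []
for train_index, val_index in skf.split(x, recovered):
    # Stratify on 1-D labels, but index the one-hot targets for the model.
    x_train_kf, x_val_kf = x[train_index], x[val_index]
    y_train_kf, y_val_kf = y_one_hot[train_index], y_one_hot[val_index]
    fold_class_counts.append(y_val_kf.sum(axis=0).tolist())

print(fold_class_counts)
```

In the question's code this would amount to calling kf.split(x_train, y_train) instead of kf.split(x_train, y_train_categorical), assuming y_train holds 1-D class labels.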

There is a simpler way than writing the loop yourself: scikit-learn provides cross_val_score.

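# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.
from sklearn.model_selection import KFold, cross_val_score
# KFold now takes n_splits instead of the sample count and n_folds.
k_fold = KFold(n_splits=10, shuffle=True, random_state=0)
clf = <any classifier>
print(cross_val_score(clf, X, y, cv=k_fold, n_jobs=1))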
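As a runnable illustration of the same idea, the sketch below fills in the placeholder with assumptions of my own: a synthetic three-class dataset from make_classification, LogisticRegression as the classifier, and StratifiedKFold as the splitter. None of these choices come from the question; any estimator and any 1-D label array should work the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 3-class problem (an assumption for illustration only).
X, y = make_classification(
    n_samples=150,
    n_features=8,
    n_informative=4,
    n_classes=3,
    random_state=0,
)

clf = LogisticRegression(max_iter=1000)  # stand-in for "any classifier"
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# y is a 1-D array of integer class labels, so stratification works.
scores = cross_val_score(clf, X, y, cv=cv, n_jobs=1)
print(scores)
print("mean accuracy:", scores.mean())
```

Note that y here is the 1-D label array, not a one-hot matrix, which is what lets the stratified splitter accept it.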

This topic has also been discussed here.

You can also look at the code snippet here, which may help:

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)
kf.get_n_splits(X)  # number of splitting iterations, here 2

print(kf)

# Each iteration yields index arrays for the training and test portions.
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

The first n_samples % n_splits folds have size n_samples // n_splits + 1, and the other folds have size n_samples // n_splits, where n_samples is the number of samples.
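The fold-size rule above can be checked with a small sketch. The helper name expected_fold_sizes and the choice of 10 samples with 3 splits are hypothetical, picked only so the folds come out uneven.

```python
import numpy as np
from sklearn.model_selection import KFold


def expected_fold_sizes(n_samples, n_splits):
    """Fold sizes according to the rule quoted above (illustrative helper)."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds get one additional sample each.
    return [base + 1 if i < extra else base for i in range(n_splits)]


X = np.zeros((10, 2))  # 10 samples; the feature values do not matter here
actual = [len(test_index) for _, test_index in KFold(n_splits=3).split(X)]
expected = expected_fold_sizes(10, 3)

print("actual:  ", actual)
print("expected:", expected)
```

With 10 samples and 3 splits, 10 % 3 = 1, so one fold holds 4 samples and the remaining two hold 3 each.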
