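For example, in a 3-class classification we may want to train with a label such as A, which is one-hot encoded as (1,0,0), and also with a fuzzy label such as (0.8,0.2,0). In this case, sklearn's kNN and SVM do not support fuzzy labels.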
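You can see why the classifiers reject fuzzy labels by asking scikit-learn which target type it infers for each encoding, using `sklearn.utils.multiclass.type_of_target`. The sketch below uses small made-up arrays. Classifiers such as kNN accept "multiclass" and "multilabel-indicator" targets. They raise a `ValueError` for "continuous-multioutput", which is the type fuzzy labels fall into.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Three encodings of the same kind of 3-class target (toy values for illustration)
y_label = np.array(['A', 'B', 'C', 'A'])                      # plain class labels
y_one_hot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])  # one-hot vectors
y_fuzzy = np.array([[0.8, 0.2, 0.0], [0.0, 1.0, 0.0],
                    [0.0, 0.1, 0.9], [1.0, 0.0, 0.0]])        # soft / fuzzy labels

# Ask scikit-learn how it interprets each target
kinds = {name: type_of_target(y)
         for name, y in [('label', y_label), ('one-hot', y_one_hot), ('fuzzy', y_fuzzy)]}
for name, kind in kinds.items():
    print(name, kind)
```

Classification estimators only work with the first two types. The non-integer values in the fuzzy rows make scikit-learn treat that target as a continuous regression target.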
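However, we can use sklearn's MultiOutputRegressor to extend a single-output regressor, such as support vector regression (SVR), to multiple outputs. Note that neural networks are well suited to this kind of label, because they can easily use numeric vectors as targets.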
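Here is a minimal sketch of both options. The data is made up: random features, and soft targets drawn from a Dirichlet distribution. `MultiOutputRegressor` fits one independent SVR per target column. As an illustrative swap-in for "a neural network", `MLPRegressor` accepts the 2-D target directly. Be aware that `MLPRegressor` minimizes squared error, not the cross-entropy with soft targets that a custom neural network would typically use.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
# Hypothetical soft targets: each row is a probability-like vector over 3 classes
Y_fuzzy = rng.dirichlet(np.ones(3), size=200)

# SVR is single-output, so MultiOutputRegressor trains one SVR per column of Y_fuzzy
multi_svr = MultiOutputRegressor(SVR()).fit(X, Y_fuzzy)
svr_pred = multi_svr.predict(X[:5])
print(len(multi_svr.estimators_), svr_pred.shape)

# An MLP takes the whole 2-D target at once (squared-error loss in scikit-learn)
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X, Y_fuzzy)
mlp_pred = mlp.predict(X[:5])
print(mlp_pred.shape)
```

The per-column design means the SVR outputs are not coupled. A predicted row need not sum to 1, so it is best read as a score per class rather than a probability distribution.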
Here is code comparing the different label types with kNN, SVC (multiclass SVM), and multi-output SVR (MultiOutputRegressor wrapping SVR):
import sklearn
import pandas as pd
from sklearn.svm import SVC, SVR
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputRegressor
import numpy as np
N = 1000
split = int(0.8 * N)
folds = 5
seed = 1234
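# Data: two random features and random labels, so there is no real signal to learn
np.random.seed(seed)
feature_1 = np.random.normal(0, 2, N)
feature_2 = np.random.normal(5, 6, N)
X = np.vstack([feature_1, feature_2]).T
Y_label = np.random.choice(['A', 'B', 'C'], N)
# dtype=int keeps 0/1 integers (pandas >= 2.0 returns booleans by default)
Y_one_hot = pd.get_dummies(Y_label, dtype=int).values
# Fuzzy labels: smooth each one-hot row with a small kernel.
# With mode='same', (0, 1, 0) -> (0.01, 0.98, 0.01), while the edge classes
# lose a little mass, e.g. (1, 0, 0) -> (0.98, 0.01, 0).
smooth_filter = np.array([0.01, 0.98, 0.01])
Y_fuzzy = np.apply_along_axis(
    lambda m: np.convolve(m, smooth_filter, mode='same'), axis=1, arr=Y_one_hot
)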
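# random_state only has an effect with shuffle=True; recent scikit-learn raises
# a ValueError if it is passed while shuffle=False
kfold = KFold(n_splits=folds)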
kNN = KNeighborsClassifier(n_neighbors=3)
svc = SVC()
svr = SVR()
multi_svr = MultiOutputRegressor(estimator=SVR())
knn_label = np.average(cross_val_score(kNN, X, Y_label, cv=kfold))
knn_one_hot = np.average(cross_val_score(kNN, X, Y_one_hot, cv=kfold))
try:
    knn_fuzzy = np.average(cross_val_score(kNN, X, Y_fuzzy, cv=kfold))
except ValueError:
    print('kNN: fuzzy classes are not supported')
svc_label = np.average(cross_val_score(svc, X, Y_label, cv=kfold))
try:
    svc_one_hot = np.average(cross_val_score(svc, X, Y_one_hot, cv=kfold))
except ValueError:
    print('SVC: vector is not supported')
try:
    svr_one_hot = np.average(cross_val_score(svr, X, Y_one_hot, cv=kfold))
except ValueError:
    print('SVR: vector is not supported')
multi_svr_one_hot = np.average(cross_val_score(multi_svr, X, Y_one_hot, cv=kfold, scoring='neg_mean_absolute_error'))
multi_svr_fuzzy = np.average(cross_val_score(multi_svr, X, Y_fuzzy, cv=kfold, scoring='neg_mean_absolute_error'))
print('sklearn version', sklearn.__version__)
print('Y example: ',
"label: ", Y_label[0],
", one hot: ", Y_one_hot[0, :],
", fuzzy: ", Y_fuzzy[0, :])
print('kNN label: ', knn_label)
print('kNN one hot: ', knn_one_hot)
print('SVC label: ', svc_label)
print('MultiSVR one hot: ', multi_svr_one_hot)
print('MultiSVR fuzzy: ', multi_svr_fuzzy)
Output:
kNN: fuzzy classes are not supported
SVC: vector is not supported
SVR: vector is not supported
sklearn version 0.19.1
Y example: label: B , one hot: [0 1 0] , fuzzy: [0.01 0.98 0.01]
kNN label: 0.321
kNN one hot: 0.254
SVC label: 0.332
MultiSVR one hot: -0.4066160996805417
MultiSVR fuzzy: -0.3970780923514713
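Although kNN does not raise an exception for one-hot encoded labels, its accuracy of 0.254 shows that it does not handle vectors properly. It treats each column as a separate output, so a prediction can be an invalid vector such as (0, 0, 0). The score also counts a sample as correct only when the whole vector matches. Because the labels are random, chance level is about 1/3, as the SVC result of 0.332 shows, and kNN on one-hot labels falls below that.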
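A tiny constructed case shows how an invalid vector comes about. There are three training points, one per class, with one-hot targets, and `n_neighbors=3`. For the query point, each column is voted on separately. Every column sees one 1 and two 0s, so the majority in every column is 0, and the prediction belongs to no class at all.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Three training points, one per class; columns of the target are A, B, C
X_train = np.array([[0.0], [1.0], [2.0]])
Y_train = np.eye(3, dtype=int)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, Y_train)

# All three points are neighbors of the query; each column is a separate 0/1 vote
pred = knn.predict(np.array([[1.0]]))
print(pred)  # [[0 0 0]] -> not a valid one-hot vector
```

With plain labels (`['A', 'B', 'C']`) the same model must return one of the three classes, which is why the label version sits near chance level instead of below it.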
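Also, because the task is treated as regression, the MultiSVR scores are reported as negative mean absolute error. scikit-learn negates error metrics so that higher is always better. Accuracy can only be used after converting the fuzzy labels and the predictions back into class labels.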
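One way to do that conversion is to take the `argmax` of each row, which maps both the fuzzy target and the regressor's output to the most likely class. You can then wrap the conversion in a custom scorer with `make_scorer`, so that `cross_val_score` reports accuracy directly. This is a sketch under assumptions. `argmax_accuracy` is a hypothetical helper name. The data uses the same random setup as above, but the fuzzy targets are simplified to 0.98 for the true class and 0.01 elsewhere, rather than built with the convolution.

```python
import numpy as np
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR


def argmax_accuracy(y_true, y_pred):
    """Hypothetical helper: accuracy after mapping each vector to its top class."""
    return accuracy_score(np.argmax(y_true, axis=1), np.argmax(y_pred, axis=1))


# Random features and random labels, as in the example above (no real signal)
N = 1000
rng = np.random.RandomState(1234)
X = np.column_stack([rng.normal(0, 2, N), rng.normal(5, 6, N)])
labels = rng.randint(0, 3, N)
Y_fuzzy = np.full((N, 3), 0.01)          # simplified fuzzy targets
Y_fuzzy[np.arange(N), labels] = 0.98

scorer = make_scorer(argmax_accuracy)    # greater_is_better=True by default
scores = cross_val_score(MultiOutputRegressor(SVR()), X, Y_fuzzy,
                         cv=KFold(n_splits=5), scoring=scorer)
print(scores.mean())
```

Because the labels are random, the mean should land somewhere near chance level (about 1/3). The point is only that the fuzzy-label regressor can now be compared with the classifiers on the same accuracy scale.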