测试集上的概率分布
数据挖掘
数据
预言
可能性
2022-02-24 01:05:42
1个回答
预计在训练集中有“更有信心”的预测概率,即您的模型将为用于训练的样本分配更接近 0 和 1 的概率,并且在中间更小。
如上所述,您几乎无法仅通过概率的直方图来说明您的模型性能,但您可以通过将它们按正负类拆分并添加诸如ks之类的分离度量来添加更多信息
代码示例:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import ks_2samp
import matplotlib.pyplot as plt
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = .2, random_state = 42)
clf = RandomForestClassifier(random_state = 42).fit(X_train,y_train)
preds_test = clf.predict_proba(X_test)[:,1]
preds_train = clf.predict_proba(X_train)[:,1]
ks_train = round(ks_2samp(preds_train[y_train == 1], preds_train[y_train == 0])[0],4)
ks_test = round(ks_2samp(preds_test[y_test == 1], preds_test[y_test == 0])[0],4)
fig, ax = plt.subplots(1,2,figsize = (10,5))
ax[0].hist(preds_train[y_train == 1], color = "darkred",bins = "scott", alpha = .5, edgecolor = "red")
ax[0].hist(preds_train[y_train == 0], color = "darkgreen",bins = "scott", alpha = .5, edgecolor = "green")
ax[0].set_title(f"Class separation in train set\nks: {ks_train}")
ax[1].hist(preds_test[y_test == 1], color = "darkred",bins = "scott", alpha = .5, edgecolor = "red")
ax[1].hist(preds_test[y_test == 0], color = "darkgreen",bins = "scott", alpha = .5, edgecolor = "green")
ax[1].set_title(f"Class separation in test set\nks: {ks_test}");
其它你可能感兴趣的问题


