Data mining: random forest model scoring

We are using the random forest algorithm, but we are having some trouble understanding the scoring method it uses.

Take the following confusion matrices (CM) on the test set as an example:

Threshold 45 cm is: 
[[67969 48031]
 [ 3321 11120]] and the prescion is: 0.18799344051632602
Threshold 50 cm is: 
[[77642 38358]
 [ 4785  9656]] and the prescion is: 0.2011080101632834
Threshold 55 cm is: 
[[88825 27175]
 [ 6796  7645]] and the prescion is: 0.2195577254445159
Threshold 60 cm is: 
[[100411  15589]
 [  9629   4812]] and the prescion is: 0.2358707906463611
Threshold 65 cm is: 
[[112421   3579]
 [ 13098   1343]] and the prescion is: 0.2728565623674755
Threshold 70 cm is: 
[[115895    105]
 [ 14371     70]] and the prescion is: 0.3999999997714286
Threshold 75 cm is: 
[[115998      2]
 [ 14440      1]] and the prescion is: 0.3333333222222226
Threshold 80 cm is: 
[[116000      0]
 [ 14441      0]] and the prescion is: 0.0
Threshold 85 cm is: 
[[116000      0]
 [ 14441      0]] and the prescion is: 0.0
Threshold 90 cm is: 
[[116000      0]
 [ 14441      0]] and the prescion is: 0.0
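
For reference, the precision figures above can be reproduced from the matrices themselves. This is a minimal sketch that assumes the matrices follow scikit-learn's `confusion_matrix` layout (rows are actual classes, columns are predicted classes); that layout is an assumption, but it is the one consistent with the printed precision values.

```python
# Confusion matrix reported for threshold 45.
# Assumed layout: rows = actual class (0, 1), columns = predicted class (0, 1).
cm = [[67969, 48031],
      [3321, 11120]]

(tn, fp), (fn, tp) = cm

# Precision: of everything predicted as "1", how much really is "1".
precision = tp / (tp + fp)
# Recall: of all actual "1"s, how many were found.
recall = tp / (tp + fn)
# False positive rate: of all actual "0"s, how many were flagged as "1".
false_positive_rate = fp / (fp + tn)

print(f"precision={precision:.4f} recall={recall:.4f} fpr={false_positive_rate:.4f}")
```

Raising the threshold trades recall for precision, which is why the matrices end up with an empty "predicted 1" column from threshold 80 onward.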

This is how we use the RF and print its score:

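from sklearn.model_selection import RandomizedSearchCV

# clf, param_grid, tscv and the train/test splits are defined elsewhere
grid_clf = RandomizedSearchCV(clf, param_grid, cv=tscv, verbose=10, n_iter=20, n_jobs=-1, scoring='roc_auc')
grid_clf.fit(X_train, y_train)
print(grid_clf.score(X_test, y_test))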

The score we get for this model is 0.7350173458471928

As I understand it, the score when using roc_auc lies between 0.5 and 1.

How can such a poor model get such a good score?

How is this score calculated?
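
As background for this question: when `scoring='roc_auc'` is passed, `grid_clf.score` uses that scorer, and ROC AUC is computed from the ranking of predicted probabilities rather than from labels at any single threshold. It equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, so it can range from 0 to 1, with 0.5 corresponding to random ranking. The sketch below illustrates this with a hand-written pairwise version; the function name and the toy data are made up for illustration and are not scikit-learn's implementation.

```python
def roc_auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper for illustration only.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): no predicted probability reaches 0.5.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.25, 0.45]

auc = roc_auc_from_scores(y_true, scores)
predicted_at_half = [int(s >= 0.5) for s in scores]

print(auc)                # ranking quality across all thresholds
print(predicted_at_half)  # labels at the 0.5 threshold
```

In this toy case every example is labelled "0" at a 0.5 threshold, yet the ranking is still clearly better than chance. That is the same pattern as a confusion matrix with an empty "predicted 1" column sitting next to an AUC well above 0.5.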

We don't mind missing some "1"s (false negatives) as long as we predict enough true positives. What we do mind is predicting false positives.

Can I change the scoring to favor the results I consider better?
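
One possible direction, sketched under assumptions: the `scoring` parameter of `RandomizedSearchCV` accepts either a built-in name such as `'precision'` or a callable built with `make_scorer`. The example below uses an F-beta score with `beta=0.5`, which weights precision more heavily than recall. The synthetic data, the parameter grid and the choice of `beta` are all placeholders, not the setup from this post, and plain 3-fold CV stands in for `tscv`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV

# Synthetic, imbalanced stand-in data (an assumption, not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.85, 0.15], random_state=0
)
X_train, X_test = X[:450], X[450:]
y_train, y_test = y[:450], y[450:]

# beta < 1 makes false positives cost more than false negatives.
precision_weighted = make_scorer(fbeta_score, beta=0.5, zero_division=0)

# Placeholder search space for illustration.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 5, None]}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    n_iter=4,
    cv=3,  # the post's tscv object could be passed here instead
    scoring=precision_weighted,
    random_state=0,
)
search.fit(X_train, y_train)

# score() now reports the custom metric instead of ROC AUC.
test_score = search.score(X_test, y_test)
print(test_score)
```

Note that a scorer like this evaluates the labels returned by `predict`, which for a binary random forest corresponds to a 0.5 probability cut-off, so it does not tune the decision threshold itself.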

Thanks