Why do I get a ValueError for an SVR model with RFE, but only when using a pipeline?

Tags: data-mining, machine-learning, regression, svm, rfe

2022-03-11 01:25:05

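I am running five different regression models to find the best predictive model for one variable. I am using the leave-one-out approach, with RFE to find the best predictive features.

Four of the five models run fine, but I am having trouble with SVR. Here is my code: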

from numpy import absolute, mean, std
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.feature_selection import RFECV
from sklearn.pipeline import Pipeline

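# encode Gender as a binary column (M=1, F=0); assign back instead of
# relying on inplace=True on a column accessed as an attribute
dataset['Gender'] = dataset['Gender'].map({'M': 1, 'F': 0})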

# select predictors and dependent 
X = dataset.iloc[:,12:]
y = dataset.iloc[:,2]

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)

First, I run LOOCV with all features, which works fine:

## LOOCV with all features
# find number of samples
n = X.shape[0]
# create loocv procedure
cv = LeaveOneOut()
# create model
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
# evaluate model
scores = cross_val_score(regressor, X, y, scoring='neg_mean_squared_error', cv=n)
# force positive
#scores = absolute(scores)

# report performance
print('MSE: %.3f (%.3f)' % (mean(scores), std(scores)))

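Next, I want to include RFECV to find the best predictive features for the model, which works fine for my other regression models.

This is the part of the code where I get the error: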

# automatically select the number of features with RFE

# create pipeline
rfe = RFECV(estimator=SVR(kernel = 'rbf'))
model = SVR(kernel = 'rbf')
pipeline = Pipeline(steps=[('s',rfe),('m',model)])
# find number of samples
n = X.shape[0]
# create loocv procedure
cv = LeaveOneOut()
# evaluate model
scores = cross_val_score(pipeline, X, y, scoring='neg_mean_squared_error', cv=n)
# report performance
print('MSE: %.3f (%.3f)' % (mean(scores), std(scores)))

The error I get is:

ValueError: when `importance_getter=='auto'`, the underlying estimator SVR should have `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.

I'm not sure what this error means.

1 Answer

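RFE operates by fitting its estimator, eliminating the worst feature(s), and recursing. The "worst" features are determined from the model's feature importances, by default taken from either coef_ or feature_importances_ (as noted in the error message). SVR with an RBF kernel has no such attribute (coef_ is only exposed with a linear kernel), and SVR does not really come with built-in feature importances, especially with nonlinear kernels. See also https://stats.stackexchange.com/q/265656/232706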
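To make the missing attribute concrete, here is a minimal sketch on synthetic data (the make_regression dataset is an assumption for illustration, not the asker's data). It checks which of the attributes RFE looks for are present on a fitted SVR.

```python
from sklearn.datasets import make_regression
from sklearn.svm import SVR

# Synthetic data, purely for illustration (not the asker's dataset).
X, y = make_regression(n_samples=40, n_features=5, noise=0.1, random_state=0)

rbf_svr = SVR(kernel="rbf").fit(X, y)
linear_svr = SVR(kernel="linear").fit(X, y)

# With importance_getter="auto", RFE looks for one of these two attributes.
rbf_has_coef = hasattr(rbf_svr, "coef_")
rbf_has_importances = hasattr(rbf_svr, "feature_importances_")
linear_has_coef = hasattr(linear_svr, "coef_")

print("rbf SVR has coef_:", rbf_has_coef)
print("rbf SVR has feature_importances_:", rbf_has_importances)
print("linear SVR has coef_:", linear_has_coef)
```

Only the linear-kernel model exposes coef_, which is why RFE or RFECV wrapped around an RBF-kernel SVR fails with the default getter.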

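Since the estimator is a pipeline, you would in any case need to give more detail about where to get the coefficients from; see the second paragraph of the RFE documentation for importance_getter:

Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of sklearn.pipeline.Pipeline with its last step named clf.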
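As a sketch of the string form, the example below passes a small pipeline as the RFE estimator and points importance_getter at the coefficients of its last step. The step names ("scale", "clf"), the linear-kernel SVR, and the synthetic data are all assumptions chosen for illustration; the getter path only works when the final step actually has the named attribute.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=40, n_features=6, n_informative=3,
                       noise=0.1, random_state=0)

# Inner pipeline used as the RFE estimator; the last step is named "clf".
inner = Pipeline(steps=[("scale", StandardScaler()),
                        ("clf", SVR(kernel="linear"))])

# Tell RFE where to find the importances inside the fitted pipeline.
selector = RFE(estimator=inner,
               n_features_to_select=3,
               importance_getter="named_steps.clf.coef_")
selector.fit(X, y)

print("selected feature mask:", selector.support_)
```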

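Finally, if you really want to use SVR, have a look at the third paragraph of the importance_getter documentation:

If callable, overrides the default feature importance getter. The callable is passed with the fitted estimator and it should return importance for each feature.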
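A minimal sketch of the callable form follows. The helper name abs_coef_importance is hypothetical, and the example assumes a linear-kernel SVR so that coef_ exists. Because the callable receives only the fitted estimator, it can only use what that estimator stores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.svm import SVR


def abs_coef_importance(fitted_estimator):
    """Hypothetical getter: one importance value per feature.

    It is given only the fitted estimator, never X or y.
    """
    return np.abs(fitted_estimator.coef_).ravel()


# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=40, n_features=6, n_informative=3,
                       noise=0.1, random_state=0)

selector = RFE(estimator=SVR(kernel="linear"),
               n_features_to_select=3,
               importance_getter=abs_coef_importance)
selector.fit(X, y)

print("feature ranking (1 = kept):", selector.ranking_)
```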

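~~You could write a callable that uses permutation importance (though that would be expensive) or some other model-agnostic importance measure.~~ Eh, actually, since the callable only gets the fitted estimator and not the data, permutation importance won't work. See also https://stats.stackexchange.com/q/191402/232706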
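One possible workaround, which is not something the answer itself proposes, is to let RFECV rank features with a linear-kernel SVR (which does expose coef_) and keep the RBF-kernel SVR as the final model in the pipeline. The sketch below assumes synthetic data and an inner cv=5 for RFECV. Note that features ranked by a linear model are not guaranteed to be the best ones for the RBF model.

```python
from numpy import mean, std
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Small synthetic dataset, purely for illustration.
X, y = make_regression(n_samples=20, n_features=5, n_informative=2,
                       noise=0.5, random_state=0)

# Assumption: select features with a linear-kernel SVR, which has coef_.
rfe = RFECV(estimator=SVR(kernel="linear"), cv=5)
# The final model keeps the RBF kernel from the question.
model = SVR(kernel="rbf")
pipeline = Pipeline(steps=[("s", rfe), ("m", model)])

# Leave-one-out evaluation; scores are negated MSE values.
scores = cross_val_score(pipeline, X, y,
                         scoring="neg_mean_squared_error",
                         cv=LeaveOneOut())

print("MSE: %.3f (%.3f)" % (mean(-scores), std(scores)))
```

Because the selector sits inside the pipeline, feature selection is refit on each training fold, so the held-out sample never influences which features are kept.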
