XGBoost - feature importance depends only on the feature's position in the data

data-mining python feature-selection xgboost
2022-02-15 12:04:36

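I'm trying to use XGBoost for feature selection, but the feature importance plot just lists the features in the order they appear. The feature in the first column of the xtrain data is by far the most important, followed by the one in the second column, and so on.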

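This seems to suggest the model isn't working properly, since it hasn't really learned anything... Any suggestions about what might be going wrong?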

Update: correlation matrix https://ibb.co/3shDJjD

Model code:

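import xgboost as xgb

# train and test are assumed to be xgb.DMatrix objects (their construction is not shown)
params = {
    'subsample': 0.5,
    'learning_rate': 0.3,
    'max_depth': 8,
    # note: the documented parameter name is 'num_parallel_tree' (singular);
    # as spelled here it is probably not picked up by XGBoost
    'num_parallel_trees': 20,
    'objective': 'reg:squarederror',
    'verbosity': 0,
}
watchlist = [(train, 'train'), (test, 'val')]
reg = xgb.train(params, train, num_boost_round=5, early_stopping_rounds=5, evals=watchlist)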

Results:

[0] train-rmse:0.274535 val-rmse:0.27431
Multiple eval metrics have been passed: 'val-rmse' will be used for early stopping.

Will train until val-rmse hasn't improved in 5 rounds.
[1] train-rmse:0.273472 val-rmse:0.273653
[2] train-rmse:0.272796 val-rmse:0.27341
[3] train-rmse:0.272318 val-rmse:0.27334
[4] train-rmse:0.271943 val-rmse:0.273346
[5] train-rmse:0.271604 val-rmse:0.273374
[6] train-rmse:0.271218 val-rmse:0.273442
[7] train-rmse:0.270927 val-rmse:0.273529
[8] train-rmse:0.270641 val-rmse:0.273561
Stopping. Best iteration:
[3] train-rmse:0.272318 val-rmse:0.27334

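Feature importance (note that 0 and 1 come first). If I change the order of the columns in xtrain, the feature importance changes too: the first two columns are always the two most important features. https://ibb.co/QcHwbNg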

1 Answer

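You can use this code to plot the feature importance for your data. It is also possible that your data just happens to be arranged in order of decreasing importance.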

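# XGBClassifier is for classification; for a regression target like the one
# in the question, XGBRegressor is the counterpart
from xgboost import XGBClassifier
from xgboost import plot_importance
from matplotlib import pyplot

# load data: X is the feature matrix and y the target
# (both were left blank in the original answer)
X = ...
y = ...

model = XGBClassifier()
model.fit(X, y)

# by default plot_importance shows importance_type="weight" (split counts)
plot_importance(model)
pyplot.show()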