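I am working on a loan problem where the target is the loan status: defaulter or non-defaulter. My classes are imbalanced: 90% of the samples are defaulters and 10% are non-defaulters. I tried an oversampling method. Here is my code: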
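from imblearn.over_sampling import RandomOverSampler
# Oversample the minority class until it is half the size of the majority class
ros = RandomOverSampler(sampling_strategy=0.5)
X, y = ros.fit_resample(X, y)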
Then I trained a random forest classifier, and when I try to predict on my test data I get an error. Here is my code:
pred=clf.predict(test_df)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12673/2789171072.py in <module>
----> 1 pred=clf.predict(test_df)
~/miniconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in predict(self, X)
796 The predicted classes.
797 """
--> 798 proba = self.predict_proba(X)
799
800 if self.n_outputs_ == 1:
~/miniconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in predict_proba(self, X)
838 check_is_fitted(self)
839 # Check data
--> 840 X = self._validate_X_predict(X)
841
842 # Assign chunk of trees to jobs
~/miniconda3/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in _validate_X_predict(self, X)
567 Validate X whenever one tries to predict, apply, predict_proba."""
568 check_is_fitted(self)
--> 569 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
570 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
571 raise ValueError("No support for np.int64 index based sparse matrices")
~/miniconda3/lib/python3.8/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
578
579 if not no_val_X and check_params.get("ensure_2d", True):
--> 580 self._check_n_features(X, reset=reset)
581
582 return out
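What is the problem here? Also, how should the test data be handled when working with an imbalanced dataset, and what are good techniques for handling imbalanced classes?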
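The last frame shown in the traceback is `_check_n_features`, which suggests that the number of columns in `test_df` differs from the number the classifier was fitted on (the final error message is cut off, so this is an inference). A minimal sketch of how that could be checked and worked around. The toy frames, the column names (`loan_id`, `income`, `amount`, `term`) and the extra ID column are hypothetical, and the sketch assumes a scikit-learn version that exposes `n_features_in_`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training frame with three feature columns
train_X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["income", "amount", "term"])
train_y = np.array([1] * 45 + [0] * 5)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(train_X, train_y)

# Hypothetical test frame that still carries an extra ID column
test_df = pd.DataFrame(
    rng.normal(size=(10, 4)), columns=["loan_id", "income", "amount", "term"]
)

# Compare what the model was fitted on with what is passed to predict
print("fitted on", clf.n_features_in_, "columns; test_df has", test_df.shape[1])
extra = [c for c in test_df.columns if c not in train_X.columns]
missing = [c for c in train_X.columns if c not in test_df.columns]
print("extra:", extra, "missing:", missing)

# Select the training columns, in training order, before predicting
pred = clf.predict(test_df[train_X.columns])
```

If `missing` is not empty, selecting the training columns fails with a `KeyError`; in that case the test data needs the same preprocessing (encoding, dropped columns) that was applied to the training data.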