Data Mining - ValueError: "The estimator should be a classifier" - 吾爱随笔录

ValueError: "The estimator should be a classifier"

data-mining python neural-network classification scikit-learn ensemble-modeling

2022-02-10 01:01:43

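I am adapting the sklearn-extensions ELMClassifier so that it is accepted as a base estimator by VotingClassifier and AdaBoostClassifier. My code works fine when I use the ELM directly with AdaBoostClassifier, but since I want to build an AdaBoost over different classifiers, I need to instantiate the ELM inside a VotingClassifier and then pass that VotingClassifier as the base estimator to AdaBoost.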
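For reference, a minimal sketch of that nesting using only built-in scikit-learn classifiers; LogisticRegression and a depth-1 DecisionTreeClassifier are stand-ins for the question's classifiers, and the make_classification data is an assumption. The VotingClassifier is passed positionally because the AdaBoost keyword was renamed from base_estimator to estimator in scikit-learn 1.2, and soft voting is used so that predict_proba exists whichever boosting algorithm the installed version defaults to.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Both members accept sample_weight in fit(), which AdaBoost passes down
# through VotingClassifier.fit()
voting_clf = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("stump", DecisionTreeClassifier(max_depth=1)),
    ],
    voting="soft",  # exposes predict_proba on the ensemble
)

# Passed positionally: the keyword is base_estimator before 1.2, estimator after
ada_clf = AdaBoostClassifier(voting_clf, n_estimators=5, random_state=0)
ada_clf.fit(X, y)
print("training accuracy:", ada_clf.score(X, y))
```

Because this works with built-in members, the failure in the question points at the custom ELM class rather than at the nesting itself.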

When I run AdaBoostClassifier(base_estimator=votingCLF, algorithm="SAMME").fit(X, y), I get the following error log:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-78-25ca9de0ea07> in <module>
          1 ada2 = AdaBoostClassifier(base_estimator=votingCLF, algorithm='SAMME')
    ----> 2 ada2.fit(X,y)

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_weight_boosting.py in fit(self, X, y, sample_weight)
        436 
        437         # Fit
    --> 438         return super().fit(X, y, sample_weight)
        439 
        440     def _validate_estimator(self):

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_weight_boosting.py in fit(self, X, y, sample_weight)
        140                 X, y,
        141                 sample_weight,
    --> 142                 random_state)
        143 
        144             # Early termination

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_weight_boosting.py in _boost(self, iboost, X, y, sample_weight, random_state)
        499         else:  # elif self.algorithm == "SAMME":
        500             return self._boost_discrete(iboost, X, y, sample_weight,
    --> 501                                         random_state)
        502 
        503     def _boost_real(self, iboost, X, y, sample_weight, random_state):

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_weight_boosting.py in _boost_discrete(self, iboost, X, y, sample_weight, random_state)
        563         estimator = self._make_estimator(random_state=random_state)
        564 
    --> 565         estimator.fit(X, y, sample_weight=sample_weight)
        566 
        567         y_predict = estimator.predict(X)

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_voting.py in fit(self, X, y, sample_weight)
        220         transformed_y = self.le_.transform(y)
        221 
    --> 222         return super().fit(X, transformed_y, sample_weight)
        223 
        224     def predict(self, X):

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_voting.py in fit(self, X, y, sample_weight)
         55     def fit(self, X, y, sample_weight=None):
         56         """Get common fit operations."""
    ---> 57         names, clfs = self._validate_estimators()
         58 
         59         if (self.weights is not None and

    ~\Anaconda3\lib\site-packages\sklearn\ensemble\_base.py in _validate_estimators(self)
        247                 raise ValueError(
        248                     "The estimator {} should be a {}.".format(
    --> 249                         est.__class__.__name__, is_estimator_type.__name__[3:]
        250                     )
        251                 )

    ValueError: The estimator customELMClassifer should be a classifier.

Here is the code of the ELM I am using:

    import numpy as np
    # assumes the sklearn-extensions package provides ELMClassifier here
    from sklearn_extensions.extreme_learning_machines.elm import ELMClassifier

    class customELMClassifer(ELMClassifier):
        def resample_with_replacement(self, X_train, y_train, sample_weight):
            # normalize sample_weight so it sums to 1 (required by np.random.choice)
            sample_weight = sample_weight / sample_weight.sum(dtype=np.float64)

            X_train_resampled = np.zeros((len(X_train), len(X_train[0])), dtype=np.float32)
            # np.int was removed in NumPy 1.24; the builtin int is equivalent
            y_train_resampled = np.zeros(len(y_train), dtype=int)
            for i in range(len(X_train)):
                # draw an index from 0 to len(X_train) - 1, weighted by sample_weight
                draw = np.random.choice(np.arange(len(X_train)), p=sample_weight)

                # copy the drawn row of X and label of y into the resampled arrays
                X_train_resampled[i] = X_train[draw]
                y_train_resampled[i] = y_train[draw]

            return X_train_resampled, y_train_resampled

        def fit(self, X, y, sample_weight=None, random_state=0):
            # emulate AdaBoost's sample weights by weighted resampling
            if sample_weight is not None:
                X, y = self.resample_with_replacement(X, y, sample_weight)

            return super().fit(X, y)
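The loop in resample_with_replacement draws one index at a time. A minimal vectorized sketch of the same weighted bootstrap, assuming NumPy array inputs, draws all indices in one call to numpy.random.Generator.choice; the standalone function and its seed parameter are illustrative, not part of the question's code. Unlike the original, it keeps the input dtypes instead of casting X to float32.

```python
import numpy as np

def resample_with_replacement(X, y, sample_weight, seed=None):
    """Weighted bootstrap: draw len(X) rows with replacement,
    picking row i with probability proportional to sample_weight[i]."""
    X = np.asarray(X)
    y = np.asarray(y)
    p = np.asarray(sample_weight, dtype=np.float64)
    p = p / p.sum()  # choice() requires probabilities that sum to 1
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=len(X), replace=True, p=p)
    # Fancy indexing keeps the original dtypes of X and y
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])
# All weight on row 2, so every draw should pick that row
X_res, y_res = resample_with_replacement(X, y, [0, 0, 1, 0, 0], seed=0)
print(X_res)
```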

I would appreciate some feedback or a solution, since I have no experience creating scikit-learn estimators from scratch.

Thanks in advance.

3 Answers

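This is probably because your class is not 100% compatible with the scikit-learn estimator interface. You can easily verify this with check_estimator from sklearn.utils.estimator_checks; passing those checks should ensure that it is a proper classifier that can then be passed to AdaBoost.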

I would also suggest inheriting from BaseEstimator in addition to ELMClassifier.

For more details, see the guidelines reported here on creating custom estimators that are compatible with the scikit-learn interface.
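One concrete reason the interface matters here: AdaBoostClassifier and VotingClassifier work on clones of the estimators you pass in, and sklearn.base.clone rebuilds an estimator from its get_params(), which BaseEstimator provides. A small sketch with two hypothetical classes (n_hidden is an illustrative parameter name, not necessarily one ELMClassifier uses):

```python
from sklearn.base import BaseEstimator, clone

class PlainModel:
    """Hypothetical model that does not follow the scikit-learn API."""
    def __init__(self, n_hidden=20):
        self.n_hidden = n_hidden

class SklearnModel(BaseEstimator):
    """Same model, but BaseEstimator supplies get_params()/set_params()."""
    def __init__(self, n_hidden=20):
        self.n_hidden = n_hidden

print(SklearnModel(n_hidden=50).get_params())  # {'n_hidden': 50}
cloned = clone(SklearnModel(n_hidden=50))      # fresh, unfitted copy
print(cloned.n_hidden)  # 50

clone_failed = False
try:
    clone(PlainModel(n_hidden=50))
except TypeError as exc:
    # clone() refuses objects that lack get_params()
    clone_failed = True
    print("TypeError:", exc)
```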

From the traceback, you can see that the problem originates here:

    is_estimator_type = (is_classifier if is_classifier(self)
                         else is_regressor)

    for est in estimators:
        if est not in (None, 'drop') and not is_estimator_type(est):
            raise ValueError(
                "The estimator {} should be a {}.".format(
                    est.__class__.__name__, is_estimator_type.__name__[3:]
                )
            )

The definition of is_classifier is simply

    getattr(estimator, "_estimator_type", None) == "classifier"
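To see why the check fails, here is a pure-Python sketch that mirrors the quoted validation loop and that is_classifier definition (as used by scikit-learn before 1.6). The classes are hypothetical stand-ins: FakeVoting plays VotingClassifier, and ELMRegressorLike plays a regressor base class whose type the custom classifier inherits. A class with no _estimator_type at all would fail the same way.

```python
def is_classifier(estimator):
    # Definition quoted above
    return getattr(estimator, "_estimator_type", None) == "classifier"

def is_regressor(estimator):
    return getattr(estimator, "_estimator_type", None) == "regressor"

def validate_estimators(ensemble, estimators):
    """Mirror of the validation loop quoted from the traceback."""
    is_estimator_type = is_classifier if is_classifier(ensemble) else is_regressor
    for est in estimators:
        if est not in (None, "drop") and not is_estimator_type(est):
            raise ValueError(
                "The estimator {} should be a {}.".format(
                    est.__class__.__name__, is_estimator_type.__name__[3:]
                )
            )

class FakeVoting:                # hypothetical stand-in for VotingClassifier
    _estimator_type = "classifier"

class ELMRegressorLike:          # hypothetical stand-in for a regressor base
    _estimator_type = "regressor"

class customELMClassifer(ELMRegressorLike):  # inherits the "regressor" type
    pass

message = None
try:
    validate_estimators(FakeVoting(), [customELMClassifer()])
except ValueError as exc:
    message = str(exc)
print(message)  # The estimator customELMClassifer should be a classifier.
```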

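Normally, inheriting from ClassifierMixin is good practice, and it provides the attribute _estimator_type = "classifier". In this case complications can arise because ELMClassifier inherits from ELMRegressor instead. The simplest way to solve it is probably to add _estimator_type = "classifier" to your customELMClassifer.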
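A minimal sketch of the fix with real scikit-learn classes, assuming sklearn-extensions is not installed: StandInELMRegressor and StandInELMClassifier are hypothetical placeholders that reproduce the inheritance problem (a "classifier" derived from a regressor). Instead of setting _estimator_type directly, the sketch lists ClassifierMixin first among the bases. As far as I know, in scikit-learn 1.6 and later is_classifier reads the estimator tags (__sklearn_tags__) rather than the bare _estimator_type attribute, so setting the attribute alone may not be enough there, while putting ClassifierMixin first should work with both mechanisms.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin, is_classifier
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier

class StandInELMRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical placeholder for ELMRegressor: predicts the majority label."""
    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.majority_ = labels[np.argmax(counts)]
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

class StandInELMClassifier(StandInELMRegressor):
    """Like ELMClassifier, it inherits from the regressor and is typed as one."""

class FixedELMClassifier(ClassifierMixin, StandInELMClassifier):
    """ClassifierMixin comes first in the MRO, so its estimator type wins."""

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

print(is_classifier(StandInELMClassifier()))  # False
print(is_classifier(FixedELMClassifier()))    # True

error_message = ""
try:
    VotingClassifier([("tree", DecisionTreeClassifier()),
                      ("elm", StandInELMClassifier())]).fit(X, y)
except ValueError as exc:
    error_message = str(exc)  # "... should be a classifier."
print(error_message)

# With the fixed class, validation passes and the ensemble fits
voting = VotingClassifier([("tree", DecisionTreeClassifier()),
                           ("elm", FixedELMClassifier())]).fit(X, y)
print(voting.predict(X))
```

Applied to the question, the analogous change would be declaring the class as customELMClassifer(ClassifierMixin, ELMClassifier); I have not verified that against sklearn-extensions itself.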

I ran into this problem, and it came from not returning clf from my classifier function. When I did this:

    from sklearn.metrics import accuracy_score

    def random_for():
        from sklearn.ensemble import RandomForestClassifier
        clf = RandomForestClassifier(max_depth=5)
        clf.fit(Input, Labels)
        y_pred = clf.predict(Test_input)
        predictions = [round(value) for value in y_pred]
        accuracy = accuracy_score(Test_Labels, predictions)
        print("Random Forest accuracy: %.2f%%" % (accuracy * 100.0))
        # no return statement, so random_for() returns None

    rclf = random_for()

I got the same error as the OP. However, this:

    from sklearn.metrics import accuracy_score

    def random_for():
        from sklearn.ensemble import RandomForestClassifier
        clf = RandomForestClassifier(max_depth=5)
        clf.fit(Input, Labels)
        y_pred = clf.predict(Test_input)
        predictions = [round(value) for value in y_pred]
        accuracy = accuracy_score(Test_Labels, predictions)
        print("Random Forest accuracy: %.2f%%" % (accuracy * 100.0))

        return clf  # hand the fitted classifier back to the caller

    rclf = random_for()

solved the problem.
