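I can't access the parameters of a model estimator in Spark MLlib. More precisely, my problem is this: I have a logistic regression model for which I want to find the best regularization parameters (regParam and elasticNetParam). To do that I use CrossValidator, which works and finds a model that is better than all the others I have tried. The problem is that I don't know how to access the actual values of the parameters the cross-validator found. Below is the code I use to fit my cross-validator: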
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.classification import LogisticRegression
lr_predictor = LogisticRegression(featuresCol='polyFeatures', labelCol='label', maxIter=10)
paramGrid = ParamGridBuilder() \
    .addGrid(lr_predictor.elasticNetParam, [0., 0.5, 1]) \
    .addGrid(lr_predictor.regParam, [0.1, 0.01]) \
    .build()
crossval = CrossValidator(estimator=LogRegPipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=2)
cvModel = crossval.fit(train_set)
bestModel = cvModel.bestModel
# How do I get the best parameters found by cvModel?
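One possible way to recover the winning values, offered as a sketch rather than a definitive answer: a fitted CrossValidatorModel exposes `avgMetrics` (one averaged score per grid point) and `getEstimatorParamMaps()` (the grid itself, in the same order), so the best combination is the grid entry whose score is best. The helper below, `best_param_map`, is a hypothetical name and not part of PySpark. It is written in plain Python so it can be checked without a cluster, and it assumes the two lists are aligned by index.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return (params, metric) for the grid point with the best average metric.

    param_maps  : list of dicts, e.g. cvModel.getEstimatorParamMaps()
    avg_metrics : list of floats, e.g. cvModel.avgMetrics
    Hypothetical helper; assumes both lists are in the same order.
    """
    if not param_maps:
        raise ValueError("param_maps is empty")
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")

    # Choose max or min depending on the evaluator's metric direction.
    pick = max if larger_is_better else min
    idx = pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])

    # Keys are Param objects in PySpark; fall back to the key itself otherwise.
    readable = {getattr(p, "name", p): v for p, v in param_maps[idx].items()}
    return readable, avg_metrics[idx]


# Usage with the objects from the question (not executed here):
#   params, score = best_param_map(cvModel.getEstimatorParamMaps(),
#                                  cvModel.avgMetrics)
#   print(params, score)
```

BinaryClassificationEvaluator defaults to areaUnderROC, where larger is better, so the default `larger_is_better=True` should match the setup above. For another evaluator, passing `evaluator.isLargerBetter()` would be the safer choice.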