Data Science Asked by Robse on November 13, 2021
I want to do a very simple cross validation using LogisticRegression.
Here is my code:
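from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

logreg = LogisticRegression(labelCol="churn", featuresCol="features")
pipeline = Pipeline(stages=[logreg])
paramGrid = ParamGridBuilder().addGrid(logreg.regParam, [.1, .01]).build()
crossval = CrossValidator(
    estimator=pipeline,
    estimatorParamMaps=paramGrid,
    evaluator=BinaryClassificationEvaluator(),
    numFolds=2)
bestLogReg = crossval.fit(df_train)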
When I run this, I get the following error on bestLogReg = crossval.fit(df_train):
IllegalArgumentException: label does not exist. Available: features, churn, CrossValidator_764038c00edc_rand, rawPrediction, probability, prediction
Here is my df_train dataset's schema:
root
|-- features: vector (nullable = true)
|-- churn: integer (nullable = true)
I have fit a LogisticRegression on this data before, and it predicts fine.
Can you help me figure out what I did wrong?
CrossValidator scores each fold with the evaluator, and the evaluator has its own labelCol parameter, separate from the one set on the estimator. It defaults to "label", which does not exist in your DataFrame, hence the error. So all you need to do is change BinaryClassificationEvaluator() into BinaryClassificationEvaluator().setLabelCol("churn"), where "churn" is the name of your target variable.
Answered by jared3412341 on November 13, 2021