TransWikia.com

How to improve a baseline logistic regression in a high-dimensional binary classification problem?

Data Science Asked by haneulkim on June 2, 2021

Info about the dataset:

  • df.shape = (10000, 100)
  • all features are numerical values.
  • a few outliers in each column; the column with the most outliers has 0.7% of its data as outliers.

I am trying to improve on my baseline logistic regression, but I'm stuck.

from sklearn.linear_model import LogisticRegression

baseline = LogisticRegression(solver='lbfgs', max_iter=100, penalty='l2')

Here are some approaches I've taken and their results:

  1. Standard scaler – Logistic regression (similar)
  2. Robust scaler – Logistic regression (similar)
  3. Remove outliers (IQR method) – standard scaler – Logistic regression (worse)
  4. Standard scaler – PCA(n_components = number of components that explain 83% of the variance) – Logistic regression (worse still)
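Approaches 1, 2, and 4 can be sketched as scikit-learn pipelines and compared under the same cross-validation split. This is a minimal sketch, not the author's exact code: the synthetic data, the ROC-AUC metric, and the raised `max_iter` are my assumptions.

```python
# Compare the baseline against the scaled / PCA variants with one CV scheme.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

# Synthetic stand-in for the real (10000, 100) DataFrame.
X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

pipelines = {
    "baseline": make_pipeline(
        LogisticRegression(solver="lbfgs", max_iter=100, penalty="l2")),
    "standard_scaler": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "robust_scaler": make_pipeline(
        RobustScaler(), LogisticRegression(max_iter=1000)),
    # A float n_components keeps enough components to explain 83% of variance.
    "pca_83pct": make_pipeline(
        StandardScaler(), PCA(n_components=0.83),
        LogisticRegression(max_iter=1000)),
}

for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the scaler inside the pipeline ensures it is fit on each training fold only, so the comparison is leakage-free.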

All approaches seem to perform worse than baseline.
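For approach 3, an IQR-based outlier filter might look like the sketch below. The 1.5×IQR whisker rule and the "drop a row if any feature is outside the fences" policy are assumptions; on the training set only, to avoid leaking test statistics.

```python
# Drop rows where any feature falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the real training DataFrame.
df = pd.DataFrame(rng.normal(size=(1000, 5)),
                  columns=[f"f{i}" for i in range(5)])

q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
mask = ((df >= q1 - 1.5 * iqr) & (df <= q3 + 1.5 * iqr)).all(axis=1)
df_clean = df[mask]
print(f"kept {mask.mean():.1%} of rows")
```

Note that even for Gaussian data this rule flags roughly 0.7% of values per column, so with many columns it can discard a noticeable share of rows, which may explain why this variant scores worse.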

How can I improve my baseline logistic regression model, or do I need to resort to nonlinear models like random forests? (I've already tried a random forest, but it overfits.)

Thanks in advance!
