TransWikia.com

How to improve a baseline logistic regression in a high-dimensional binary classification problem?

Data Science Asked by haneulkim on June 2, 2021

Info about the dataset:

  • df.shape = (10000, 100)
  • all features are numerical values.
  • a few outliers in each column; the column with the most outliers has 0.7% of its data as outliers.

I am trying to improve on my baseline logistic regression, but I'm stuck.

from sklearn.linear_model import LogisticRegression

baseline = LogisticRegression(solver='lbfgs', max_iter=100, penalty='l2')

Here are some approaches I've taken and their results:

  1. Standard scaler – Logistic regression (similar)
  2. Robust scaler – Logistic regression (similar)
  3. Remove outliers (IQR method) – standard scaler – Logistic regression (worse)
  4. Standard scaler – PCA(n_components = number of components that explain 83% of the variance) – Logistic regression (worse still)
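Approaches 1, 2, and 4 can be sketched as scikit-learn pipelines and compared under the same cross-validation split. This is a minimal sketch, not the author's exact code: the synthetic data, the ROC-AUC metric, and the raised `max_iter` are my assumptions.

```python
# Compare the baseline against the scaled / PCA variants with one CV scheme.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

# Synthetic stand-in for the real (10000, 100) DataFrame.
X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

pipelines = {
    "baseline": make_pipeline(
        LogisticRegression(solver="lbfgs", max_iter=100, penalty="l2")),
    "standard_scaler": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "robust_scaler": make_pipeline(
        RobustScaler(), LogisticRegression(max_iter=1000)),
    # A float n_components keeps enough components to explain 83% of variance.
    "pca_83pct": make_pipeline(
        StandardScaler(), PCA(n_components=0.83),
        LogisticRegression(max_iter=1000)),
}

for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the scaler inside the pipeline ensures it is fit on each training fold only, so the comparison is leakage-free.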

All approaches seem to perform worse than baseline.
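For approach 3, an IQR-based outlier filter might look like the sketch below. The 1.5×IQR whisker rule and the "drop a row if any feature is outside the fences" policy are assumptions; on the training set only, to avoid leaking test statistics.

```python
# Drop rows where any feature falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the real training DataFrame.
df = pd.DataFrame(rng.normal(size=(1000, 5)),
                  columns=[f"f{i}" for i in range(5)])

q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
mask = ((df >= q1 - 1.5 * iqr) & (df <= q3 + 1.5 * iqr)).all(axis=1)
df_clean = df[mask]
print(f"kept {mask.mean():.1%} of rows")
```

Note that even for Gaussian data this rule flags roughly 0.7% of values per column, so with many columns it can discard a noticeable share of rows, which may explain why this variant scores worse.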

How can I improve my baseline logistic regression model, or do I need to resort to nonlinear models like random forests? (I've already tried a random forest, but it overfits.)

Thanks in advance!
