
How to set weights in multi-class classification in xgboost for imbalanced data?

Asked by Abhishek Niranjan on December 8, 2020 (Data Science)

From this post, I know that you can set scale_pos_weight for an imbalanced dataset in binary classification. However, for a multi-class problem on an imbalanced dataset, I don't quite understand how to set the weight parameter in the DMatrix.

How can I use XGBoost on an imbalanced dataset in a multi-class classification problem?

One Answer

As you say, scale_pos_weight works for two classes (binary classification). For three or more classes, use the weight argument instead: it is passed to the xgb.DMatrix function and must contain one value per observation, so underrepresented classes can be given larger weights.

Example:

library(xgboost)
data(iris)

# The target is Species; xgboost expects integer class labels starting at 0
label = as.integer(iris$Species) - 1
iris$Species = NULL

# Split the data for training and testing (75/25 split)
n = nrow(iris)
train.index = sample(n,floor(0.75*n))

# For example, pick a weight of 1.5 for label "0" (setosa), 1.0 for the other Species
weights = ifelse(label[train.index] == 0, 1.5, 1.0)

# Build the training DMatrix, attaching the per-observation weights
# (named dtrain rather than xgb.train, so it does not shadow the xgb.train() function)
dtrain = xgb.DMatrix(data = as.matrix(iris[train.index, ]),
                     label = label[train.index],
                     weight = weights)
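A common alternative to a hand-picked weight of 1.5 is to derive the weights from inverse class frequencies, and then fit the model on the weighted DMatrix. The sketch below does both; the names (dtest, params, model) and the hyperparameters (nrounds = 50, the objective settings) are illustrative choices, not part of the original answer, and are not tuned.

# Optional: inverse-frequency weights instead of the hand-picked 1.5.
# Each observation gets n / (k * n_c), where n_c is the size of its class,
# so rarer classes get larger weights and the average weight stays near 1.
# This overwrites the weights and dtrain defined above.
class_counts = table(label[train.index])
inv_freq     = length(train.index) / (length(class_counts) * class_counts)
weights      = as.numeric(inv_freq[as.character(label[train.index])])

dtrain = xgb.DMatrix(data = as.matrix(iris[train.index, ]),
                     label = label[train.index],
                     weight = weights)
dtest  = xgb.DMatrix(data = as.matrix(iris[-train.index, ]),
                     label = label[-train.index])

# Multi-class objective: num_class must match the number of labels (3 here)
params = list(objective   = "multi:softprob",
              num_class   = 3,
              eval_metric = "mlogloss")

model = xgb.train(params    = params,
                  data      = dtrain,
                  nrounds   = 50,
                  watchlist = list(train = dtrain, test = dtest),
                  verbose   = 0)

# multi:softprob returns one probability per class per row; reshape to a matrix
pred_prob  = matrix(predict(model, dtest), ncol = 3, byrow = TRUE)
pred_class = max.col(pred_prob) - 1   # back to the 0/1/2 label encoding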

A similar question can be found here.

Answered by Peurke on December 8, 2020
