Asked by Abhishek Niranjan on December 8, 2020
From this post, I know you can set scale_pos_weight for an imbalanced dataset. However, for a multi-class problem on an imbalanced dataset, I don't quite understand how to set the weight parameter in the DMatrix.
How can I use XGBoost for an imbalanced dataset in a multi-classification problem?
As you say, scale_pos_weight works for two classes (binary classification). weight can be used for three or more classes. The parameter goes into the xgb.DMatrix function and must contain one value for each observation.
Example:
library(xgboost)
data(iris)
# We'll predict Species
label = as.integer(iris$Species)-1
iris$Species = NULL
# Split the data for training and testing (75/25 split)
n = nrow(iris)
train.index = sample(n,floor(0.75*n))
# For example, give weight 1.5 to label "0" and 1.0 to the other Species
weights = ifelse(label[train.index] == 0, 1.5, 1.0)
# Build the training DMatrix with one weight per observation
dtrain = xgb.DMatrix(data = as.matrix(iris[train.index, ]),
                     label = label[train.index],
                     weight = weights)
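The example above stops at building the DMatrix. As a minimal sketch of the remaining step (my own addition, not from the original answer), the block below trains a three-class model on that weighted DMatrix with the multi:softmax objective, and also shows inverse-class-frequency weights as a common alternative to the hand-picked 1.5; the names class.freq, inv.freq.weights, test.index and the choice of nrounds = 50 are illustrative assumptions.
# --- Sketch only: not part of the original answer ---
# Weights inversely proportional to class frequency are a common choice
# for imbalanced data; to use them, pass weight = inv.freq.weights to
# xgb.DMatrix above instead of the hand-picked 1.5/1.0 weights.
class.freq = table(label[train.index])
inv.freq.weights = as.numeric(1 / class.freq[as.character(label[train.index])])
# Train a multi-class model on the weighted DMatrix
params = list(objective = "multi:softmax",   # predict class labels directly
              num_class = 3,                 # iris has three Species
              eval_metric = "mlogloss")
model = xgb.train(params = params, data = dtrain, nrounds = 50)
# Evaluate on the held-out 25%
test.index = setdiff(seq_len(n), train.index)
pred = predict(model, as.matrix(iris[test.index, ]))  # returns labels 0, 1, 2
mean(pred == label[test.index])                       # simple accuracy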
A similar question can be found here.
Answered by Peurke on December 8, 2020