Imbalanced Dataset (Transformers): How to Decide on Class Weights?

Question

I'm using Simple Transformers to train and evaluate a model.
Since the dataset I am using is severely imbalanced, it is recommended that I assign a weight to each label. An example of assigning weights in Simple Transformers is given here.
My question, however, is: how exactly do I choose the appropriate weight for each class? Is there a specific methodology, e.g., a formula that uses the ratio of the labels?
Follow-up question: are the weights for a given dataset "universal"? I.e., if I use a totally different model, can I reuse the same weights, or should I assign different weights depending on the model?
P.S. 1. If it makes any difference, I'm using RoBERTa.
P.S. 2. There is a similar question here; however, I believe that my question is not a duplicate because a) the linked question is about Keras, whereas mine is about Transformers, and b) I'm also asking for general recommendations on how weight values are decided, which the linked question does not.

maksym33 · Answer

I am not sure about the model you are using, but I can explain the general procedure in ML. There are three "vanilla" solutions for coping with an imbalanced supervised dataset:

Reweighting - assigning class weights so that every label has the same effective number of samples (calculated as the sum of the weights of its samples). For example, if the label with the most samples has $n_{max}$ samples and some other class has $n_i$ samples, you would assign that class the weight $w_i = \frac{n_{max}}{n_i}$.
Undersampling - a basic procedure that gets rid of the surplus samples in the larger classes so that you end up with a balanced dataset.
Oversampling - creating copies of samples from the underrepresented classes (those with fewer than $n_{max}$ samples).
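
As a rough illustration of the three options, here is a minimal sketch in plain Python. The helper names (`max_ratio_weights`, `resample`), the label names, and the toy counts of 1000 vs. 100 are made up for this example; the resampling is plain random selection and duplication, which is only one possible way to do it.

```python
import random
from collections import Counter


def max_ratio_weights(labels):
    """Weight each class by n_max / n_i, so the largest class gets weight 1."""
    counts = Counter(labels)
    n_max = max(counts.values())
    return {label: n_max / n for label, n in counts.items()}


def resample(samples, labels, target_size, seed=0):
    """Randomly resample every class to exactly target_size items."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    out = []
    for label, items in by_class.items():
        if len(items) >= target_size:
            # Undersampling: keep a random subset, without replacement.
            chosen = rng.sample(items, target_size)
        else:
            # Oversampling: keep all originals and add random duplicates.
            chosen = items + rng.choices(items, k=target_size - len(items))
        out.extend((sample, label) for sample in chosen)
    rng.shuffle(out)
    return out


# Toy data (hypothetical): 1000 "neg" samples and 100 "pos" samples.
labels = ["neg"] * 1000 + ["pos"] * 100
samples = list(range(len(labels)))

weights = max_ratio_weights(labels)  # {"neg": 1.0, "pos": 10.0}
undersampled = resample(samples, labels, target_size=100)
oversampled = resample(samples, labels, target_size=1000)
```

With these weights, the sum of weights per class is the same (1000 * 1.0 for "neg" and 100 * 10.0 for "pos"), which is the sense in which reweighting "balances" the labels.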
Hope that helps,

Max

Quy Dinh · Answer

The point of setting class weights is to manipulate the loss function so that it puts more focus on the minority label. Every data point passed to your learning algorithm contributes to the loss. By making the weight of a minority-class instance bigger, you tell the loss function to put more focus on that particular (features, label) pair. The most intuitive way for class weights to have this effect is to multiply the loss attributed to each observation by the weight of its class.
So, imagine you have two classes in your training data: class A with 100 observations and class B with 1000 observations. To make up for the imbalance, you set the weight of class A to 1000 / 100 = 10 times the weight of class B, which gives weights of [1.0, 0.1] for [A, B].
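
To make the "multiply the loss by the weight" idea concrete, here is a minimal sketch of a class-weighted negative log-likelihood in plain Python. The function name `weighted_nll` and the toy probabilities are made up for illustration, and real libraries may normalize differently (for example, by dividing by the sum of the weights instead of the number of samples).

```python
import math


def weighted_nll(probs, labels, class_weights):
    """Average negative log-likelihood, each sample scaled by its class weight.

    probs[i][k] is the predicted probability of class k for sample i,
    labels[i] is the true class index of sample i.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += class_weights[y] * -math.log(p[y])
    return total / len(labels)


class_weights = [1.0, 0.1]  # index 0 = minority class, index 1 = majority class

# The same uncertain prediction costs 10x more when the true label is the minority class.
loss_minority = weighted_nll([[0.5, 0.5]], [0], class_weights)
loss_majority = weighted_nll([[0.5, 0.5]], [1], class_weights)
```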
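In general, for a multi-class problem, you would like to set the class weights so that for each class:
(# of observations for this class) * (class weight) = constant c.
If you choose c = 1, then the class weight for a class = 1 / (# of observations for that class). Only the ratios between the weights matter here, so any other choice of c gives an equivalent weighting up to a common scale factor.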
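
A small sketch of that rule, assuming the class counts from the two-class example (100 and 1000). The function name `inverse_frequency_weights` and the parameter name `c` for the constant are my own; the point is only that different constants give weights with the same ratios.

```python
def inverse_frequency_weights(counts, c=1.0):
    """Return weights w_i such that counts[i] * w_i == c for every class."""
    return [c / n for n in counts]


counts = [100, 1000]  # class A, class B

w_unit = inverse_frequency_weights(counts)             # constant = 1
w_scaled = inverse_frequency_weights(counts, c=100.0)  # constant = size of the smallest class

# Both choices weight class A ten times as heavily as class B.
ratio_unit = w_unit[0] / w_unit[1]
ratio_scaled = w_scaled[0] / w_scaled[1]
```

Choosing the constant equal to the smallest class count reproduces the [1.0, 0.1] weights from the example, while choosing the largest class count gives the $n_{max} / n_i$ form.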
Below is a quote from the documentation:

weight (optional): A list of length num_labels containing the weights to assign to each label for loss calculation.
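
Here is a sketch of how such a list might be passed in, assuming Simple Transformers' `ClassificationModel` and the `roberta-base` checkpoint (the question only says RoBERTa, so the checkpoint is my assumption). The helper `build_weighted_model` is hypothetical, and the weights must be listed in label-index order (here I assume label 0 = class A, label 1 = class B).

```python
def build_weighted_model(class_weights):
    """Create a RoBERTa classifier whose loss uses the given per-label weights."""
    # Imported here so the weight computation below can run without the library installed.
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel(
        "roberta",
        "roberta-base",                 # assumed checkpoint
        num_labels=len(class_weights),
        weight=class_weights,           # one weight per label, in label-index order
        use_cuda=False,                 # assumption: CPU only; change if a GPU is available
    )


counts = [100, 1000]  # assumed order: label 0 = class A, label 1 = class B
class_weights = [min(counts) / n for n in counts]  # [1.0, 0.1]
```

Calling `build_weighted_model(class_weights)` downloads the pretrained model; the result can then be trained as usual, e.g. with `model.train_model(train_df)`.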

As for which particular weights to use, it is as simple as trying a few settings and evaluating which one works best on your evaluation metrics.
