How to apply class weight to a multi-output model?

Asked by Gal Avineri on May 22, 2021

I have a model with 2 categorical outputs.
The first output layer can predict 2 classes: [0, 1]
and the second output layer can predict 3 classes: [0, 1, 2].

How can I apply different class weight dictionaries for each of the outputs?

For example, how could I apply the dictionary {0: 1, 1: 10} to the first output,
and {0: 5, 1: 1, 2: 10} to the second output?

I’ve tried to use the following class weights dictionary:
class_weight = {'output1': {0: 1, 1: 10}, 'output2': {0: 5, 1: 1, 2: 10}}
but the code fails with an error.

My script also runs normally when I remove the class_weight parameter.

Code Example

I’ve created a minimal example that reproduces the error:

from tensorflow.python.keras.models import Model
from tensorflow.python.keras.layers import Input, Dense
from tensorflow.python.data import Dataset
import tensorflow as tf
import numpy as np


def preprocess_sample(features, labels):
    label1, label2 = labels
    label1 = tf.one_hot(label1, 2)
    label2 = tf.one_hot(label2, 3)
    return features, (label1, label2)


batch_size = 32

num_samples = 1000
num_features = 10

features = np.random.rand(num_samples, num_features)
labels1 = np.random.randint(2, size=num_samples)
labels2 = np.random.randint(3, size=num_samples)

train = Dataset.from_tensor_slices((features, (labels1, labels2))).map(preprocess_sample).batch(batch_size).repeat()

# Model
inputs = Input(shape=(num_features, ))
output1 = Dense(2, activation='softmax', name='output1')(inputs)
output2 = Dense(3, activation='softmax', name='output2')(inputs)
model = Model(inputs, [output1, output2])

model.compile(loss='categorical_crossentropy', optimizer='adam')
class_weights = {'output1': {0: 1, 1: 10}, 'output2': {0: 5, 1: 1, 2: 10}}
model.fit(train, epochs=10, steps_per_epoch=num_samples // batch_size,
          # class_weight=class_weights
          )

This code runs successfully without the class_weight parameter.
But when you add the class_weight parameter by uncommenting the line
# class_weight=class_weights, then the script fails with the following error:

Traceback (most recent call last):
  File "test.py", line 35, in <module>
    class_weight=class_weights
  File "venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1165, in _standardize_weights
    feed_sample_weight_modes)
  File "venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1164, in <listcomp>
    for (ref, sw, cw, mode) in zip(y, sample_weights, class_weights,
  File "venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 717, in standardize_weights
    y_classes = np.argmax(y, axis=1)
  File "venv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1004, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "venv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 62, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "venv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 42, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1

Edit

I’ve also opened an issue on the Keras GitHub page, but I wanted to ask the same question here to see if perhaps I’m missing something and doing something wrong.

4 Answers

I wasn't able to use the class_weight parameter yet, but in the meantime I've found another way to apply class weighting to each output layer.

Current solution

In this Keras issue they supply an easy method to apply class weights via a custom loss that implements the required class weighting.

from itertools import product
from tensorflow.python.keras import backend as K

def weighted_categorical_crossentropy(y_true, y_pred, weights):
    # weights is a CxC matrix: weights[i, j] is applied when an example
    # of class i is classified as class j
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    # note: the backend signature is categorical_crossentropy(target, output)
    return K.categorical_crossentropy(y_true, y_pred) * final_mask

where weights is a CxC matrix (where C is the number of classes) that defines the class weights.
More precisely, weights[i, j] defines the weight for an example of class i which was falsely classified as class j.

So how do we use it?

Keras allows you to assign a loss function to each output,
so we can assign each output a loss function with the correct weights matrix.

For example, to satisfy the request I made in the question, we could use the following code.

# Define the weight matrices
w1 = np.ones((2, 2))
w1[1, 0] = 10
w1[1, 1] = 10

w2 = np.ones((3, 3))
w2[0, 0] = 5
w2[0, 1] = 5
w2[0, 2] = 5
w2[2, 0] = 10
w2[2, 1] = 10
w2[2, 2] = 10    

# Define the weighted loss functions
from functools import partial
loss1 = partial(weighted_categorical_crossentropy, weights=w1)
loss2 = partial(weighted_categorical_crossentropy, weights=w2)

# Finally, apply the loss functions to the outputs
model.compile(loss={'output1': loss1, 'output2': loss2}, optimizer='adam')

And that accomplishes the request :)

Edit

There is a small addition that needs to be made.
The loss functions must have a name, which we can supply as follows:

loss1.__name__ = 'loss1'
loss2.__name__ = 'loss2'
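
Alternatively (just a sketch of the same idea), a small factory function that closes over the weights gives each loss a proper name and avoids functools.partial altogether:

# Hypothetical helper: builds a named loss that closes over a weights matrix
def make_weighted_loss(weights, name):
    def loss(y_true, y_pred):
        return weighted_categorical_crossentropy(y_true, y_pred, weights)
    loss.__name__ = name
    return loss

loss1 = make_weighted_loss(w1, 'loss1')
loss2 = make_weighted_loss(w2, 'loss2')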

Answered by Gal Avineri on May 22, 2021

You have a list of outputs. You can simply pass a list of class_weight dictionaries, one per output, as follows:

class_weight = [{0: 1, 1: 10},{0: 5, 1: 1, 2: 10}]
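
For example, a minimal sketch against the question's model (whether fit accepts this form depends on your Keras/TF version):

class_weight = [{0: 1, 1: 10}, {0: 5, 1: 1, 2: 10}]
model.fit(train, epochs=10, steps_per_epoch=num_samples // batch_size,
          class_weight=class_weight)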

Answered by SvGA on May 22, 2021

Pass a dictionary in the following format to the class_weight parameter in fit_generator:

{'output1': {0: ratio_1, 1: ratio_2}, 'output2': {0: ratio_3, 1: ratio_4, 2: ratio_5}}

You can use compute_class_weight from sklearn.utils.class_weight to calculate class weights from your data.
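
For instance, a sketch that computes balanced weights for the question's first output from its label array:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weights inversely proportional to class frequencies in labels1
classes = np.unique(labels1)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=labels1)
output1_weights = dict(zip(classes, weights))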

EDIT: This approach works in TF 2.1.0 and earlier versions only. Thanks for the replies.

References:

https://github.com/keras-team/keras/issues/4735#issuecomment-267473722
https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html

Answered by Muhammad Hamza Mughal on May 22, 2021

Here's my solution for sparse categorical crossentropy for a Keras model with multiple outputs in TF2. I think it looks fairly clean, but it might be horrifically inefficient, I don't know.

First, create a dictionary where each key is the name set on an output Dense layer and each value is a 1-D constant tensor. The value at index 0 of the tensor is the loss weight of class 0, and a value is required for every class present in that output, even if it is just 1 or 0.
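
For the two outputs in the question, that dictionary might look like this (a sketch; the keys must match the names of the Dense layers):

import tensorflow as tf

# One weight per class, indexed by class id
class_weights = {'output1': tf.constant([1.0, 10.0]),
                 'output2': tf.constant([5.0, 1.0, 10.0])}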

Compile your model with

model.compile(optimizer=optimizer,
              loss={k: class_loss(v) for k, v in class_weights.items()})

where class_loss() is defined in the following manner

def class_loss(class_weight):
  """Returns a loss function for a specific class weight tensor

  Params:
    class_weight: 1-D constant tensor of class weights

  Returns:
    A loss function where each loss is scaled according to the observed class"""
  def loss(y_obs, y_pred):
    y_obs = tf.dtypes.cast(y_obs, tf.int32)
    # One-hot encode the observed classes and pick out each sample's weight
    hothot = tf.one_hot(tf.reshape(y_obs, [-1]), depth=class_weight.shape[0])
    weight = tf.math.multiply(class_weight, hothot)
    weight = tf.reduce_sum(weight, axis=-1)
    # Weighted sparse softmax cross-entropy on the raw logits; labels are
    # flattened so their shape matches the per-sample weights
    losses = tf.compat.v1.losses.sparse_softmax_cross_entropy(labels=tf.reshape(y_obs, [-1]),
                                                              logits=y_pred,
                                                              weights=weight)
    return losses
  return loss

If someone has a better suggestion than using tf.compat.v1, then please let me know. I don't feel confident that it will stick around through future versions of TensorFlow. I also posted this answer here: https://github.com/keras-team/keras/issues/11735#issuecomment-641775516

EDIT: Be aware that this loss expects linear (logits) outputs rather than softmax outputs! You have to apply a softmax to the outputs afterwards if you want probabilities (but if you just want the predictions ranked, the raw logits work too).
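
In other words, the output layers are built without an activation, and probabilities are recovered after prediction. A sketch against the question's model:

# Output layers emit raw logits (no softmax) to match the loss above
output1 = Dense(2, activation=None, name='output1')(inputs)
output2 = Dense(3, activation=None, name='output2')(inputs)

# Recover probabilities from the logits after prediction
logits1, logits2 = model.predict(features)
probs1 = tf.nn.softmax(logits1)
probs2 = tf.nn.softmax(logits2)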

Answered by grofte on May 22, 2021
