Keras loss function cost matrix / misclassification penalty

Asked by stav on Data Science, January 21, 2021

I'll explain my problem through a simplified example. Let's say I have a matrix $X$ containing a single variable, with the following values:

import numpy as np

x = np.zeros(10).reshape(-1,1)
x[0] = 1
x[-1] = 0.5
x
>>> array([[1. ],
           [0. ],
           [0. ],
           [0. ],
           [0. ],
           [0. ],
           [0. ],
           [0. ],
           [0. ],
           [0.5]])

and a target variable $y$ with 3 classes, one-hot encoded as:

y = np.zeros((10,3))
y[:5,0] = 1
y[5:,2] = 1
y
>>> array([[1., 0., 0.],
           [1., 0., 0.],
           [1., 0., 0.],
           [1., 0., 0.],
           [1., 0., 0.],
           [0., 0., 1.],
           [0., 0., 1.],
           [0., 0., 1.],
           [0., 0., 1.],
           [0., 0., 1.]])

Let's call the classes A, B, and C.
As you can see, I have zero examples of class B by design. I've also constructed the $X$ matrix so that it is impossible to correctly predict the middle $y$ values (samples 2 through 9, where $x = 0$). I would like to train a neural network that predicts class B for all of these values, i.e. when the network is "unsure" which class to predict, it should just predict B.

This suggests to me that I need different misclassification penalties for the different classes.
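
In other words, as I understand it, I want a cost matrix $C$ where $C_{ij}$ is the weight applied to a sample's cross-entropy loss when the true class is $i$ and the predicted (argmax) class is $j$, so that the per-sample loss becomes roughly $C_{y_n,\hat{y}_n} \cdot \mathrm{CE}(y_n, \hat{y}_n)$. This is what the code linked below appears to implement.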

I am using the code from https://github.com/keras-team/keras/issues/2115#issuecomment-530762739, shown here:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import CategoricalCrossentropy


class WeightedCategoricalCrossentropy(CategoricalCrossentropy):
    
    def __init__(self, cost_mat, name='weighted_categorical_crossentropy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)
    
    def __call__(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
      
        return super().__call__(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat))


def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)

    y_pred.shape.assert_has_rank(2)
    y_pred.shape[1:].assert_is_compatible_with(num_classes)
    y_pred.shape.assert_is_compatible_with(y_true.shape)

    # Convert soft predictions into hard one-hot predictions
    y_pred = K.one_hot(K.argmax(y_pred), num_classes)

    # Broadcast so that element (n, i, j) = cost_m[i, j] * y_true[n, i] * y_pred[n, j]
    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)

    # Summing over i and j leaves cost_m[true_class, predicted_class] for each sample
    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])

    return sample_weights_n
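
For reference, here is a minimal NumPy sketch of what get_sample_weights works out to (this is just my own illustration of the logic, not part of the linked snippet): each sample's weight is simply cost_m[true_class, predicted_class].

# NumPy equivalent of get_sample_weights, for intuition only
def sample_weights_np(y_true, y_pred, cost_m):
    true_idx = y_true.argmax(axis=1)       # integer true labels
    pred_idx = y_pred.argmax(axis=1)       # hard argmax predictions
    return cost_m[true_idx, pred_idx]      # weight = cost of (true, predicted) pair

y_true_demo = np.array([[1., 0., 0.],      # true class A
                        [0., 0., 1.]])     # true class C
y_pred_demo = np.array([[0.2, 0.7, 0.1],   # predicted class B
                        [0.1, 0.1, 0.8]])  # predicted class C
cost_demo = np.array([[1., 0., 1.],
                      [1., 1., 1.],
                      [1., 0., 1.]])
sample_weights_np(y_true_demo, y_pred_demo, cost_demo)
>>> array([0., 1.])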

I’m training a very simple MLP as follows:


cost = np.array([[1.,0.,1.],
                 [1.,1.,1.],
                 [1.,0.,1.]])

cost = tf.constant(cost)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(x.shape[1],)),
    tf.keras.layers.Dense(10, activation=tf.nn.relu),
    tf.keras.layers.Dense(3, activation=tf.nn.softmax),
])

opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=opt, loss=WeightedCategoricalCrossentropy(cost), metrics=['accuracy'])

history = model.fit(x, y, batch_size=4, epochs=1000)
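
After training, I look at the predicted class probabilities on the training inputs, with something along the lines of:

model.predict(x).round(3)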

I’d expect my output to look something like this, where the middle samples are predicted as class B:

array([[0.998, 0.   , 0.002],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.,    1.   , 0.],
       [0.003, 0.   , 0.997]])

However, this is what I actually get:

array([[0.998, 0.   , 0.002],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.512, 0.   , 0.488],
       [0.003, 0.   , 0.997]])

Does anybody have any insight into what I might be doing wrong? (I've tried many different cost matrices, to no avail.)

If there is an alternative approach that achieves the desired effect (a low-penalty class acting as the default for hard-to-predict examples), that would also be very helpful.
