I'll explain my problem through a simplified example. Let's say I have a matrix $X$ with a single variable and the following values:
import numpy as np

x = np.zeros(10).reshape(-1, 1)
x[0] = 1
x[-1] = 0.5
x
>>> array([[1. ],
[0. ],
[0. ],
[0. ],
[0. ],
[0. ],
[0. ],
[0. ],
[0. ],
[0.5]])
and a target variable $y$ with 3 classes, one-hot encoded as:
ytest = np.zeros((10,3))
ytest[:5,0] = 1
ytest[5:,2] = 1
ytest
>>> array([[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[1., 0., 0.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.],
[0., 0., 1.]])
Let's call the classes A, B, and C.
As you can see, I have zero examples of class B by design. I've also constructed the $X$ matrix so that it is impossible to correctly predict the middle $y$ values (rows 2 through 9, where the input is 0). I would like to train a neural network that predicts class B for all of these values, i.e. when the network is "unsure" about which class to predict, it should just predict B.
This seems to me like a case for different misclassification penalties, as in the sketch below.
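To make that idea concrete, here is a minimal plain-NumPy sketch of what a misclassification cost matrix is supposed to do (the sample_weight helper is my own, purely illustrative); the Keras loss further down implements the same lookup with tensors:

# cost[i, j] = weight applied to a sample whose true class is i and predicted class is j.
cost = np.array([[1., 0., 1.],
                 [1., 1., 1.],
                 [1., 0., 1.]])

def sample_weight(y_true_onehot, y_pred_onehot, cost):
    i = np.argmax(y_true_onehot)  # true class index
    j = np.argmax(y_pred_onehot)  # predicted class index
    return cost[i, j]

print(sample_weight([1, 0, 0], [0, 0, 1], cost))  # true A, predicted C -> 1.0 (full penalty)
print(sample_weight([1, 0, 0], [0, 1, 0], cost))  # true A, predicted B -> 0.0 (no penalty)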
Using the code from https://github.com/keras-team/keras/issues/2115#issuecomment-530762739, shown here:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import CategoricalCrossentropy


class WeightedCategoricalCrossentropy(CategoricalCrossentropy):

    def __init__(self, cost_mat, name='weighted_categorical_crossentropy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def __call__(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().__call__(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat))


def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)
    y_pred.shape.assert_has_rank(2)
    y_pred.shape[1:].assert_is_compatible_with(num_classes)
    y_pred.shape.assert_is_compatible_with(y_true.shape)
    # Hard-assign each prediction to its argmax class, then look up the
    # (true class, predicted class) entry of the cost matrix per sample.
    y_pred = K.one_hot(K.argmax(y_pred), num_classes)
    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)
    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])
    return sample_weights_n
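As a quick sanity check on what this weighting does with my toy data (my own snippet, not from the linked issue; it reuses ytest from above and the same cost matrix as in the training code below), feeding it a fake prediction that assigns every sample to class B yields zero weight for every sample:

y_true = tf.constant(ytest, dtype=tf.float32)                 # shape (10, 3)
y_pred = tf.constant(np.tile([0., 1., 0.], (10, 1)),          # every sample "predicted" as B
                     dtype=tf.float32)
cost_mat = K.cast_to_floatx(np.array([[1., 0., 1.],
                                      [1., 1., 1.],
                                      [1., 0., 1.]]))
print(get_sample_weights(y_true, y_pred, cost_mat).numpy())
# -> [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.], i.e. predicting B is never penalised for these labels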
I’m training a very simple MLP as follows:
# Rows index the true class, columns the predicted class.
# Column B (index 1) is zero for true classes A and C, so misclassifying
# those samples as B carries no penalty.
cost = np.array([[1., 0., 1.],
                 [1., 1., 1.],
                 [1., 0., 1.]])
cost = tf.constant(cost)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(x.shape[1],)),
    tf.keras.layers.Dense(10, activation=tf.nn.relu),
    tf.keras.layers.Dense(3, activation=tf.nn.softmax),
])

opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=opt,
              loss=WeightedCategoricalCrossentropy(cost),
              metrics=['accuracy'])
history = model.fit(x, ytest, batch_size=4, epochs=1000)
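For reference, the prediction arrays below can be reproduced with something along these lines (the rounding to three decimals is mine, just for readability):

preds = model.predict(x)
print(preds.round(3))        # class probabilities per sample
print(preds.argmax(axis=1))  # predicted class index per sample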
I’d expect my output to look something like this, where the middle samples are predicted as class B:
array([[0.998, 0. , 0.002],
[0., 1. , 0.],
[0., 1. , 0.],
[0., 1. , 0.],
[0., 1. , 0.],
[0., 1. , 0.],
[0., 1. , 0.],
[0., 1. , 0.],
[0., 1. , 0.],
[0.003, 0. , 0.997]])
However, this is what I actually get:
array([[0.998, 0. , 0.002],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.512, 0. , 0.488],
[0.003, 0. , 0.997]])
Does anybody have any insight into what I might be doing wrong? (I've tried many different cost matrices, to no avail.)
If there is an alternative approach that achieves the desired effect (a low-penalty class acting as the default for hard-to-predict examples), that would also be very helpful.