Implementation of custom loss function invariant to batch size

Question

When implementing a custom loss function, how can it be made invariant to the batch size? For example, suppose Dice loss is being implemented. The formula for Dice loss is:
$$
\sum_{c} \left(1 - {DSC}_{c}\right)\\
c: \text{Classes},\hspace{5mm} {DSC}_{c}: \text{Dice score of class }c
$$
This formula does not say how to handle the batch dimension. Note that if it is applied per sample and summed over the batch, the loss value grows as the batch size increases. The obvious remedy is to normalize the loss value by the batch size. I have two questions in this context:

Is it theoretically valid to normalize the loss using the batch size?
How to normalize?

For example, for images the tensor at hand is 4D. The Dice score/loss can be calculated for each image and each class, resulting in a 2D tensor like the one below, where $n$ is the number of classes and $m$ is the batch size.

$$
\begin{bmatrix}
l_{1, 1} & \dots & l_{1, n}\\
\vdots & \ddots & \vdots\\
l_{m, 1} & \dots & l_{m, n}
\end{bmatrix}
$$
Now the aggregated loss can be calculated as:
$$
\sum_{j=1}^{n}\frac{1}{m}\sum_{i=1}^{m} l_{i, j} = \frac{1}{m}\sum_{j=1}^{n}\sum_{i=1}^{m} l_{i, j}
$$
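
To make the first approach concrete, here is a minimal NumPy sketch of it. It assumes a soft Dice score (2 x intersection over the sum of prediction and target mass) with a small smoothing term `eps`; the function names, the `eps` value, and the random test data are all hypothetical choices for illustration, not part of any particular library.

```python
import numpy as np

def dice_loss_matrix(probs, target, eps=1e-6):
    """Return the (m, n) matrix l_ij of per-image, per-class Dice losses.

    probs, target: arrays of shape (m, n, H, W); target is one-hot.
    eps is an assumed smoothing term that avoids division by zero.
    """
    spatial_axes = (2, 3)  # reduce over H and W only, keep batch and class
    intersection = (probs * target).sum(axis=spatial_axes)
    denominator = probs.sum(axis=spatial_axes) + target.sum(axis=spatial_axes)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice

def aggregate_mean_over_batch(loss_matrix):
    """Compute sum_j (1/m) sum_i l_ij: mean over the batch, then sum over classes."""
    return loss_matrix.mean(axis=0).sum()

# Random toy data: m images, n classes, H x W pixels.
rng = np.random.default_rng(0)
m, n, H, W = 4, 3, 8, 8
logits = rng.normal(size=(m, n, H, W))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax over classes
labels = rng.integers(0, n, size=(m, H, W))
target = np.eye(n)[labels].transpose(0, 3, 1, 2)  # one-hot, shape (m, n, H, W)

loss_matrix = dice_loss_matrix(probs, target)

# Repeat the same batch twice: m doubles, the data distribution does not change.
probs_2x = np.concatenate([probs, probs], axis=0)
target_2x = np.concatenate([target, target], axis=0)
loss_matrix_2x = dice_loss_matrix(probs_2x, target_2x)

loss_mean = aggregate_mean_over_batch(loss_matrix)
loss_mean_2x = aggregate_mean_over_batch(loss_matrix_2x)  # same value as loss_mean
loss_sum = loss_matrix.sum()
loss_sum_2x = loss_matrix_2x.sum()  # twice loss_sum: the plain sum scales with m

print(loss_mean, loss_mean_2x, loss_sum, loss_sum_2x)
```

In this sketch the averaged value stays the same when the batch is duplicated, while the plain sum over all entries doubles, which is the batch-size dependence described earlier.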

Alternatively, we can ignore the batch dimension from the beginning and compute the per-class loss as a 1D tensor:
$$
\begin{bmatrix}
l_{1}^{'} & \dots & l_{n}^{'}
\end{bmatrix}
$$

Accordingly, the normalized aggregated loss will be:
$$
\frac{1}{m}\sum_{k=1}^{n} l_{k}^{'}
$$
Is there any fundamental difference between these two approaches? Which one is correct (if any)?
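
One way to probe the difference is to compute both quantities on a tiny batch. The sketch below is only an illustration under stated assumptions: $l_{k}^{'}$ is taken to mean a soft Dice loss where the intersection and the sums are pooled over the whole batch (one possible reading of "ignoring the batch size"), `eps` is an assumed smoothing term, and all function and variable names are hypothetical.

```python
import numpy as np

def dice_loss_per_image(probs, target, eps=1e-6):
    """(m, n) matrix l_ij: one Dice loss per image and per class."""
    inter = (probs * target).sum(axis=(2, 3))
    denom = probs.sum(axis=(2, 3)) + target.sum(axis=(2, 3))
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def dice_loss_pooled(probs, target, eps=1e-6):
    """(n,) vector l'_k: one Dice loss per class, batch pooled into one volume."""
    inter = (probs * target).sum(axis=(0, 2, 3))
    denom = probs.sum(axis=(0, 2, 3)) + target.sum(axis=(0, 2, 3))
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

# Toy batch: m = 2 images, n = 2 classes (foreground, background), 1 x 4 pixels.
# Image 0 has a large, perfectly predicted foreground; image 1 has a single
# foreground pixel that is missed.
fg_target = np.array([[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]])
fg_probs = np.array([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
target = np.stack([fg_target, 1.0 - fg_target], axis=1)[:, :, None, :]  # (2, 2, 1, 4)
probs = np.stack([fg_probs, 1.0 - fg_probs], axis=1)[:, :, None, :]     # (2, 2, 1, 4)
m = probs.shape[0]

# Approach 1: mean over the batch, then sum over classes.
approach_1 = dice_loss_per_image(probs, target).mean(axis=0).sum()

# Approach 2: pooled per-class loss, without and with the 1/m factor.
approach_2_raw = dice_loss_pooled(probs, target).sum()
approach_2_div_m = approach_2_raw / m

# Duplicate the batch to see how each variant of approach 2 reacts to a larger m.
probs_2x = np.concatenate([probs, probs], axis=0)
target_2x = np.concatenate([target, target], axis=0)
approach_2_raw_2x = dice_loss_pooled(probs_2x, target_2x).sum()
approach_2_div_m_2x = approach_2_raw_2x / probs_2x.shape[0]

print(approach_1, approach_2_raw, approach_2_div_m)
print(approach_2_raw_2x, approach_2_div_m_2x)
```

On this toy batch the two approaches give different numbers (about 0.571 for the per-image mean versus about 0.254 for the pooled loss), because a mean of ratios is generally not a ratio of sums: the per-image version weights every image equally, while the pooled version weights images by how much foreground they contain. Under the pooling assumption, the pooled loss is already essentially unchanged when the batch is duplicated, so the extra $\frac{1}{m}$ factor makes it shrink as the batch grows rather than making it invariant.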
