Can't get a regression problem to converge

Question

I am implementing a very simple version of YOLO to learn PyTorch and how to build deep learning models. Each image in my dataset has two MNIST digits placed somewhere on it, like this:

I am using a CNN to predict, for each quadrant of the image, whether it contains a digit (a probability between 0 and 1) and the center coordinates of that digit (two numbers between 0 and 1). So my output has shape 2x2x3.
Predicting the probability that a quadrant contains a digit works reasonably well, but I can't get the center coordinates to work. The coordinate predictions get stuck at the center of each quadrant (see the example below, where the yellow dots are the predicted centers, drawn regardless of whether I predicted a digit in that quadrant).
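To make the 2x2x3 layout concrete, here is a minimal sketch of how one target could be built. It assumes that the two coordinates are measured relative to the quadrant (0 at its left/top edge, 1 at its right/bottom edge) and that digit centers are given in whole-image coordinates; `encode_target` is a hypothetical helper name, not something from the linked notebook.

```python
def encode_target(centers, grid=2):
    """Build a grid x grid x 3 target from digit centers.

    centers: list of (x, y) digit centers in whole-image coordinates in [0, 1].
    Each cell holds [has_digit, x_in_cell, y_in_cell] (assumed layout).
    """
    target = [[[0.0, 0.0, 0.0] for _ in range(grid)] for _ in range(grid)]
    for x, y in centers:
        # Which quadrant the center falls into (clamped so x == 1.0 stays in range).
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        # Offset of the center inside that quadrant, rescaled to [0, 1].
        target[row][col] = [1.0, x * grid - col, y * grid - row]
    return target


# Two digits: one centered in the top-left quadrant, one in the bottom-right.
target = encode_target([(0.25, 0.25), (0.75, 0.5)])
```

With this encoding, empty quadrants keep `[0, 0, 0]`, so their coordinate entries are placeholders and should not contribute to the coordinate loss.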

My loss function for the center coordinates is MSE, and it only counts a quadrant's coordinates if that quadrant actually contains a digit.
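For reference, a masked loss of the kind described above might look like the sketch below. This is not the code from the linked notebook: it assumes the network outputs raw logits of shape (batch, 2, 2, 3) with the last dimension ordered as [has_digit, x, y], and the names `masked_detection_loss` and `coord_weight` are hypothetical.

```python
import torch
import torch.nn.functional as F


def masked_detection_loss(pred, target, coord_weight=5.0):
    """pred, target: tensors of shape (batch, 2, 2, 3).

    pred holds raw logits (assumption); target holds [has_digit, x, y]
    with all values in [0, 1].
    """
    # Boolean mask of quadrants that really contain a digit: (batch, 2, 2).
    obj_mask = target[..., 0] > 0.5

    # Objectness is trained on every quadrant.
    obj_loss = F.binary_cross_entropy_with_logits(pred[..., 0], target[..., 0])

    # Coordinates are trained only where a digit exists. Indexing with the
    # mask keeps just those quadrants, so the mean is not diluted by empty ones.
    if obj_mask.any():
        pred_xy = torch.sigmoid(pred[..., 1:][obj_mask])
        true_xy = target[..., 1:][obj_mask]
        coord_loss = F.mse_loss(pred_xy, true_xy)
    else:
        coord_loss = pred.new_zeros(())

    return obj_loss + coord_weight * coord_loss


# Tiny check: one image, a digit only in the top-left quadrant.
pred = torch.zeros(1, 2, 2, 3, requires_grad=True)
target = torch.zeros(1, 2, 2, 3)
target[0, 0, 0] = torch.tensor([1.0, 0.2, 0.8])
loss = masked_detection_loss(pred, target)
loss.backward()
```

After `backward()`, the coordinate gradients of the empty quadrants are exactly zero, which is a quick way to confirm that the mask is applied the way the loss is meant to work.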
If anyone can point me in the right direction, or spots something wrong with my network, I would greatly appreciate it.
My code is available here: https://www.kaggle.com/funky15/mnist-object-detection-v3-yolo-w-data-import
