CNN-Regression with a variable number of outputs

Question

I want to predict several variables describing an object in an image, which I can do with CNN regression. But how can I do this when the number of objects differs from one image to another?
Here is an example: I have an image containing several arrows. For each arrow, I want to predict the position of the arrow tip (x1, y1) and the position of the arrow base (x2, y2), so 4 values per arrow. For example, in the image below there are 5 arrows, so I want to predict 5 x 4 = 20 values. I want my algorithm to work on images with a variable number of arrows, so the number of output values should differ from image to image.
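To make the target format concrete, here is a minimal sketch of what the labels could look like. The file names and coordinates are made up for illustration, and storing one (N, 4) array per image is just one assumed way to represent the data, not a requirement of any particular library.

```python
import numpy as np

# Hypothetical labels: one (N, 4) array per image, where each row is
# (x1, y1, x2, y2) = (tip x, tip y, base x, base y). All values are made up.
labels = {
    "image_a.png": np.array([
        [10, 20, 40, 60],
        [55, 15, 30, 80],
    ], dtype=float),
    "image_b.png": np.array([
        [5, 5, 50, 50],
        [70, 10, 20, 90],
        [33, 44, 66, 77],
        [12, 80, 90, 14],
        [60, 60, 25, 35],
    ], dtype=float),
}


def num_outputs(arrows):
    """Number of scalar values to predict for one image (4 per arrow)."""
    return arrows.size


for name, arrows in labels.items():
    # The first dimension changes from image to image, which is why a
    # fixed-size regression output layer does not fit directly.
    print(name, arrows.shape, num_outputs(arrows))
```

The image with 5 arrows needs 20 outputs while the image with 2 arrows needs only 8, so the targets cannot be stacked into a single fixed-width array without some extra convention.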

The only method I know for predicting a variable number of outputs from an image is object detection, with algorithms such as YOLO. These algorithms generally output a bounding box for each object in the image. So in my example, I could use YOLO to isolate a sub-image for each arrow, and then use a CNN regression model to predict the tip and base positions in each sub-image. But this takes 2 steps (and therefore 2 different models), and I don't think it would work well when the arrows overlap, as in my example: if the first model (object detection) outputs a sub-image containing several arrow tips, the second model (CNN regression) can't know which tip and base it should predict.
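To illustrate the overlap concern, here is a small sketch that counts how many arrow tips fall inside each arrow's bounding box. It assumes an idealized detector that returns a tight axis-aligned box around each arrow; the helper names `arrow_boxes` and `tips_per_box` and the coordinates are hypothetical, and no real detection model is involved.

```python
import numpy as np


def arrow_boxes(arrows):
    """Tight axis-aligned box (xmin, ymin, xmax, ymax) around each arrow.

    Assumption: stands in for what an ideal object detector would return.
    """
    xs = arrows[:, [0, 2]]  # tip x and base x
    ys = arrows[:, [1, 3]]  # tip y and base y
    return np.stack(
        [xs.min(axis=1), ys.min(axis=1), xs.max(axis=1), ys.max(axis=1)],
        axis=1,
    )


def tips_per_box(arrows):
    """Count how many arrow tips lie inside each arrow's box."""
    boxes = arrow_boxes(arrows)
    tips = arrows[:, :2]
    tx = tips[None, :, 0]
    ty = tips[None, :, 1]
    # inside[i, j] is True when tip j lies inside box i.
    inside = (
        (tx >= boxes[:, None, 0]) & (tx <= boxes[:, None, 2])
        & (ty >= boxes[:, None, 1]) & (ty <= boxes[:, None, 3])
    )
    return inside.sum(axis=1)


# Made-up arrows as (x1, y1, x2, y2) = (tip x, tip y, base x, base y).
arrows = np.array([
    [90, 90, 10, 10],      # long diagonal arrow
    [50, 50, 80, 20],      # its tip falls inside the first arrow's box
    [200, 200, 150, 180],  # isolated arrow
], dtype=float)

print(tips_per_box(arrows))
```

Any box with a count above 1 gives a crop in which the second-stage regressor sees more than one tip and has no way to tell which one belongs to the detected arrow.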
Do you know of other methods for this type of problem?
