TransWikia.com

Bounding Boxes in YOLO Model

Data Science Asked by Tanmay Bhatnagar on April 26, 2021

The YOLO model splits the image into a grid of smaller cells, and each grid cell is responsible for predicting 5 bounding boxes.

My question is: how does the model make these bounding boxes for every grid cell? Does each box have a predefined offset with respect to, say, the center of the grid cell?

To be clear, I am NOT talking about the final bounding box that encloses the object.
I am talking about the 5 predicted bounding boxes that are present for each grid cell.

For example, if the center of a grid cell is located at, say, (50, 50), should the bounding boxes be at (50+5, 50+5) or something like that?

If not, then how do the bounding boxes come to be?

Paper – https://arxiv.org/pdf/1506.02640.pdf

2 Answers

I think Andrew Ng's explanation might help you get a better understanding of the algorithm. Scan through the playlist; it explains YOLO in a very simple way. Then perhaps read the paper again once you have watched the videos to get a complete understanding of how things work.

Answered by Saket Kumar Singh on April 26, 2021

Andrew Ng's explanation actually covers YOLOv2, which uses anchor boxes. YOLOv1, which is the paper you linked, does not use anchor boxes, so it's not exactly the same.
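For contrast, here is a minimal sketch of how an anchor-based version such as YOLOv2 turns raw network outputs into a box, which is the "predefined box per grid cell" idea the question describes. It assumes the parameterisation from the YOLOv2 paper (sigmoid offsets inside the cell, anchor width and height scaled by an exponential). The function name `decode_anchor_box` and the anchor size used in the example are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(t):
    """Squash a raw network output into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_anchor_box(tx, ty, tw, th, col, row, prior_w, prior_h):
    """Decode one anchor-based prediction (YOLOv2-style), in grid-cell units.

    (col, row) is the top-left corner of the grid cell.
    (prior_w, prior_h) is the predefined anchor size for this box slot.
    """
    bx = col + sigmoid(tx)        # center stays inside the cell
    by = row + sigmoid(ty)
    bw = prior_w * math.exp(tw)   # anchor width scaled by the prediction
    bh = prior_h * math.exp(th)   # anchor height scaled by the prediction
    return bx, by, bw, bh

# With all raw outputs at zero, the box sits at the cell center with
# exactly the anchor's shape (anchor size 1.5 x 2.0 is a made-up value).
example_box = decode_anchor_box(0.0, 0.0, 0.0, 0.0, col=3, row=4,
                                prior_w=1.5, prior_h=2.0)
print(example_box)
```

In this scheme the predefined part is the anchor shape, not a fixed pixel offset from the cell center. YOLOv1 has no such prior at all.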

The key to understanding how the bounding boxes are formed is to first understand how the output is encoded. For that, I recommend this link: https://hackernoon.com/understanding-yolo-f5a74bbc7967

Briefly, using the example from the paper: for S=7, B=2 and C=20, the output is a 7x7x30 tensor, where 30 = B*5 + C (x, y, w, h and a confidence score for each of the 2 boxes, plus 20 class probabilities). It encodes where the objects are (bounding box coordinates) and what they are (class probabilities). To achieve this, we construct a fully-connected layer at the end of the CNN that gives us 7x7x30 values (rather forcefully). Hence there are no predefined offsets: on the first forward pass, each cell simply outputs 2 essentially random bounding boxes. A loss is calculated, and the weights of the CNN are then adjusted to reduce that loss (optimisation). The following passes then produce bounding boxes closer to the ground truth.
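To make the encoding concrete, here is a minimal sketch in plain Python of how one cell's slice of that tensor could be turned into boxes. It follows the paper's convention that x and y are offsets within the grid cell and that w and h are fractions of the whole image. The channel layout (all box values first, then class probabilities), the helper name `decode_cell`, the 448x448 image size in the example, and the random numbers standing in for an untrained network are all assumptions made for illustration.

```python
import random

S, B, C = 7, 2, 20     # grid size, boxes per cell, classes (the paper's example)
DEPTH = B * 5 + C      # 5 numbers per box: x, y, w, h, confidence

def decode_cell(cell_vector, row, col, img_w, img_h):
    """Convert one cell's DEPTH-long vector into B boxes in pixel units.

    Assumed layout: [x, y, w, h, conf] repeated B times, then C class scores.
    """
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell_vector[b * 5:(b + 1) * 5]
        cx = (col + x) / S * img_w   # x is an offset within the cell
        cy = (row + y) / S * img_h   # y is an offset within the cell
        bw = w * img_w               # w is a fraction of the image width
        bh = h * img_h               # h is a fraction of the image height
        boxes.append((cx, cy, bw, bh, conf))
    return boxes

# Stand-in for the first forward pass of an untrained network: random values.
random.seed(0)
untrained_output = [[[random.random() for _ in range(DEPTH)]
                     for _ in range(S)] for _ in range(S)]
boxes = decode_cell(untrained_output[3][3], row=3, col=3, img_w=448, img_h=448)
for box in boxes:
    print(box)
```

The two boxes printed for cell (3, 3) have arbitrary positions and sizes. Nothing about them is predefined, and training is what moves them toward the ground truth.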

Answered by Baymax Lim on April 26, 2021
