Data Science Asked by Innodeta on April 1, 2021
I have a question about whether the way the detection output is modeled affects a neural net's capability.
In my case I want to train a CNN for object recognition and classification.
As an output I want to get the object classification and a bounding box where it roughly is in the image.
So my thoughts were to model the output as a sequence of vectors:
c = classification vector of length o (the total number of object classes to classify)
l = bounding-box vector with x, y, width, height
so the complete sequence would look like:
Output = (c + l) x N, where N is the maximum number of objects I would like to detect
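To make the format concrete, here is a minimal NumPy sketch of how one label could be packed into such a vector. The function name `encode_targets`, the one-hot encoding of c, and the choice to leave unused slots as zeros are assumptions for illustration; the format above does not say how empty slots should be marked.

```python
import numpy as np

def encode_targets(objects, num_classes, max_objects):
    """Pack up to max_objects (class_id, box) pairs into one flat target vector.

    objects: list of (class_id, (x, y, width, height)) tuples.
    Unused slots stay all-zero (an assumption, not part of the original format).
    """
    slot_len = num_classes + 4                      # c (one-hot) + l (box)
    target = np.zeros((max_objects, slot_len), dtype=np.float32)
    for slot, (class_id, box) in enumerate(objects[:max_objects]):
        target[slot, class_id] = 1.0                # c: one-hot classification
        target[slot, num_classes:] = box            # l: x, y, width, height
    return target.reshape(-1)                       # (c + l) x N, flattened

# One object of class 2, three classes in total, room for two objects.
example = encode_targets([(2, (10, 20, 30, 40))], num_classes=3, max_objects=2)
print(example.shape)  # (14,)
```

With o = 3 classes and N = 2 slots, the vector has 2 x (3 + 4) = 14 entries, and the second slot is all zeros because only one object is present.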
Does the order of the subsequences (c + l) have an effect on the effectiveness of the training? Do I have to specify an order, e.g. that the subsequences are sorted by size?
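If a fixed order were wanted, one way to apply it consistently while labeling is to sort each image's objects before they are written into the output slots. The sketch below sorts by bounding-box area, largest first; the helper name `sort_by_area` and the (class_id, (x, y, width, height)) label layout are hypothetical, and whether such an ordering helps training at all is exactly the open question here.

```python
def sort_by_area(objects):
    """Order (class_id, (x, y, width, height)) labels from largest to smallest box."""
    return sorted(objects, key=lambda obj: obj[1][2] * obj[1][3], reverse=True)

labels = [
    (0, (5, 5, 10, 10)),     # area 100
    (1, (0, 0, 50, 40)),     # area 2000
    (2, (20, 20, 30, 10)),   # area 300
]
ordered = sort_by_area(labels)
print([class_id for class_id, _ in ordered])  # [1, 2, 0]
```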
Secondly, with this output format the values in the vector c lie in the range 0 to 1, but the values in the bounding-box vector l range from 0 to the size of the image. Does this also affect the training if the activation is set to “relu”? Would it be better to normalize the bounding-box vector by the image width and height, or by the input width and height, so that all numbers share the same range?
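For reference, normalizing by the image size can be sketched as below. The function names are hypothetical, and the sketch assumes x and y are measured in pixels from the image's top-left corner. Coordinates scaled this way stay valid if the image is later resized to the network's input size without cropping or padding, so the labels do not need to be redone.

```python
import numpy as np

def normalize_box(box, image_width, image_height):
    """Scale a pixel-space (x, y, width, height) box into the range 0 to 1."""
    x, y, w, h = box
    return np.array(
        [x / image_width, y / image_height, w / image_width, h / image_height],
        dtype=np.float32,
    )

def denormalize_box(norm_box, image_width, image_height):
    """Map a normalized box back to pixel coordinates for a given image size."""
    scale = np.array(
        [image_width, image_height, image_width, image_height], dtype=np.float32
    )
    return norm_box * scale

norm = normalize_box((64, 48, 128, 96), image_width=640, image_height=480)
print(norm)                              # approximately [0.1 0.1 0.2 0.2]
print(denormalize_box(norm, 640, 480))   # approximately [ 64.  48. 128.  96.]
```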
These questions came to mind while I was labeling the dataset, and I don’t want to waste my time labeling it the wrong way :/
Thank you very much for your help!
I'd say that how you order your sequences or normalize your data matters much less than where you get the features for those classification/detection vectors (that is, where you attach them to your base CNN).
Your idea falls into the approach known as "single-shot detection" (as opposed to region proposals or sliding windows), and I'd recommend looking at the SSD (Liu et al.) and YOLO (Redmon, Farhadi) models; there is much more to it to get it performing well.
Answered by Mikhail Yurasov on April 1, 2021