Cross Validated
Asked by tbizzy0808 on December 4, 2020
So I am currently building a model which does a certain type of action recognition, implemented as a two-stage, end-to-end system. The first stage is a pose estimation model, and I want to take its outputs and feed them into some type of sequence classification model (the specific architecture is to be determined, and not important for this question) that outputs the type of action. However, the classification is essentially based on changes in the relative positions of the different segments from the pose estimation model, and these vary with the angle the input video was shot from. So, as I see it, I will need to either gather a much larger amount of data to generalize across angles, make sure every input video is shot from roughly the same angle, or normalize the pose estimation data by essentially rotating it.
The way I’m thinking about this, I could either:
(1) train another network to rotate the input video beforehand (there is some research on this, and I could potentially generate some of the training data myself using perspective transforms, as sketched below), or
(2) work out some type of mathematical model that gives me the camera angle from the segment relationships, either to generate training data for (1) or to derive a transform that converts the estimated segment positions to a standard view (see the second sketch after this list).
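For the data-generation idea in (1), here is a minimal sketch of warping video frames with random perspective transforms to simulate shifted camera angles; the `strength` parameter and the corner-jitter scheme are illustrative assumptions, not from any particular paper:

```python
import cv2
import numpy as np

def random_perspective(frame, strength=0.1, rng=np.random):
    """Warp a frame with a random perspective transform to simulate
    a change in camera angle (hypothetical augmentation scheme)."""
    h, w = frame.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Jitter each corner by up to `strength` of the frame size.
    jitter = (rng.uniform(-strength, strength, (4, 2))
              * np.float32([w, h])).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, src + jitter)
    return cv2.warpPerspective(frame, H, (w, h))
```

For (2), one standard way to realize such a transform is a similarity (Procrustes) alignment of each frame's detected 2D keypoints onto a canonical front-facing skeleton. This is a sketch under the assumption that keypoints come as an (N, 2) array with a fixed joint order matching a `canonical` template you define:

```python
import numpy as np

def align_to_canonical(keypoints, canonical):
    """Similarity (Procrustes) alignment of one frame of 2D keypoints
    onto a canonical front-facing template.

    keypoints, canonical: (N, 2) arrays with matching joint order.
    Returns the keypoints mapped into the canonical frame.
    """
    # Center both point sets on their centroids.
    X = keypoints - keypoints.mean(axis=0)
    Y = canonical - canonical.mean(axis=0)

    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(X.T @ Y)
    # Guard against a reflection (determinant -1) solution.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])
    R = U @ D @ Vt

    # Optimal isotropic scale.
    s = (S * np.array([1.0, d])).sum() / (X ** 2).sum()

    return s * X @ R + canonical.mean(axis=0)
```

Note that aligning per frame like this only removes in-plane rotation, scale, and translation; out-of-plane (perspective) effects remain, which is part of why a 3D approach can be attractive.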
I was wondering if there are any other existing techniques I haven’t been able to think of that would help me out. Any help greatly appreciated, thanks!
So I think what I am going to try is 3D pose estimation with a mesh model instead, as the recovered 3D structure shouldn’t inherently need normalization, remaining the same no matter the angle of the input footage.
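As a rough illustration of why the 3D route sidesteps the problem, here is a minimal sketch (assuming the model outputs 3D joints as an (N, 3) array in a y-up convention; the joint indices are placeholders for whatever skeleton layout the model uses) that root-centers each frame and rotates it about the vertical axis so the hips face a fixed direction:

```python
import numpy as np

# Hypothetical joint indices; depends on the pose model's skeleton layout.
PELVIS, L_HIP, R_HIP = 0, 1, 2

def canonicalize_3d(joints):
    """Map one frame of 3D joints (N, 3) into a view-independent frame:
    pelvis at the origin, hip line aligned with the x-axis (y is up)."""
    # Translate so the pelvis is the origin.
    centered = joints - joints[PELVIS]

    # Yaw of the hip line in the horizontal (x-z) plane.
    hip = centered[L_HIP] - centered[R_HIP]
    theta = np.arctan2(hip[2], hip[0])

    # Rotation about the vertical (y) axis that zeroes that yaw.
    c, s = np.cos(theta), np.sin(theta)
    R_y = np.array([[  c, 0.0,   s],
                    [0.0, 1.0, 0.0],
                    [ -s, 0.0,   c]])
    return centered @ R_y.T
```

After this canonicalization, the same action traces out (approximately) the same joint trajectories regardless of camera yaw, so the sequence classifier no longer has to learn view invariance itself.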
Will report on results.
Answered by tbizzy0808 on December 4, 2020