How to have Multiple labels in a single video?

Question

I am building a tennis stroke classification system using a CNN.
I assume each stroke contains 3 steps/classes ('Ready', 'Impact', 'Finish'). I want to train a model that predicts which of these steps/classes the input video contains.
I have tried training 3 models, one per step, each as a binary classifier.
Example of the classes for one step's model:
1 - ready  
0 - not-ready (the other, incorrect steps).

But this method failed, since the 'not-ready' class contains many more features than the 'ready' class. I got only 4% accuracy.
Can anyone help me find a solution to this problem?

Erwan · Answer

Given that you have only 3 classes and that they closely depend on each other, I think it's worth trying a multiclass setting, as WBM said. The idea is to label each video with the full combination of actions it contains. Since each of the 3 actions is either present or absent, there are at most 2^3 = 8 combinations:

R-I-F
R-I
R-F
R
I-F
I
F
none
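As a minimal sketch of this relabeling (the names `STEPS`, `encode` and `decode` are hypothetical and not from the question), the three binary labels can be mapped to a single class index and back, assuming each video is annotated with one presence flag per step:

```python
from itertools import product

# Short names for the three steps: Ready, Impact, Finish.
STEPS = ("R", "I", "F")

# All 2^3 presence/absence combinations, ordered from "all present" to "none".
COMBOS = list(product((1, 0), repeat=len(STEPS)))


def combo_name(flags):
    """Build a readable class name such as 'R-I' from presence flags."""
    name = "-".join(step for step, flag in zip(STEPS, flags) if flag)
    return name or "none"


CLASS_NAMES = [combo_name(c) for c in COMBOS]
CLASS_INDEX = {c: i for i, c in enumerate(COMBOS)}


def encode(ready, impact, finish):
    """Map three binary labels to one multiclass label (0..7)."""
    return CLASS_INDEX[(int(ready), int(impact), int(finish))]


def decode(class_id):
    """Map a predicted class index back to per-step presence flags."""
    return dict(zip(STEPS, COMBOS[class_id]))


if __name__ == "__main__":
    for i, name in enumerate(CLASS_NAMES):
        print(i, name)
```

A single classifier with a softmax over these class indices would then replace the three binary models; `decode` recovers the per-step answers from its prediction.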

Probably some combinations of actions are impossible, so the number of classes is likely fewer than 8. Why this is a reasonable approach:

The setup is exactly the same, i.e. you can use the same labels, and the predictions can be used the same way as in your multi-label approach.
This is a "joint model", i.e. a model which learns everything together and can therefore exploit fine-grained distinctions between classes (e.g. between R-I-F and R-I).

However, note that this kind of method may require more data; in particular, it needs enough instances of each class.
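One rough way to check this before training is to count how many videos fall into each combined class. The sketch below assumes the labels are already available as strings like "R-I-F"; the example labels and the `min_count` threshold are made up for illustration, and a sensible threshold depends on your data and model:

```python
from collections import Counter


def rare_classes(labels, min_count):
    """Return the combined classes that have fewer than min_count examples."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n < min_count)


if __name__ == "__main__":
    # Hypothetical labels, one per video.
    labels = ["R-I-F", "R-I-F", "R-I", "R-I-F", "R", "R-I", "R-I-F"]
    print(Counter(labels).most_common())
    print("Under-represented:", rare_classes(labels, min_count=2))
```

Classes flagged here may need more recordings, or could be merged or dropped if the combination is not meaningful in practice.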
