TransWikia.com

Transfer Learning Question: Extending the Functionality of a Multipose-Estimation Machine Learning Model?

Data Science Asked by Josh Sharkey on May 30, 2021

I have experimented with a number of different machine learning models used for pose estimation. Most of them output a heatmap and offsets for the detected person(s) in the image. I really like the performance/accuracy of the multipose estimation model here.

What I would like to do next is to create a model similar to this one, except that it should also label the pose of each person detected. There are multiple implementations (Caffe, PyTorch, TensorFlow) to choose from. I've thought about how to approach this and have come up with a few options:

  1. Create a completely new machine learning model and use the labeled output of the pose estimation model to train it.
  2. Change or add layers to the machine learning model to change the output. (Not sure how this is done)
  3. Ditch the pose estimation model and train a new model to classify poses directly from raw images of cropped people and their labels. This would rely on another method to detect each person.
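Option 1 can be prototyped without modifying the pose network: take the keypoints it outputs for each person, normalize them, and fit a small classifier on top. Below is a minimal NumPy sketch. It assumes each person arrives as a (K, 2) array of keypoint coordinates. The nearest-centroid classifier, the toy 3-keypoint poses and the labels are hypothetical illustrations; a stronger classifier (e.g. a small MLP) could be trained on the same normalized features.

```python
import numpy as np


def normalize_pose(keypoints):
    """Make a (K, 2) keypoint array invariant to position and scale.

    Centers the pose on its mean keypoint, divides by the largest distance
    from that center, and flattens the result into a feature vector.
    """
    kp = np.asarray(keypoints, dtype=float)
    centered = kp - kp.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    if scale > 0:
        centered = centered / scale
    return centered.ravel()


class NearestCentroidPoseClassifier:
    """Hypothetical pose labeler: assigns the label of the closest class mean."""

    def fit(self, poses, labels):
        feats = np.stack([normalize_pose(p) for p in poses])
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        # One centroid (mean normalized feature vector) per pose label
        self.centroids_ = np.stack(
            [feats[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, poses):
        feats = np.stack([normalize_pose(p) for p in poses])
        # Euclidean distance from every pose to every class centroid
        dists = np.linalg.norm(
            feats[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[dists.argmin(axis=1)]


# Toy poses (x, y with y pointing down): [head, left hand, right hand]
up = np.array([[0, 0], [-1, -2], [1, -2]], dtype=float)
down = np.array([[0, 0], [-1, 2], [1, 2]], dtype=float)
clf = NearestCentroidPoseClassifier().fit(
    [up, up * 2 + 5, down, down * 3 - 1],
    ["arms_up", "arms_up", "arms_down", "arms_down"],
)
print(clf.predict([up * 4 + 100, down + 7]))
```

Normalizing by center and scale makes the features independent of where a person stands in the frame and how large they appear, which lets one classifier handle every person a multi-pose model detects.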

I want to take the path of least resistance here, but I also care about the time it takes to gather and process data, and most importantly about the accuracy/performance of the model. Could any experienced machine learning practitioners or data scientists answer the following?

  • Which approach should I take, and what are the advantages/disadvantages of each?
  • Which machine learning library offers the functions to do this?
  • My assumption is that option 1 or 2 would be more accurate than option 3. Am I correct?

One Answer

I worked on a similar kind of object detection problem, so my suggestion is transfer learning.

i) The best option is to go with transfer learning. It allows us to build accurate models in a time-saving way: instead of starting the learning process from scratch, you start from patterns that were learned when solving a different problem, and so leverage that previous learning.

A pre-trained model is a model that was trained on a large benchmark dataset to solve a problem similar to the one we want to solve. Because training such models is computationally expensive, it is common practice to import and use models from the published literature (e.g. VGG, Inception, MobileNet).

Many of the pre-trained models used in transfer learning are based on large convolutional neural networks (CNNs).

When you’re repurposing a pre-trained model for your own needs, you start by removing the original classifier, then you add a new classifier that fits your purposes, and finally you have to fine-tune your model according to one of three strategies:

1. Train the entire model.
2. Train some layers and leave the others frozen.
3. Freeze the convolutional base and train only the new classifier.
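Strategy 3 boils down to this: compute features with the pre-trained layers, never update those weights, and fit only the new classifier. Below is a minimal NumPy sketch of that idea. The "pre-trained base" here is a hypothetical stand-in (a fixed random projection plus ReLU), since the point is only to show which weights are trained and which are not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained convolutional base: a fixed
# projection followed by ReLU. In practice these weights would be loaded
# from the pre-trained model; here they are random for illustration.
W_BASE = rng.normal(size=(4, 16))


def base_features(x):
    """Frozen feature extractor: its weights are never updated below."""
    return np.maximum(x @ W_BASE, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_new_head(x, y, n_classes, lr=0.1, epochs=500):
    """Strategy 3: keep the base frozen and fit only a new softmax classifier."""
    feats = base_features(x)  # computed once, since the base does not change
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        probs = softmax(feats @ W + b)
        # Gradient of the mean cross-entropy loss with respect to the logits
        grad = (probs - onehot) / len(x)
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(x, W, b):
    return (base_features(x) @ W + b).argmax(axis=1)
```

With a real framework the freezing is a flag rather than hand-written code: in Keras you set `trainable = False` on the base model or its layers, and in PyTorch you set `requires_grad = False` on the base parameters. Strategy 2 corresponds to leaving only the top few base layers trainable.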

The transfer learning process, step by step:

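1) Select a pre-trained model.
2) Classify your problem according to the Size-Similarity Matrix, i.e. by how large your dataset is and how similar it is to the data the pre-trained model was trained on.
3) Fine-tune your model.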
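Step 2 is essentially a lookup that picks one of the three fine-tuning strategies listed earlier. The sketch below encodes one commonly cited reading of the Size-Similarity Matrix; treat the mapping as an assumption and a starting point rather than a rule, and note that "large" and "similar" are judgment calls, not fixed thresholds.

```python
def choose_strategy(dataset_is_large: bool, similar_to_pretraining_data: bool) -> str:
    """Map the two Size-Similarity axes to a fine-tuning strategy (assumed mapping)."""
    if dataset_is_large and not similar_to_pretraining_data:
        # Enough data to relearn features that differ from the original task
        return "train the entire model"
    if not dataset_is_large and similar_to_pretraining_data:
        # Little data, but the pre-trained features already fit the task
        return "freeze the convolutional base"
    # Remaining quadrants: retrain the top layers, keep the lower layers frozen
    return "train some layers, freeze the rest"


# Pose labeling with a modest custom dataset of people (similar domain)
print(choose_strategy(dataset_is_large=False, similar_to_pretraining_data=True))
```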

ii) TensorFlow offers a vision model named PoseNet.

PoseNet is a vision model that can be used to estimate the pose of a person in an image or video by estimating where key body joints are. From there you can freeze the model (or part of it) and fine-tune it, or else you can use another pre-trained image model (e.g. MobileNet, ResNet).
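To make "estimating where key body joints are" concrete, here is a minimal NumPy sketch of decoding a single pose from PoseNet-style heatmap and offset outputs. The tensor layout (y offsets in the first K channels, x offsets in the last K), scores already passed through a sigmoid, and an output stride of 16 are assumptions; check the documentation of the implementation you use. Multi-pose decoding additionally groups keypoints into separate people and is not shown.

```python
import numpy as np


def decode_single_pose(heatmaps, offsets, output_stride=16):
    """Decode one pose from PoseNet-style heatmap and offset outputs.

    Assumed layout (check your model's documentation):
      heatmaps: (H, W, K) per-keypoint scores, already passed through a sigmoid
      offsets:  (H, W, 2K); channels [0, K) hold y offsets and
                channels [K, 2K) hold x offsets, in input-image pixels
    Returns a (K, 3) array of (y, x, score) rows in input-image pixels.
    """
    height, width, num_keypoints = heatmaps.shape
    keypoints = np.zeros((num_keypoints, 3))
    for k in range(num_keypoints):
        # Coarse location: the grid cell with the highest score for keypoint k
        gy, gx = np.unravel_index(np.argmax(heatmaps[:, :, k]), (height, width))
        # Refinement: map the cell back to image pixels and add the learned offset
        y = gy * output_stride + offsets[gy, gx, k]
        x = gx * output_stride + offsets[gy, gx, k + num_keypoints]
        keypoints[k] = (y, x, heatmaps[gy, gx, k])
    return keypoints


# Toy example: a 9x9 output grid with two keypoints
heat = np.zeros((9, 9, 2))
heat[2, 3, 0] = 0.9
heat[5, 1, 1] = 0.8
off = np.zeros((9, 9, 4))
off[2, 3, 0] = 4.0   # y offset for keypoint 0
off[2, 3, 2] = -2.0  # x offset for keypoint 0
print(decode_single_pose(heat, off))
```

The decoded (y, x) keypoints are exactly the kind of labeled output that option 1 in the question would feed into a separate pose classifier.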

iii) Estimating directly from raw images first requires you to identify or detect the people, so it relies on another model.
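In practice that extra dependency looks like this: a person detector returns bounding boxes, and each crop is then resized and passed to the pose classifier. Below is a minimal NumPy sketch of the cropping step only. It assumes boxes in (x_min, y_min, x_max, y_max) pixel format; the detector itself and the padding fraction are assumptions.

```python
import numpy as np


def crop_people(image, boxes, pad=0.1):
    """Crop each detected person from an (H, W, C) image array.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixels, as produced
    by some person detector (assumed format). Each box is enlarged by
    `pad` (a fraction of its width/height) and clipped to the image bounds.
    """
    h, w = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in boxes:
        dx, dy = (x1 - x0) * pad, (y1 - y0) * pad
        left = max(int(round(x0 - dx)), 0)
        top = max(int(round(y0 - dy)), 0)
        right = min(int(round(x1 + dx)), w)
        bottom = min(int(round(y1 + dy)), h)
        crops.append(image[top:bottom, left:right])
    return crops


image = np.zeros((100, 200, 3), dtype=np.uint8)
for crop in crop_people(image, [(10, 20, 50, 80), (180, 90, 210, 120)]):
    print(crop.shape)
```

Padding keeps limbs that stick out of a tight box, and clipping handles people at the image border; any errors the detector makes still propagate into the classifier, which is the main cost of this option.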

Answered by Tamil Selvan S on May 30, 2021
