Data Science Asked on June 14, 2021
The YouTube Faces database (YTF) consists of 3,425 videos of 1,595 different people. Given two videos, the task for YTF is to decide if they contain the same person or not. Having $n$ comparisons, the classifier might get $c \leq n$ right. Then the accuracy would be $\frac{c}{n}$.
FaceNet is a CNN which maps an image of a face onto the unit sphere in $\mathbb{R}^{128}$. It was evaluated on YTF. How did they decide whether two videos show the same person?
(I can imagine several procedures how this could be done, but I couldn't find it in the paper. One example, how it could be done, is by evaluating all images $x_i^{(k)}$ with $i = 1, \dots, \text{length of video } k$ and averaging the results, but I would like to know what they did / how this is usually done.)
The objective function they use to train the CNN (the triplet loss) minimizes the squared L2 distance (i.e. the squared Euclidean distance) between two images of the same person (a positive pair) and simultaneously pushes apart two images of different people (a negative pair). That means the squared Euclidean distance between two representations measures how dissimilar the faces are: a small distance suggests the same person. Recognizing a face in a new image is then as simple as 1) running it through the CNN and 2) finding its nearest neighbors with a KNN algorithm.
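To make the two ideas in this paragraph concrete, here is a minimal NumPy sketch of a triplet-style loss for a single triplet and of a 1-nearest-neighbor lookup on embeddings. The function names, the margin value, and the tiny 2-D "embeddings" are assumptions for illustration only; this is not the paper's implementation.

```python
import numpy as np

def squared_l2(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one triplet.

    It is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (the margin value is an assumption).
    """
    return max(squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin, 0.0)

def nearest_identity(query, gallery, labels):
    """Return the label of the gallery embedding closest to `query` (1-NN)."""
    dists = np.sum((np.asarray(gallery) - np.asarray(query)) ** 2, axis=1)
    return labels[int(np.argmin(dists))]

# Hypothetical 2-D embeddings standing in for the 128-D ones.
gallery = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["alice", "bob"]
print(nearest_identity(np.array([0.9, 0.1]), gallery, labels))
```

In practice the gallery would hold the CNN outputs for known faces, and a k > 1 vote could replace the single nearest neighbor.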
The last paragraph was only about images, whereas in the YouTube Faces DB we are handling videos of different people. In section 5.7 of the paper, they describe how they evaluate performance:
We use the average similarity of all pairs of the first one hundred frames that our face detector detects in each video.
So, you were partially right: they average frame-level results, but over pairs of frames rather than over single frames. Probably for performance reasons, they use only the first 100 detected frames of each video. They report that increasing this to the first 1000 frames raises accuracy from 95.12% to 95.18%, which is a negligible gain.
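One possible reading of that procedure can be sketched as follows, assuming each video has already been turned into an array of per-frame embeddings. The paper does not spell out the details, so treating "all pairs" as every (frame of video A, frame of video B) combination, using the squared L2 distance as the (dis)similarity, and the threshold value are all assumptions here; the function names are hypothetical.

```python
import numpy as np

def video_pair_distance(emb_a, emb_b, max_frames=100):
    """Mean squared L2 distance over all cross-video frame pairs.

    emb_a, emb_b: arrays of shape (num_frames, dim), one embedding per
    detected face frame. Only the first `max_frames` rows are used.
    """
    a = np.asarray(emb_a)[:max_frames]
    b = np.asarray(emb_b)[:max_frames]
    # Broadcasting yields a (len(a), len(b)) matrix of pairwise distances.
    diffs = a[:, None, :] - b[None, :, :]
    pairwise = np.sum(diffs ** 2, axis=2)
    return float(pairwise.mean())

def same_person(emb_a, emb_b, threshold=1.0, max_frames=100):
    """Decide 'same person' if the averaged distance is below a threshold.

    The threshold is a placeholder; it would normally be tuned on
    held-out pairs.
    """
    return video_pair_distance(emb_a, emb_b, max_frames) < threshold

# Hypothetical 2-D embeddings standing in for the 128-D ones.
video_a = np.tile([1.0, 0.0], (3, 1))
video_b = np.tile([0.0, 1.0], (5, 1))
print(video_pair_distance(video_a, video_b), same_person(video_a, video_b))
```

Counting how many of the $n$ video pairs get the correct decision then gives the accuracy $\frac{c}{n}$ from the question.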
Correct answer by hbaderts on June 14, 2021