Dummy vectors and performance measurement for vector search Face Recognition

Question

I have faces for thousands of people (from the LFW celebrity dataset), with each person represented by a 512 x 1 vector. I store them in a vector DB to build a face search system based on embedded features (MTCNN for face detection and ArcFace as the embedding model). Someone suggested that I add many vectors as "dummy faces" to the database under an unknown class (with more of these dummy vectors than there are person classes).
It's still unclear to me why I need to add many unknown faces as an "UNKNOWN" class and store them together with the thousands of vectors for the known people. As far as I know, it's pretty easy to check the performance using similarity scores against only the known vectors (the vectors for each person), without the unknown ones. For example, if I set k = 3 or k = 5, I take the match with the minimum distance as the result and return the class of that vector (ID or label).

Devashish Prasad · Accepted Answer

No, creating dummy unknowns is not the best way to do it.
A better approach: when a new face comes in, calculate the distance between its vector and the vectors of all the known faces you already have. The known face with the minimum distance is the candidate match, but that minimum distance must also be below a threshold value. If the minimum distance is above the threshold, the face is considered unknown. The threshold value is set manually.
Just to give you the idea, here is a toy example:
Let's say you have 5 registered (known) faces. Their vectors are [1] [2] [3] [4] [5]. Assume that your model represents each face by a vector of shape 1 x 1.
You set the threshold value to 0.4.
Your distance function is defined as abs(vector1 - vector2).
Scenario 1 -
A new face comes in, and your model generates a vector [2.2]
So you calculate the distance between the new face and the known faces as - [1.2] [0.2] [0.8] [1.8] [2.8]
Your minimum distance is [0.2], which is less than your threshold of 0.4. Hence, this new face is identified as the 2nd face, [2].
Scenario 2 -
A new face comes in, and your model generates a vector [7.5]
So you calculate the distance between the new face and the known faces as - [6.5] [5.5] [4.5] [3.5] [2.5]
Your minimum distance is [2.5], which is more than your threshold of 0.4. Hence, this new face is identified as an "unknown" face.
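
The two scenarios above can be sketched in Python. This is only a minimal illustration that reuses the same toy 1 x 1 vectors, the 0.4 threshold, and the abs() distance; the function name identify is made up for this example and is not part of any library.

```python
def identify(query, known, threshold):
    """Return the 1-based position of the closest known face, or None if unknown."""
    # Distance from the query to every registered face (toy abs() distance).
    distances = [abs(query - v) for v in known]
    # Index of the smallest distance.
    best = min(range(len(known)), key=lambda i: distances[i])
    # Accept the match only if it is closer than the threshold.
    if distances[best] < threshold:
        return best + 1
    return None


known_faces = [1.0, 2.0, 3.0, 4.0, 5.0]  # the 5 registered faces
THRESHOLD = 0.4                          # set manually, as in the example

result_1 = identify(2.2, known_faces, THRESHOLD)  # Scenario 1
result_2 = identify(7.5, known_faces, THRESHOLD)  # Scenario 2
print(result_1)  # 2
print(result_2)  # None, i.e. "unknown"
```

With real 512-dimensional embeddings the only change in principle is the distance function; the threshold check stays the same.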
Lastly, as you mentioned, you can use a KNN-based approach. It can give accurate results, but it might not scale well: an exact KNN search compares the new face against every stored vector, so it slows down as the number of faces in your database grows.
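
To make the KNN variant and its cost concrete, here is a rough sketch that combines a k-nearest vote with the same threshold idea. It rests on several assumptions that are not in the question: cosine distance as the metric, tiny 3-dimensional vectors in place of 512-dimensional embeddings, made-up labels, and the hypothetical helper names cosine_distance and knn_identify. Every query scans the whole gallery, which is why the time per lookup grows with the number of stored vectors.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_identify(query, gallery, k, threshold):
    """gallery is a list of (label, vector) pairs; returns a label or "unknown"."""
    # Exhaustive scan: one distance computation per stored vector.
    scored = sorted((cosine_distance(query, vec), label) for label, vec in gallery)
    nearest = scored[:k]
    # If even the closest vector is too far away, reject the face.
    if nearest[0][0] > threshold:
        return "unknown"
    # Majority vote among the neighbours that are within the threshold.
    votes = Counter(label for dist, label in nearest if dist <= threshold)
    return votes.most_common(1)[0][0]


# Toy gallery with two vectors per person (illustrative values only).
gallery = [
    ("alice", [1.0, 0.0, 0.0]),
    ("alice", [0.9, 0.1, 0.0]),
    ("bob", [0.0, 1.0, 0.0]),
    ("bob", [0.1, 0.9, 0.0]),
]

known_result = knn_identify([0.95, 0.05, 0.0], gallery, k=3, threshold=0.3)
unknown_result = knn_identify([0.0, 0.0, 1.0], gallery, k=3, threshold=0.3)
print(known_result)    # alice
print(unknown_result)  # unknown
```

No "UNKNOWN" class is stored in the gallery here; the unknown decision comes entirely from the threshold.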
