What is a visual bag of words and how is it implemented?

Question

I'm currently working on implementing a bag of visual words in Python. I get the general gist of how it works, but I can't seem to find any sources that explain it in enough detail for me to implement it. I'm guessing scikit-learn and scikit-image would come into play, but I can't seem to point myself in the right direction. Any help?

Tobias Würfl · Answer

The necessary information can be found on Wikipedia.
"when we use more technical features such as colour histograms"
Judging from this sentence, I guess you need to understand the "codebook" generation.
The first step is to extract features from patches of the image. For efficiency, you only want to take patches that are interesting and calculate discriminative features on them. SIFT is one method that performs both steps for you: it finds good spots (keypoints) and calculates a feature vector (descriptor) at each of them.
Now you can generate your codebook. A codebook maps every possible feature vector (after all, they're just numeric vectors) to a certain output codeword. One way to do this is to use k-means for codebook generation: the cluster centers become the codewords. After you have built your codebook, a vector is mapped to a code by finding the entry with the minimal distance to it (since you used k-means, you can use Euclidean distance).
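A minimal sketch of the codebook step with scikit-learn might look like the following. The random arrays are only placeholders for real descriptors, and the codebook size `n_words = 20` as well as the helper name `bovw_histogram` are arbitrary choices of mine, not anything prescribed by the model.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder for descriptors pooled from all training images
# (random data here; in practice these would be e.g. SIFT descriptors).
train_descriptors = rng.random((500, 128))

# Build the codebook: the k-means cluster centers are the codewords.
n_words = 20  # assumed codebook size, tune for your data
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
kmeans.fit(train_descriptors)
codebook = kmeans.cluster_centers_  # shape (n_words, 128)


def bovw_histogram(descriptors, codebook):
    """Map each descriptor to its nearest codeword and count the codewords."""
    # Squared Euclidean distance from every descriptor to every codeword.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    words = dist2.argmin(axis=1)  # index of the nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalize so image size does not matter


# Placeholder for the descriptors of one new image.
image_descriptors = rng.random((60, 128))
hist = bovw_histogram(image_descriptors, codebook)
print(hist)
```

The resulting `hist` is a fixed-length vector per image, whatever the number of keypoints, which is what makes it usable as input for a classifier. `kmeans.predict(image_descriptors)` does the same nearest-center lookup; the explicit distance computation is only there to show what happens.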
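Once every feature vector of an image is mapped to its codeword, counting how often each codeword occurs gives the image's histogram, and you have a complete realization of the bag of words model. You can now dive into using it for classification.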
The required algorithms can be implemented using the libraries you mentioned.
