Recommendations based on other products seen

Question

I am trying to build a basic book recommender system to get familiar with the field and to learn the methods and how to prepare the data.

The DataFrame I am using is fairly simple. It has the following structure (this is a simplified example):

    number    type   username  product  publishing_dt  price  genres
 0      34  access   kerrigan   130365  2019-12-10     16.99  fantasy, kids
 1       1   order   kerrigan    76863  2020-01-15      4.66  action, crime
 2       1   order  45michael    76863  2020-01-15      4.66  action, crime
 3       1   order   kerrigan    76863  2020-01-15      4.66  action, crime
 4       1   order  45michael    86833  2020-02-4      15.65  adventure
 5       1   order  45michael    86833  2020-02-4      15.65  adventure
 6       1   order  45michael   130365  2019-12-10     16.99  fantasy, kids
 7       1   order    alicia7   130365  2019-12-10     16.99  fantasy, kids
 8       1   order    alicia7   130365  2019-12-10     16.99  fantasy, kids
 9       1   order      john5    86833  2020-02-4      15.65  adventure
10       1   order      john5    86833  2020-02-4      15.65  adventure
11      10  access  45michael    63767  2020-01-24     12.99  adventure, fantasy
12       1   order     uololo   830166  2019-11-03     18.45  action, war
13      25  access     7762hc    84325  2019-11-04     25.60  romance
14       4  access   adrian12   997165  2019-12-16      9.99  health, motivational
15       1   order     kylemm   537077  2019-12-25      8.55  history
16      31  access      yvera    76863  2020-01-15      4.66  action, crime
17       1   order   kerrigan  1023897  2020-02-03      7.99  adventure, humor
18       1   order     angel8    86217  2020-02-01     14.99  guide, travel

To make item-based recommendations (which only take metadata into account), I've used CountVectorizer and cosine similarity.
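A minimal sketch of what this item-based step could look like with scikit-learn, assuming only the genres column is used as metadata. The books table is a hand-copied subset of the sample data, and similar_books is a hypothetical helper name, not part of any library.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One row per book, copied from the sample DataFrame (subset, for illustration).
books = pd.DataFrame({
    "product": [130365, 76863, 86833, 63767, 1023897],
    "genres": ["fantasy, kids", "action, crime", "adventure",
               "adventure, fantasy", "adventure, humor"],
}).set_index("product")

# Split on commas so that each genre becomes exactly one token.
vectorizer = CountVectorizer(
    tokenizer=lambda s: [g.strip() for g in s.split(",")],
    token_pattern=None,
)
genre_matrix = vectorizer.fit_transform(books["genres"])

# Book x book cosine similarity on the genre counts.
item_sim = pd.DataFrame(cosine_similarity(genre_matrix),
                        index=books.index, columns=books.index)

def similar_books(product_id, top_n=3):
    """Hypothetical helper: top_n books most similar by genre, excluding the book itself."""
    scores = item_sim[product_id].drop(product_id)
    return scores.sort_values(ascending=False).head(top_n)

print(similar_books(63767))
```

With more metadata (price bands, publication year, descriptions), the same pattern applies once each field is turned into numeric features.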

For purchase-history-based recommendations, I'm currently also using cosine similarity: I calculate a similarity score between users. The steps are:

Find the users most similar to the one we want recommendations for.
Get those users' purchase histories.
Remove from that list the books the user has already bought.
Return the final list of recommendations.
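The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: purchases are reduced to a binary user x book matrix, candidate books are scored by the summed similarity of the neighbours who bought them, and recommend_from_neighbours is a hypothetical name.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Order rows only, copied from the sample data (subset, for illustration).
orders = pd.DataFrame({
    "username": ["kerrigan", "45michael", "kerrigan", "45michael", "45michael",
                 "45michael", "alicia7", "alicia7", "john5", "john5", "kerrigan"],
    "product": [76863, 76863, 76863, 86833, 86833,
                130365, 130365, 130365, 86833, 86833, 1023897],
})

# Binary user x book matrix: 1 if the user ever ordered the book.
user_item = (pd.crosstab(orders["username"], orders["product"]) > 0).astype(int)

# User x user cosine similarity on purchase vectors.
user_sim = pd.DataFrame(cosine_similarity(user_item),
                        index=user_item.index, columns=user_item.index)

def recommend_from_neighbours(user, k=2):
    """Hypothetical helper implementing the four steps listed above."""
    # 1. Most similar users, excluding the user and anyone with zero similarity.
    neighbours = user_sim[user].drop(user).nlargest(k)
    neighbours = neighbours[neighbours > 0]
    # 2. Their purchase history, weighted by how similar each neighbour is.
    scores = user_item.loc[neighbours.index].T.dot(neighbours)
    # 3. Remove books the user has already bought.
    already_bought = user_item.columns[user_item.loc[user] > 0]
    scores = scores.drop(already_bought)
    # 4. Return the remaining books, best first.
    return scores[scores > 0].sort_values(ascending=False)

print(recommend_from_neighbours("john5"))
```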

I tried to apply KNN here but couldn't get it to work and eventually dropped it. Still, I think cosine similarity is a good choice for this data.

The last scenario I would like to cover is the classic "Based on items you've viewed..." recommendation (I'm not sure that is the usual English phrasing, but you get the idea). That's why the number field is in the DataFrame: it shows how many times a given user accessed a book's file without purchasing it. I've been considering several options:

Take, for example, a random book from those the user has accessed and return new recommendations based on it. This could be a pretty bad choice, since people click on items without necessarily being interested in them.
Look for similarities among the books the user has been viewing and search for new books with those characteristics. This seems like the right choice, but I don't really know how to put it into practice; it looks like a KNN approach.
Simply show the user the items they have been viewing, ordered by number of accesses (the more often they accessed the file, the more interested they are).
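One possible way to approach the second option, sketched under several assumptions: the viewed books are collapsed into a single genre profile, weighted by access count (log-damped so one heavily clicked book does not dominate), and scikit-learn's NearestNeighbors with cosine distance then finds the closest unseen books. The viewed dictionary is a hypothetical user, not a row from the data.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

# Book metadata copied from the sample data (subset, for illustration).
books = pd.DataFrame({
    "product": [130365, 76863, 86833, 63767, 1023897, 84325],
    "genres": ["fantasy, kids", "action, crime", "adventure",
               "adventure, fantasy", "adventure, humor", "romance"],
}).set_index("product")

# Hypothetical access counts for one user: {product: number of accesses}.
viewed = {130365: 34, 63767: 10}

vectorizer = CountVectorizer(
    tokenizer=lambda s: [g.strip() for g in s.split(",")],
    token_pattern=None,
)
X = vectorizer.fit_transform(books["genres"]).toarray().astype(float)

# User profile: access-weighted average of the viewed books' genre vectors.
rows = [books.index.get_loc(p) for p in viewed]
weights = np.log1p(list(viewed.values()))
profile = np.average(X[rows], axis=0, weights=weights)

# Only books the user has not viewed yet are candidates.
mask = ~books.index.isin(list(viewed))
candidates = books.index[mask]

# k nearest candidate books to the profile, using cosine distance.
knn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(X[mask])
dist, idx = knn.kneighbors(profile.reshape(1, -1))

# Convert distances back to similarities for readability.
recs = pd.Series(1 - dist[0], index=candidates[idx[0]], name="similarity")
print(recs)
```

The third option needs no model at all: sorting the user's access rows by number in descending order is enough.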

Everything is in Python, of course. I'm looking for help on where to start with this last scenario specifically, and I'd appreciate a recommendation for a good tutorial or guide on applying KNN in cases like this, where I don't have ratings of any kind (a feature that, as I've noticed, carries a lot of weight in recommender systems).

Valentas · Answer

For very large datasets there are more powerful methods than k-NN; look at what well-known companies such as Google, Amazon, and Netflix have published.

Since you are learning, I assume your dataset is small. You can define a graph (for example, with users and books as nodes and views as edges) and recommend based on all of a user's views. One simple method is "user-based" recommendation. A more general perspective is that of path counting or random walks ("personalized PageRank", spectral methods, etc.).
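As a rough illustration of the random-walk idea, here is a small personalized PageRank sketch in NumPy. It assumes an undirected user-book graph in which every access or order is one unweighted edge, a restart probability of 0.15, and plain power iteration; personalized_pagerank and recommend are hypothetical names, and this is not the exact method of any particular paper.

```python
import numpy as np

# (user, book) interactions taken from the sample data; views and orders are
# treated alike here, which is a simplifying assumption.
edges = [("kerrigan", 130365), ("kerrigan", 76863), ("kerrigan", 1023897),
         ("45michael", 76863), ("45michael", 86833), ("45michael", 130365),
         ("45michael", 63767), ("alicia7", 130365), ("john5", 86833),
         ("yvera", 76863)]

# Users are strings and books are integers, so they can share one node list.
nodes = sorted({u for u, _ in edges}) + sorted({b for _, b in edges})
pos = {n: i for i, n in enumerate(nodes)}

# Symmetric adjacency matrix of the bipartite graph.
A = np.zeros((len(nodes), len(nodes)))
for u, b in edges:
    A[pos[u], pos[b]] = A[pos[b], pos[u]] = 1.0

# Column-stochastic transition matrix: each column sums to 1.
P = A / A.sum(axis=0, keepdims=True)

def personalized_pagerank(user, alpha=0.85, iters=100):
    """Random walk that jumps back to `user` with probability 1 - alpha."""
    restart = np.zeros(len(nodes))
    restart[pos[user]] = 1.0
    rank = restart.copy()
    for _ in range(iters):
        rank = alpha * (P @ rank) + (1 - alpha) * restart
    return rank

def recommend(user, top_n=3):
    """Rank the books the user has not interacted with by their walk score."""
    rank = personalized_pagerank(user)
    seen = {b for u, b in edges if u == user}
    scores = {n: rank[pos[n]] for n in nodes
              if not isinstance(n, str) and n not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("john5"))
```

Edge weights (for example, the access count) could replace the 1.0 entries in the adjacency matrix if views should count for more or less than orders.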

If you can read math papers, you could start with C. Cooper et al., "Random Walks in Recommender Systems: Exact Computation and Simulations".

For content similarity between books on a small dataset, beyond the item-based similarity itself, probably your only option is to find some external metadata.
