Data Science Asked by user89534 on September 5, 2021
I am trying to develop a basic book recommender system to get acquainted with the field and to learn the methods and how to prepare the data.
The DataFrame I am using is pretty plain. It has the following structure (this is a simplified example):
    number  type    username   product  publishing_dt  price  genres
 0      34  access  kerrigan    130365  2019-12-10     16.99  fantasy, kids
 1       1  order   kerrigan     76863  2020-01-15      4.66  action, crime
 2       1  order   45michael    76863  2020-01-15      4.66  action, crime
 3       1  order   kerrigan     76863  2020-01-15      4.66  action, crime
 4       1  order   45michael    86833  2020-02-4      15.65  adventure
 5       1  order   45michael    86833  2020-02-4      15.65  adventure
 6       1  order   45michael   130365  2019-12-10     16.99  fantasy, kids
 7       1  order   alicia7     130365  2019-12-10     16.99  fantasy, kids
 8       1  order   alicia7     130365  2019-12-10     16.99  fantasy, kids
 9       1  order   john5        86833  2020-02-4      15.65  adventure
10       1  order   john5        86833  2020-02-4      15.65  adventure
11      10  access  45michael    63767  2020-01-24     12.99  adventure, fantasy
12       1  order   uololo      830166  2019-11-03     18.45  action, war
13      25  access  7762hc       84325  2019-11-04     25.60  romance
14       4  access  adrian12    997165  2019-12-16      9.99  health, motivational
15       1  order   kylemm      537077  2019-12-25      8.55  history
16      31  access  yvera        76863  2020-01-15      4.66  action, crime
17       1  order   kerrigan   1023897  2020-02-03      7.99  adventure, humor
18       1  order   angel8       86217  2020-02-01     14.99  guide, travel
To make item-based recommendations (which only take into account metadata information) I’ve used Count Vectorizer and Cosine Similarity.
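As a rough illustration of that item-based step, here is a minimal, dependency-free sketch. It is not the code actually used: it stands in for scikit-learn's Count Vectorizer and Cosine Similarity with a plain token count and a hand-written cosine, and it assumes genres are split on commas. The helper names (genre_counts, cosine, similar_items) are made up for the example.

```python
from collections import Counter
from math import sqrt

# Genre strings per product, copied from the example DataFrame.
genres = {
    130365: "fantasy, kids",
    76863: "action, crime",
    86833: "adventure",
    63767: "adventure, fantasy",
    830166: "action, war",
    1023897: "adventure, humor",
}

def genre_counts(text):
    """Bag-of-words counts over comma-separated genre tokens (assumed tokenisation)."""
    return Counter(token.strip() for token in text.split(",") if token.strip())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = {pid: genre_counts(g) for pid, g in genres.items()}

def similar_items(product, top_n=3):
    """Rank the other products by genre similarity to `product`."""
    scores = [
        (other, cosine(vectors[product], vectors[other]))
        for other in vectors
        if other != product
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

print(similar_items(63767))
```

For product 63767 ("adventure, fantasy") this ranks 86833 ("adventure") first, because its single genre is fully shared, while books that share only one of two genres score lower.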
For purchase-history-based recommendations I am, for now, also using Cosine Similarity: I calculate a similarity score between users. The run order would be this one:
In this case I tried to apply KNN, but the truth is I couldn't get it to work and finally discarded it. Still, I think that Cosine Similarity is a good choice for this data.
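One possible reading of the user-to-user approach is sketched below. The exact run order is not reproduced here, so every step is an assumption: purchases are reduced to a binary set of products per user, users are compared with cosine similarity over those sets, and books the target user has not ordered are scored by the summed similarity of the users who did order them. The function names are hypothetical.

```python
from math import sqrt

# (username, product) pairs for the "order" rows of the example DataFrame.
orders = [
    ("kerrigan", 76863), ("45michael", 76863), ("kerrigan", 76863),
    ("45michael", 86833), ("45michael", 86833), ("45michael", 130365),
    ("alicia7", 130365), ("alicia7", 130365), ("john5", 86833),
    ("john5", 86833), ("uololo", 830166), ("kylemm", 537077),
    ("kerrigan", 1023897), ("angel8", 86217),
]

# Binary purchase history: the set of distinct products each user ordered.
purchased = {}
for user, product in orders:
    purchased.setdefault(user, set()).add(product)

def user_similarity(u, v):
    """Cosine similarity between two binary purchase vectors."""
    a = purchased.get(u, set())
    b = purchased.get(v, set())
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, top_n=3):
    """Score products the user has not ordered by the similarity of their buyers."""
    own = purchased.get(user, set())
    scores = {}
    for other, products in purchased.items():
        if other == user:
            continue
        sim = user_similarity(user, other)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for product in products - own:
            scores[product] = scores.get(product, 0.0) + sim
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]

print(recommend("kerrigan"))
```

With the sample rows, kerrigan only overlaps with 45michael (both ordered 76863), so the suggestions are 45michael's other two books. Users with no overlap at all, such as uololo, get an empty list, which shows how sparse this kind of data is at small scale.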
The last scenario I would like to cover is the classic “According to items you’ve seen…” (I don’t know how to properly translate that into English, but you get the idea). That is why the number field is in the DataFrame: it shows how many times a given user accessed the book file without purchasing it. So I’ve been thinking of various options:
Everything is in Python, of course. I am looking for help on where to start with this last scenario specifically, and for a good tutorial or guide on how to apply KNN in cases like this one, where I don’t have ratings of any kind (a feature that, as far as I have noticed, carries a lot of weight in recommender systems).
For very large datasets there are more powerful methods than k-NN; try looking at what well-known companies such as Google, Amazon and Netflix have published.
Since you are learning, I assume your dataset is small. You can define a graph and recommend based on all the views of a user. One simple method is "user-based" recommendation. A more general perspective is that of path counting or random walks ("personalized PageRank", spectral methods, etc.).
If you can read math papers, you could start with C. Cooper et al., "Random Walks in Recommender Systems: Exact Computation and Simulations".
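To make the random-walk idea concrete, here is a minimal sketch of personalized PageRank on a user-book graph, written in plain Python. It is only an illustration under stated assumptions, not the method from the paper: every "access" or "order" row becomes one unweighted edge, the restart probability of 0.15 and the 50 iterations are arbitrary choices, and the function names are made up.

```python
# (username, product) interactions from the example DataFrame.
# Assumption: "access" and "order" rows both count as one unweighted edge.
interactions = [
    ("kerrigan", 130365), ("kerrigan", 76863), ("45michael", 76863),
    ("45michael", 86833), ("45michael", 130365), ("alicia7", 130365),
    ("john5", 86833), ("45michael", 63767), ("uololo", 830166),
    ("7762hc", 84325), ("adrian12", 997165), ("kylemm", 537077),
    ("yvera", 76863), ("kerrigan", 1023897), ("angel8", 86217),
]

# Undirected bipartite graph; nodes are tagged to keep users and items apart.
graph = {}
for user, product in interactions:
    graph.setdefault(("user", user), set()).add(("item", product))
    graph.setdefault(("item", product), set()).add(("user", user))

def personalized_pagerank(start, restart=0.15, iterations=50):
    """Power iteration for a random walk that keeps restarting at `start`."""
    rank = {node: 0.0 for node in graph}
    rank[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in rank.items():
            if mass == 0.0:
                continue
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * mass / len(graph[node])
            for neighbour in graph[node]:
                nxt[neighbour] += share
        nxt[start] += restart  # the restarting mass returns to the start node
        rank = nxt
    return rank

def recommend(user, top_n=3):
    """Rank books the user has not interacted with by their walk score."""
    start = ("user", user)
    if start not in graph:
        return []
    rank = personalized_pagerank(start)
    seen = graph[start]
    candidates = [
        (node[1], score)
        for node, score in rank.items()
        if node[0] == "item" and node not in seen and score > 0.0
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_n]

print(recommend("john5"))
```

Unlike the direct user-to-user comparison, the walk also reaches books several hops away, so john5 gets suggestions through 45michael and, more weakly, through users connected to 45michael's books. Weighting edges by the number field would be a natural variation.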
For book content similarities with a small dataset, apart from the item-based similarity itself, your only choice is probably to look for some external metadata.
Answered by Valentas on September 5, 2021