TransWikia.com

How can collaborative filtering be extended to include more features?

Data Science Asked by Nick Smith on June 20, 2021

Looking at the following:

https://realpython.com/build-recommendation-engine-collaborative-filtering/#using-python-to-build-recommenders

I can see that userID, itemID, rating are the standard features used in a collaborative filtering model. However, my question is how can I incorporate more features into the model (such as the reviewText)?

One Answer

There are many ways to embed contextual user & item features into recommendation systems. Algorithms that do this are often called "Hybrid Recommendation Systems".

The highest performers are usually big, complicated neural nets, but I would start with Factorization Machines (FMs).

FMs are a staple in the recsys community today, and are probably the best thing to try first if you're building a hybrid recommendation model.

Some characteristics of FMs to be aware of:

  1. They perform well with sparse data.
  2. FMs have linear complexity in the number of non-zero features, so they train fast and, once deployed, predict fast.
  3. They are very flexible in terms of input data: an FM can take any real-valued feature vector.
  4. They're much easier to interpret than neural networks. If you know how to interpret a regression, you're probably capable of figuring out what the FM is doing.
  5. They are capable of both regression (ratings) and binary classification. In my experience you can mix implicit and explicit feedback by weighting interactions differently.
  6. They have a distinctive data model. You will notice that in the FM data matrix there are at least 4 sections: the user-index, user-features, item-index, and item-features. You will sometimes see an interaction-feature section as well. The fundamental idea is that each row is a record of all the data involved in one interaction between a user and an item (or between whatever entities you are modeling). The user & item feature sections will look familiar: they are the features of the respective users and items. What may be new to you are the user and item indices. They are two separate one-hot encodings, one for users and one for items, and exactly one user and one item should be switched on in any interaction row. This will make more sense once you see the matrix visuals; I suggest staring at them for a while to wrap your head around it.
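To make the data model in point 6 concrete, here is a minimal sketch in plain Python. The users, items, ratings, and feature values are made up for illustration, and `build_row` and `fm_predict` are hypothetical helper names, not part of any library. The prediction function uses the standard second-order FM formula with untrained, randomly initialised parameters, so the scores are meaningless; the point is only the row layout and the linear-time pairwise term mentioned in point 2.

```python
import random

# Hypothetical toy data: (user, item, rating) interactions plus side features.
interactions = [("alice", "book", 5.0), ("alice", "film", 3.0), ("bob", "film", 4.0)]
user_features = {"alice": [0.3], "bob": [0.7]}            # e.g. a scaled age
item_features = {"book": [1.0, 0.0], "film": [0.0, 1.0]}  # e.g. genre flags

users = sorted(user_features)
items = sorted(item_features)

def build_row(user, item):
    """One FM row: [user one-hot | user features | item one-hot | item features]."""
    user_index = [1.0 if u == user else 0.0 for u in users]
    item_index = [1.0 if i == item else 0.0 for i in items]
    return user_index + user_features[user] + item_index + item_features[item]

X = [build_row(u, i) for u, i, _ in interactions]  # one row per interaction
y = [r for _, _, r in interactions]                # regression targets (ratings)

def fm_predict(x, w0, w, V):
    """Second-order FM score: bias + linear term + pairwise interactions.

    The pairwise term uses the identity
    sum_{i<j} <V_i, V_j> x_i x_j
        = 0.5 * sum_f [(sum_j V[j][f] x_j)^2 - sum_j (V[j][f] x_j)^2],
    which costs O(k * n) instead of O(k * n^2).
    """
    linear = sum(wj * xj for wj, xj in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[j][f] * x[j] for j in range(len(x)))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Untrained parameters, only to show the shapes: n features, k latent factors.
rng = random.Random(0)
n, k = len(X[0]), 2
w0, w = 0.0, [0.0] * n
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]

scores = [fm_predict(x, w0, w, V) for x in X]
```

Extra information such as reviewText would be added as more columns, for example a bag-of-words vector in an interaction-feature section. A real implementation would store `X` as a sparse matrix and learn `w0`, `w`, and `V` from `y`; the lists here just keep the sketch dependency-free.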

Some additional resources/further reading:

  • Jefkine Blog
  • Berwyn's Blog
  • FastFM Python Library (An older library. It works well and gives good out-of-the-box performance, but for a long-term commercial project I would suggest looking for a more modern implementation.)
  • LightFM Python Library (Another FM Library that I personally prefer over FastFM)
  • RankFM Python Library (Another FM Library specifically adapted for when you only have implicit feedback and want to rank/recommend items.)

Correct answer by mkerrig on June 20, 2021
