What's the best classification model for this recommendation engine?

Question

I'm not a data scientist, but I'm trying to implement a recommendation engine at my company. My application runs on PHP, but I'll use Python to process the data.

My company is an online school with 40 online courses as of now. I have a CSV file with the preferences of around 30k users, and it looks like this:

0 means the user is not subscribed (which I treat as no interest), while 1 means subscribed (interested).

My idea is to compare a single user's array, such as [0,1,0,0,0,1,1...], with all this data and return a score for each course: the probability that this user is interested in it.

I was thinking of using Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return a binary result, right?

What classification model would you recommend? Ideally, my result should be something like:

[0.95, 0.1, 0.54, 0.3, 0.87...]

Cheers!

marco_gorelli · Answer

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities and you're using the sklearn library, you can call the predict_proba method instead of predict. It returns class probabilities rather than hard 0/1 labels.

Here's an example:

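from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Keep only the digits 0 and 1 to get a binary classification problem
digits = load_digits(n_class=2)
model = LogisticRegression().fit(digits.data, digits.target)
# Each row of predict_proba is [P(class 0), P(class 1)]
preds = model.predict_proba(digits.data)
print([p[1] for p in preds])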
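
To connect this to the 0/1 subscription matrix from the question, here is a rough sketch of one way predict_proba could produce a score per course. It is an illustration under assumptions, not a recommendation of this particular model: the data is randomly generated (the real CSV isn't shown), the sizes are shrunk to 200 users and 5 courses, and the helper name course_scores is made up for this example. It fits one logistic regression per course, using the user's other subscriptions as features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the subscription CSV (assumption: rows are users,
# columns are courses, values are 0/1). Sizes are reduced for illustration.
rng = np.random.default_rng(0)
subscriptions = rng.integers(0, 2, size=(200, 5))


def course_scores(subscriptions, user):
    """Return one estimated subscription probability per course for `user`."""
    n_courses = subscriptions.shape[1]
    scores = np.empty(n_courses)
    for j in range(n_courses):
        # Features: every course except j; target: course j.
        X = np.delete(subscriptions, j, axis=1)
        y = subscriptions[:, j]
        model = LogisticRegression().fit(X, y)
        user_features = np.delete(user, j).reshape(1, -1)
        # Column 1 of predict_proba is the probability of class 1 (subscribed).
        scores[j] = model.predict_proba(user_features)[0, 1]
    return scores


# Hypothetical user vector, one 0/1 entry per course.
user = np.array([0, 1, 0, 0, 1])
scores = course_scores(subscriptions, user)
print(scores)
```

Because the data here is random, the printed scores carry no meaning; the point is only the shape of the output, one probability per course. In practice you would probably only look at the scores for courses the user isn't already subscribed to.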
