Data Science Asked by grpaiva on December 11, 2020
I’m not a data scientist but I’m trying to implement a recommendation engine on my company. My application runs on PHP but I’ll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1…] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don’t know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87…]
Cheers!
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
Answered by marco_gorelli on December 11, 2020
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP