Data Science Asked by zipline86 on December 3, 2020
I am trying to replicate this paper's feature engineering for user activity. The authors take 14 days of accumulated user activity, fit a sigmoid to it, and keep the two fitted parameters as features. I would like to do the same, but with 7 days of activity. http://hanj.cs.illinois.edu/pdf/kdd18_cyang.pdf
They use the formula below and keep the parameters x0 and k as features.
from scipy.optimize import curve_fit
import numpy as np
def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y
I used scipy curve_fit to find these parameters as follows
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata)), ydata, maxfev=20000)
When I had a user that had the values below, I had the following error:
ydata1 = [0,0,0,0,0,91,91]
RuntimeError: Optimal parameters not found: gtol=0.000000 is too small func(x) is orthogonal to the columns of the Jacobian to machine precision.
I noticed that if I add method='dogbox' I no longer get the error.
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata1)), ydata1, maxfev=20000, method='dogbox')
print(ppov[0], ppov[1])
5.189237217957538 11.509279446215949
However, I played around with other values and noticed that the resulting parameters can have very different values.
For example, if I have the values
ydata2=[0,3,5,30,34,50,91]
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata2)), ydata2, maxfev=20000)
print(ppov[0], ppov[1])
-24.681668846480264 118.77183210605865
However, if I add method='dogbox', I get very different x0 and k parameter values.
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata2)), ydata2, maxfev=20000, method='dogbox')
print(ppov[0], ppov[1])
0.28468096463676695 8.154477352500013
Can anybody help me with 2 things:
1. I read the doc about 'dogbox' and don't really understand it. Can anybody explain it more simply?
2. The curve_fit scipy function is looping through about 100,000 users and I need to set the parameters of curve_fit so it does not throw an error. Is using the 'dogbox' method okay for my purposes, knowing that the parameter results seem very different between 'dogbox' and the default 'lm' method? Or are there other arguments in the curve_fit function that I could set instead to get past this error?
I can't speak to the dogbox algorithm, but the sigmoid's range is only (0,1), while your example data go up to 91, so the fit to the raw values is sure to be bad. The paper you reference presumably scales the data first.
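A minimal sketch of what such scaling could look like, assuming that dividing each user's series by its maximum is acceptable (whether the paper normalizes this way is not confirmed here). The helper name scale_to_unit and the starting guess p0 are illustrative choices, not something taken from the question or the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

def scale_to_unit(y):
    # Assumption: divide by the maximum so the values lie in [0, 1].
    y = np.asarray(y, dtype=float)
    peak = y.max()
    return y / peak if peak > 0 else y

ydata2 = [0, 3, 5, 30, 34, 50, 91]
x = np.arange(len(ydata2))
y_scaled = scale_to_unit(ydata2)

# Illustrative starting guess: x0 in the middle of the window, k = 1.
popt, pcov = curve_fit(sigmoid, x, y_scaled, p0=[len(x) / 2, 1.0], maxfev=20000)
x0_hat, k_hat = popt
print(x0_hat, k_hat)
```

With the data on the same scale as the sigmoid, x0 should land inside the 7-day window rather than far outside it, which makes the two parameters easier to compare across users.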
The first example you give has a best fit that's a step function, which the sigmoid can only approach as k goes to infinity; so it's no surprise the algorithm doesn't converge.
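To illustrate the point about the step function, here is a small sketch that evaluates the sum of squared errors for the first example (scaled to 0/1, which is an assumption on my part) at a fixed x0 = 4.5 and increasingly large k. The specific k values are arbitrary.

```python
import numpy as np

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

x = np.arange(7)
y_step = np.array([0, 0, 0, 0, 0, 1, 1], dtype=float)  # ydata1 divided by 91

# Fix x0 between the last 0 and the first 1, then make the sigmoid steeper.
sse_by_k = []
for k in [1, 5, 25, 100]:
    residuals = y_step - sigmoid(x, 4.5, k)
    sse_by_k.append(float(np.sum(residuals ** 2)))

print(sse_by_k)
```

The error keeps shrinking as k grows, so there is no finite k at which the optimizer can settle.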
EDIT: Maybe you should try increasing the tolerances (passed as kwargs through curve_fit to least_squares, or to leastsq when the default method='lm' is used); your error message mentions gtol specifically: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares
Or, if things are converging enough for your purposes, just catch and handle that error?
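A rough sketch of the catch-and-handle option for a loop over many users. The function name fit_user, the max-scaling, the starting guess, and the choice to return NaN on failure are all assumptions for illustration; you may prefer a different fallback.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

def fit_user(ydata, **fit_kwargs):
    """Return (x0, k) for one user, or (nan, nan) if the fit fails."""
    y = np.asarray(ydata, dtype=float)
    x = np.arange(len(y))
    if y.max() > 0:
        y = y / y.max()  # assumption: scale by the maximum
    try:
        popt, _ = curve_fit(sigmoid, x, y, p0=[len(x) / 2, 1.0],
                            maxfev=20000, **fit_kwargs)
    except RuntimeError:
        # curve_fit raises RuntimeError when the optimizer reports failure.
        return (np.nan, np.nan)
    return (float(popt[0]), float(popt[1]))

users = [[0, 0, 0, 0, 0, 91, 91], [0, 3, 5, 30, 34, 50, 91]]
features = [fit_user(y) for y in users]
print(features)
```

Extra keyword arguments such as gtol can be passed through fit_user, for example fit_user(y, gtol=1e-8), and curve_fit forwards them to the underlying solver. Users whose fit fails come back as NaN and can be filtered or imputed afterwards.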
Answered by Ben Reiniger on December 3, 2020