Data Science Asked on March 18, 2021
I'm working with dates for the first time. I first converted the column to timestamps, which gave me values of dtype "datetime64". But then I learned that LinearRegression from sklearn does not accept datetime as a dtype for regression. Why is that?
Also, after executing df.date = df.date.map(datetime.toordinal)
I saw that the datatype of the column is "int64". How does sklearn's LinearRegression know that it is datetime data and not just any random int64 data (considering that I got the model I needed after plotting)?
Linear regression associates each numerical (or binary, which is a special case of numerical) value with a coefficient. Multiplying the values by their coefficients and summing the results, plus an intercept, gives you the output. (For a classifier such as logistic regression, you would then apply a threshold to decide whether the model predicts 1 or 0.) This is a brief summary; you'll find plenty of people explaining in detail how it works.
If your variable is a date, it has a format like "Year/Month/Day", and the regression doesn't know how to interpret it, since it needs numerical data.
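To illustrate why the toordinal conversion from the question works, here is a minimal sketch using only the standard library. The dates and target values are made up for illustration, and the one-feature least-squares fit is computed by hand rather than with sklearn. The point is that the fit only ever sees plain integers (day counts); nothing tells it that they came from dates.

```python
from datetime import date

# Hypothetical toy data: one observation per day, y grows by 2 per day.
dates = [date(2021, 3, 15), date(2021, 3, 16), date(2021, 3, 17), date(2021, 3, 18)]
y = [10.0, 12.0, 14.0, 16.0]

# toordinal() maps each date to an integer day count.
# These integers are all the regression ever sees.
x = [d.toordinal() for d in dates]

# Ordinary least squares for a single feature, computed by hand.
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
slope = (
    sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    / sum((xi - x_mean) ** 2 for xi in x)
)
intercept = y_mean - slope * x_mean


def predict(d):
    """Predict y for a date by converting it to its ordinal first."""
    return intercept + slope * d.toordinal()
```

Because consecutive days map to consecutive integers, the fitted slope can be read as "change in y per day", which is probably why the plotted model looked right.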
To use the date in a regression, you can simply take the year as a variable. If you want to be more precise, you'll have to create your own variable with the meaning you want (e.g. a binary variable taking 1 if the day is a Saturday or Sunday, 0 otherwise).
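As a rough sketch of that idea, the helper below (a hypothetical name, not part of any library) turns a date into a few numeric features, including the weekend flag described above. It uses plain datetime.date objects; with a pandas datetime column, the .dt accessor (for example .dt.year and .dt.dayofweek) gives the same information.

```python
from datetime import date


def date_features(d):
    """Turn a date into numeric features a regression can use."""
    return {
        "year": d.year,
        "month": d.month,
        # weekday() returns Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
        "is_weekend": 1 if d.weekday() >= 5 else 0,
    }


# Example dates chosen for illustration: a Thursday and a Saturday.
rows = [date_features(d) for d in (date(2021, 3, 18), date(2021, 3, 20))]
```

Which features to keep depends on what you expect to drive the target; the weekend flag only helps if weekends really behave differently in your data.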
Side note: Be careful with date variables if you want to use your model to predict new, future data. For example, if you have a variable Year and you evaluate your model with cross-validation, it will have an impact, since the trend can change over the years. Moreover, when you predict on new inputs, Year will always have the same value (e.g. '2020'), so it won't be that useful. Using it to evaluate your model might bias your results.
Answered by BeamsAdept on March 18, 2021