Data Science Asked by Murcielago on June 10, 2021
Disclaimer: I am not 100% sure that this is the appropriate place to ask this question.
Here is a little bit of context about the problem.
I have a dataset containing time series for about 1000 products (about two years of data).
From this data I am forecasting 12 months ahead, with several prediction intervals.
To predict, I built a basic model with statsmodels.tsa.statespace.exponential_smoothing.ExponentialSmoothing.
I am facing an issue regarding how to store the data once the model produces results at the item level.
Here is a basic reproduction of what my script looks like:
seasonal_profile_df.set_index('Id')
forecast_df = pd.DataFrame(seasonal_profile_df.index)

def winter_holts(i):
    fit1 = ExponentialSmoothing(new_df.iloc[:, i], trend=True, seasonal=12).fit()
    prediction_interval = fit1.get_forecast(steps=12).summary_frame(alpha=[0.10, 0.05, 0.01])
    forecast = pd.DataFrame(prediction_interval)
    return forecast

def holts(i):
    fit1 = ExponentialSmoothing(new_df.iloc[:, i], trend=True).fit()
    prediction_interval = fit1.get_forecast(steps=12).summary_frame(alpha=0.10)
    forecast = pd.DataFrame(prediction_interval)
    print("FORECAST", forecast)
    return forecast

for i in seasonal_profile_df.index:
    if seasonal_profile_df['trend'].loc[i] == "trending":
        holts(i)
    elif seasonal_profile_df['seasonality'].loc[i] == "seasonal":
        winter_holts(i)
For each item, the forecasting function returns a DataFrame that looks like this:
FORECAST
100221            mean    mean_se  mean_ci_lower  mean_ci_upper
2020-07-31   -4.412599  24.526896     -44.755753      35.930555
2020-08-31   -5.848380  24.526896     -46.191534      34.494775
2020-09-30   -7.284160  24.526898     -47.627317      33.058996
2020-10-31   -8.719941  24.526900     -49.063101      31.623220
2020-11-30  -10.155721  24.526903     -50.498887      30.187445
2020-12-31  -11.591502  24.526908     -51.934676      28.751672
2021-01-31  -13.027282  24.526915     -53.370468      27.315903
2021-02-28  -14.463063  24.526924     -54.806263      25.880137
2021-03-31  -15.898844  24.526935     -56.242062      24.444375
2021-04-30  -17.334624  24.526949     -57.677865      23.008617
2021-05-31  -18.770405  24.526966     -59.113674      21.572864
2021-06-30  -20.206185  24.526986     -60.549487      20.137117
Handling the computation results is the real issue right now, because they are the basis for deeper analysis and they will end up in a PostgreSQL DB.
I am inexperienced with this and I am wondering how to handle the output as efficiently as possible, since I will need to manipulate it later on in the script.
You can store your data using pickle and then load it whenever you want:
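import pickle
forecast1 = [1, 2, 3]
forecast2 = [4, 5, 6, 6]
# Save both forecasts to disk; the with-block closes the file afterwards.
with open("forecasts.p", "wb") as f:
    pickle.dump([forecast1, forecast2], f)
# Load them back later.
with open("forecasts.p", "rb") as f:
    forecast1, forecast2 = pickle.load(f)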
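The same idea can be applied to the per-item frames from the question. The sketch below is only an illustration under assumptions: `fake_forecast` is a hypothetical stand-in for whatever `holts()` or `winter_holts()` returns, the product IDs are made up, and the file is written to a temporary directory. It keeps every forecast in a dict keyed by product ID, so each one can be looked up again after loading.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def fake_forecast(offset):
    """Hypothetical stand-in for the frame returned by holts()/winter_holts()."""
    dates = pd.to_datetime(["2020-07-31", "2020-08-31", "2020-09-30"])
    return pd.DataFrame(
        {
            "mean": [offset + 1.0, offset + 2.0, offset + 3.0],
            "mean_se": [0.5, 0.5, 0.5],
        },
        index=dates,
    )


# Collect the returned frames instead of discarding them (IDs are made up).
forecasts = {100221: fake_forecast(0.0), 100222: fake_forecast(10.0)}

# Write the whole dict to one pickle file, then read it back.
path = Path(tempfile.mkdtemp()) / "forecasts.p"
with open(path, "wb") as fh:
    pickle.dump(forecasts, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored[100221])
```

Pickle files should only be loaded from sources you trust, and they are Python-specific, so they suit intermediate results within the script more than long-term storage.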
Or you can directly store your pandas frames as .csv:
df.to_csv("forecasts.csv", header=True, index=False)
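Since the results are meant to end up in a database, one possible layout is a single long table with one row per product and date. The following is a rough sketch, not the asker's actual pipeline: `fake_forecast`, the product IDs, and the table name `forecasts` are assumptions, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import io
import sqlite3

import pandas as pd


def fake_forecast(offset):
    """Hypothetical stand-in for the frame returned by holts()/winter_holts()."""
    dates = pd.DatetimeIndex(["2020-07-31", "2020-08-31", "2020-09-30"], name="date")
    return pd.DataFrame(
        {
            "mean": [offset + 1.0, offset + 2.0, offset + 3.0],
            "mean_se": [0.5, 0.5, 0.5],
        },
        index=dates,
    )


# Keep each returned frame, keyed by product ID (IDs are made up).
forecasts = {100221: fake_forecast(0.0), 100222: fake_forecast(10.0)}

# Stack all frames; the dict keys become an extra index level, and
# reset_index() turns both index levels into ordinary columns.
long_df = pd.concat(forecasts, names=["product_id"]).reset_index()

# CSV round trip through an in-memory buffer (a file path works the same way).
buffer = io.StringIO()
long_df.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer, parse_dates=["date"])

# Write to a SQL table; dates are stored as ISO strings to keep the
# SQLite stand-in simple.
sql_df = long_df.assign(date=long_df["date"].dt.strftime("%Y-%m-%d"))
conn = sqlite3.connect(":memory:")
sql_df.to_sql("forecasts", conn, index=False, if_exists="replace")
n_rows = conn.execute("SELECT COUNT(*) FROM forecasts").fetchone()[0]
conn.close()

print(long_df)
```

For PostgreSQL, `to_sql` would typically be given an SQLAlchemy engine instead of the SQLite connection; the connection details depend on the setup and are not shown here. Note that `index=False` is safe in this sketch only because the dates were first moved into a column.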
Answered by Shahriyar Mammadli on June 10, 2021