Data Science Asked by RWS on May 28, 2021
I have three years of monthly snapshots of all the contract data. Each snapshot includes the following information:
I also have other information about the contracts like id, name, description, etc.
Answers I am trying to get:
Problems I am having with this data:
It’s not time series data but a set of monthly snapshots, so I could turn it into a monthly time series by accumulating either the revenue per status and stage or the count of all the contracts.
Do I aggregate the contract data or leave it as individual contracts? In the latter case, how do I feed it to a model? It won’t be time series data then.
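A minimal sketch of the aggregation option, assuming each snapshot row carries hypothetical `month`, `contract_id`, `status` and `revenue` fields (the question does not give the real column names, and the numbers are made up):

```python
from collections import defaultdict

# Hypothetical snapshot rows: one row per contract per monthly snapshot.
snapshots = [
    {"month": "2021-01", "contract_id": 1, "status": "open", "revenue": 100.0},
    {"month": "2021-01", "contract_id": 2, "status": "won", "revenue": 250.0},
    {"month": "2021-02", "contract_id": 1, "status": "won", "revenue": 100.0},
    {"month": "2021-02", "contract_id": 2, "status": "won", "revenue": 250.0},
    {"month": "2021-02", "contract_id": 3, "status": "open", "revenue": 80.0},
]

def aggregate_by_month(rows):
    """Return {month: {status: {"count": n, "revenue": total}}}."""
    series = defaultdict(lambda: defaultdict(lambda: {"count": 0, "revenue": 0.0}))
    for row in rows:
        cell = series[row["month"]][row["status"]]
        cell["count"] += 1
        cell["revenue"] += row["revenue"]
    # "YYYY-MM" strings sort chronologically, so sorting the keys gives a time series.
    return {m: {s: dict(v) for s, v in series[m].items()} for m in sorted(series)}

monthly = aggregate_by_month(snapshots)
for month, by_status in monthly.items():
    print(month, by_status)
```

Each status then yields one count series and one revenue series per month, which is the shape univariate forecasting methods expect.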
Main problem with finding the right approach:
I am not sure which approaches to use to answer such different questions. Some values are categorical and some are numerical. I am not sure whether it is a forecasting problem, a ‘change in event’ prediction problem, or a mix of both.
How do I incorporate these very different categorical variables, together with the numerical revenue value, into a model?
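If the contracts are kept individual, one possible framing of the ‘change in event’ question is to pair each snapshot with the next snapshot of the same contract, one-hot encode the categorical status, and keep revenue as a plain numeric column. The sketch below assumes hypothetical status levels and made-up values; it only builds the table, and any classifier could be trained on it afterwards.

```python
# Hypothetical per-contract history: (month, status, revenue), ordered by month.
history = {
    1: [("2021-01", "open", 100.0), ("2021-02", "open", 100.0), ("2021-03", "won", 120.0)],
    2: [("2021-01", "open", 250.0), ("2021-02", "lost", 0.0)],
}
STATUSES = ["open", "won", "lost"]  # assumed category levels

def one_hot(value, levels):
    """Encode a categorical value as a list of 0/1 indicators."""
    return [1 if value == level else 0 for level in levels]

def build_rows(history):
    """Pair each snapshot with the next one for the same contract."""
    features, labels = [], []
    for snaps in history.values():
        for (_, status, revenue), (_, next_status, _) in zip(snaps, snaps[1:]):
            # Categorical indicators and the numeric revenue sit side by side.
            features.append(one_hot(status, STATUSES) + [revenue])
            # Label is 1 when the status changed by the following month.
            labels.append(int(next_status != status))
    return features, labels

X, y = build_rows(history)
```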
Methods I looked into:
I have read about forecasting models like ARIMA (mostly applied to sales data). It takes time series data and forecasts revenue based on historical values. I am not sure whether it is valid here, because my contracts change status and I do not know how to represent that in an ARIMA model, or whether it is even necessary. I am also not sure whether the data has any seasonality: winning or losing a contract is not a seasonal event.
I also looked into Simple Exponential Smoothing (SES) and Holt-Winters Exponential Smoothing (HWES) examples and ran into the same issue when calculating average delays or forecasting future revenue: the current data is not univariate.
I looked at the following answers: https://stats.stackexchange.com/questions/246151/difference-between-time-series-prediction-vs-point-process-prediction and it made me think that maybe my problem is not time series prediction.
Best Approach to Forecasting Numerical Value Based on time series and categorical data? : This made me think maybe I should look into RNN and LSTM.
How do you predict a continuous variable when all your independent variables are categorical : Or my problem is similar to this one.
I am sorry for the long post. I am trying to make the problem as clear as possible. I have no idea what the best approach to this problem would be or what data to feed to the model. I am also lost as to how to structure the data to make the best use of all the variables.
I would be grateful if you could suggest any good methods or reading resources so that I can answer these questions. Thank you for your time!
If you want to use neural networks, this post on Kaggle might help: https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/159052
It has a short list of resources for categorical embeddings and LSTM (I think).
If you think your dataset has periodic patterns, and you only need to answer your questions (not deploy a model), I would take a look at FB Prophet: https://facebook.github.io/prophet/docs/quick_start.html#python-api
It extracts the periodic components and fits them with sine and cosine waves. You can also add additional regressors, e.g., one-hot encoded categorical variables.
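To illustrate the idea only (this is not Prophet's API), here is a small sketch of sine and cosine seasonal terms combined with a one-hot regressor in a single feature row. The 12-month period, the Fourier order and the stage names are all assumptions made for the example.

```python
import math

def fourier_terms(t, period=12.0, order=2):
    """Sine/cosine pairs for time index t (in months) up to the given order."""
    terms = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

STAGES = ["draft", "negotiation", "signed"]  # hypothetical category levels

def design_row(t, stage):
    """Seasonal terms followed by a one-hot encoding of the stage."""
    return fourier_terms(t) + [1.0 if stage == s else 0.0 for s in STAGES]

row = design_row(3, "negotiation")
print(len(row), row)
```

A regression fitted on rows like this learns one weight per sine or cosine term and one per category level, which is roughly how a seasonal component and extra regressors can be combined.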
Answered by FreedomToWin on May 28, 2021
For time series forecasting based on both numerical and categorical data, LightGBM has proven its value in Kaggle competitions. The winners of both the M5 competition and the Corporación Favorita Grocery Sales Forecasting competition used LightGBM.
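Gradient-boosted tree models work on tabular rows, so a series usually has to be reshaped into lagged features first. The sketch below shows that reshaping step only; it is not taken from the winning solutions, the helper name `make_lag_rows` is hypothetical, and the revenue numbers are made up.

```python
def make_lag_rows(values, n_lags=3):
    """Turn a series into (features, target) rows using the previous n_lags values."""
    rows = []
    for i in range(n_lags, len(values)):
        # Features are the n_lags values before position i; the target is values[i].
        rows.append((values[i - n_lags:i], values[i]))
    return rows

monthly_revenue = [100.0, 120.0, 90.0, 130.0, 150.0]  # made-up numbers
rows = make_lag_rows(monthly_revenue)
for features, target in rows:
    print(features, "->", target)
```

Categorical columns (for example the contract status) could be appended to each feature list before the rows are passed to a model.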
Answered by Ruben on May 28, 2021