TransWikia.com

What are some good methods to forecast future revenue on categorical and value based data?

Data Science Asked by RWS on May 28, 2021

I have three years of monthly snapshots of all the contract data. Each snapshot includes the following information:

  • Contract status [Categorical]: Proposed, tracked, submitted, won, lost, etc.
  • Contract stages [Categorical]: Prospecting, engaged, tracking, submitted, etc.
  • Duration of contract [Date/Time]: Months and years
  • Bid start date [Date/Time]: Date (this changes when a contract is delayed)
  • Contract value [Numerical]: Value of the contract in local currency
  • Future revenue projection [Numerical]: Currency value breakdown of revenue for next 5 years (this value is available for all the contracts, no matter if it’s won or lost)

I also have other information about the contracts like id, name, description, etc.

Answers I am trying to get:

  • Total value of contracts that are changing status from month to month
  • Total value of contracts that are changing stages from month to month
  • Average delay of the start date of the contracts
  • Future revenue projection (5 years) based on change of status and average delay
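
The first three questions are descriptive and can be computed directly from the snapshots before any model is involved. Below is a minimal pandas sketch, assuming one row per contract per snapshot month. The column names (snapshot_month, contract_id, status, contract_value, bid_start_date) and all values are hypothetical, and "delay" is taken here to mean the latest recorded start date minus the first recorded one, which is only one possible definition.

```python
import pandas as pd

# Hypothetical snapshot table: one row per contract per monthly snapshot.
# All column names and values are invented for illustration.
snapshots = pd.DataFrame({
    "snapshot_month": pd.to_datetime(
        ["2021-01-01"] * 3 + ["2021-02-01"] * 3 + ["2021-03-01"] * 3),
    "contract_id": ["A", "B", "C"] * 3,
    "status": ["proposed", "tracked", "submitted",
               "tracked", "tracked", "won",
               "submitted", "tracked", "won"],
    "contract_value": [100.0, 250.0, 400.0] * 3,
    "bid_start_date": pd.to_datetime(
        ["2021-06-01", "2021-07-01", "2021-05-01",
         "2021-06-01", "2021-09-01", "2021-05-01",
         "2021-08-01", "2021-09-01", "2021-05-01"]),
})

snapshots = snapshots.sort_values(["contract_id", "snapshot_month"])

# Status of the same contract in the previous snapshot (missing for the first one).
snapshots["prev_status"] = snapshots.groupby("contract_id")["status"].shift(1)
changed = snapshots["prev_status"].notna() & (
    snapshots["status"] != snapshots["prev_status"])

# Total value of contracts whose status changed, per snapshot month.
value_changed = snapshots[changed].groupby("snapshot_month")["contract_value"].sum()

# Delay per contract: last recorded start date minus first recorded start date.
start = snapshots.groupby("contract_id")["bid_start_date"].agg(["first", "last"])
delay_days = (start["last"] - start["first"]).dt.days
average_delay_days = delay_days.mean()

print(value_changed)
print(average_delay_days)
```

Replacing status with the stage column gives the same calculation for stage changes.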

Problems I am having with this data:

  • It’s not time series data; it’s a set of monthly snapshots. I could turn it into a monthly time series by aggregating, for each status and stage, either the revenue or the count of contracts.

  • Do I aggregate the contract data or keep it as individual contracts? In the latter case, how do I feed it to a model? It would no longer be time series data.
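
For reference, the aggregated option amounts to a pivot of the snapshot table. A minimal sketch, again with hypothetical column names and made-up values:

```python
import pandas as pd

# Hypothetical snapshot rows: one row per contract per monthly snapshot.
snapshots = pd.DataFrame({
    "snapshot_month": pd.to_datetime(["2021-01-01"] * 3 + ["2021-02-01"] * 3),
    "contract_id": ["A", "B", "C"] * 2,
    "status": ["proposed", "tracked", "submitted",
               "tracked", "tracked", "won"],
    "contract_value": [100.0, 250.0, 400.0] * 2,
})

# One row per month, one column per status; each cell is the total contract value.
value_by_status = snapshots.pivot_table(
    index="snapshot_month", columns="status",
    values="contract_value", aggfunc="sum", fill_value=0.0)

# Same layout, but each cell is the number of contracts in that status.
count_by_status = snapshots.pivot_table(
    index="snapshot_month", columns="status",
    values="contract_id", aggfunc="count", fill_value=0)

print(value_by_status)
print(count_by_status)
```

Each column of the result is an ordinary monthly time series. The cost is that per-contract information (id, description, individual delays) is lost in the aggregation.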

Main problem with finding the right approach:

  • I am not sure which approaches to use to answer such different questions. Some variables are categorical and some are numerical. I am not sure whether this is a forecasting problem, a ‘change in event’ prediction problem, or a mix of both.

  • How do I incorporate these very different categorical variables, together with the numerical revenue values, into a model?

Methods I looked into:

I am sorry for the long post. I am trying to make the problem as clear as possible. I have no idea what the best approach to this problem would be or what data to feed to the model. I am also unsure how to structure the data to make the best use of all the variables.

I would be grateful if you could suggest any good methods or reading resources so that I can answer these questions. Thank you for your time!

2 Answers

If you want to use neural networks, this post on Kaggle might help: https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/159052

It has a short list of resources for categorical embeddings and LSTMs (I think).

If you think your dataset has periodic patterns, and you only need to answer your questions (not deploy a model), I would take a look at FB Prophet: https://facebook.github.io/prophet/docs/quick_start.html#python-api

It extracts the periodic (seasonal) components and fits them with a Fourier series, i.e. sums of sine and cosine waves. You can also add additional regressors, e.g. one-hot encoded categorical variables.
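
To make the idea concrete, here is a minimal NumPy sketch of a trend plus one sine/cosine pair plus a one-hot encoded categorical regressor, fitted by ordinary least squares. This is not Prophet's API or its actual fitting procedure, only an illustration of the kind of model described above; the data are synthetic and noise-free, and every name is hypothetical.

```python
import numpy as np

n_months = 36
t = np.arange(n_months)

# Synthetic categorical variable with three levels (codes 0, 1, 2), made up for illustration.
stage = np.array([0, 1, 2, 1, 0, 2] * 6)
stage_effect = np.array([0.0, 5.0, -3.0])

# Synthetic target: intercept + linear trend + yearly seasonality + stage effect.
y = (50.0 + 0.5 * t
     + 10.0 * np.sin(2 * np.pi * t / 12)
     + 4.0 * np.cos(2 * np.pi * t / 12)
     + stage_effect[stage])

# Design matrix: intercept, trend, one Fourier pair with a 12-month period,
# and one-hot columns for stages 1 and 2 (stage 0 is the baseline).
X = np.column_stack([
    np.ones(n_months),
    t,
    np.sin(2 * np.pi * t / 12),
    np.cos(2 * np.pi * t / 12),
    (stage == 1).astype(float),
    (stage == 2).astype(float),
])

# Least-squares fit of all coefficients at once.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

In Prophet itself, the seasonal terms are built internally and extra columns are registered with `add_regressor` before fitting; the sketch only shows why one-hot columns fit naturally alongside sine and cosine terms.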

Answered by FreedomToWin on May 28, 2021

For time series forecasting based on both numerical and categorical data, LightGBM has proven its value in Kaggle competitions. The winners of both the M5 competition and the Corporación Favorita Grocery Sales Forecasting competition used LightGBM.
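
Tree-based models like this do not see time order by themselves, so the series is usually reshaped into a supervised table with lagged values as features. A minimal pandas sketch of that preparation step follows, using hypothetical column names and made-up monthly totals; the model fit itself is left out.

```python
import pandas as pd

# Hypothetical monthly totals per status (invented values).
monthly = pd.DataFrame({
    "month": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"] * 2),
    "status": ["won"] * 4 + ["tracked"] * 4,
    "total_value": [400.0, 420.0, 450.0, 500.0,
                    250.0, 350.0, 300.0, 320.0],
})
monthly = monthly.sort_values(["status", "month"]).reset_index(drop=True)

# Lagged targets computed within each status so values never leak across groups.
grouped = monthly.groupby("status")["total_value"]
monthly["lag_1"] = grouped.shift(1)
monthly["lag_2"] = grouped.shift(2)

# Calendar feature and categorical dtype for the status column.
monthly["month_of_year"] = monthly["month"].dt.month
monthly["status"] = monthly["status"].astype("category")

# Rows without a full lag history cannot be used for training.
train = monthly.dropna(subset=["lag_1", "lag_2"])
features = train[["status", "lag_1", "lag_2", "month_of_year"]]
target = train["total_value"]

print(features)
print(target)
```

A frame like `features` can then be passed to a gradient boosting regressor. As far as I know, LightGBM's scikit-learn wrapper (`LGBMRegressor`) handles pandas category columns directly, but check the documentation for the version you use.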

Answered by Ruben on May 28, 2021
