Time series tracking queue optimization problem

Question

In order to track prices of many different products from different sources, I must optimally schedule a group of trackers dedicated to price collection (ie. collect one price at a time for each product per source). The goal is to be always aware of the current price of each product-source despite the limited number of trackers. A price may change rarely (ie. 8 hours) or be more volatile (ie. 20 minutes) depending on the product, context, source... It's not possible to fetch prices every minute of all products from each source, it would require far too much trackers in parallel and it would be worthless (many prices do not change for a while).

Here come the optimization problem. How can I optimally schedule/update the trackers' job queue in real-time in order to:

minimize the lead time between the price change and its collection by tracker, for all products (within our limited trackers constraint).
minimize the number of "prices collected albeit they didn't change".

My current simple solution is to maintain different online time series ML models (combine prediction from a local model for each product-source and a global model of all products-sources) in order to get the probability each price would change within the next X minutes. This prediction output requires the latest features available for each product-source. Its a binary classification problem based on time-series. A similar model would be to predict how much a price should change within the next X minutes, it is then a regression problem similar to the probability from the classification problem but it would prioritize important price changes.

The age of features for a product-source may be old so the prediction must be linked to their age. This age can mean that features are outdated but mainly how likely it will change soon (ie. "25 minutes ago, probability of change within 30 minutes was 90%" > "1 minute ago, probability of change within was 90%").

Then I use this simple score: (probability of change within X minutes threshold) x (age of features in minutes / X minutes threshold).

So I arrange the price collection queue as follows: Y% of prices ranked by top change score, Z% of prices ordered by oldest collection request first. Z should be as small as possible but still guarantee that old prices will always be collected. There's a min interval filter for a new collection request on a same price.

Two questions:

I am new in ML and this current simple solution (change forecast probability x forecast age / forecast timeframe) seems clunky to me. Is it ? A better simple approach ?
Y and Z are currently fixed but I plan to use one-armed bandit Epsilon-Greedy algo to optimally balance Y and Z (schedule based on forecast VS standard schedule) using the average price collected with change as the reward. Is it the right way to go ?

Thanks a lot for any help.

Xavier

Fred Andres · Answer

From my experience in the beer business I would make the following observations:

95% or more of the products in a grocery store maintain the same price for at least several months.  Therefore, I would focus only on the 5% or less items that are promoted.
Generally, promotion prices were maintained for at least five days. Price changes are commonly accompanied by feature ads or displays that take time to create.  So I wouldn't worry about price changes for less than a day.
Many retail outlets, like liquor stores in Indiana, never promote anything.  You could leave these out.

This could simplify your optimization to sampling only those stores and items that are commonly promoted.  You might also look in newspapers for feature ads.  These are usually published a day before the promotion.

Time series tracking queue optimization problem

One Answer

Add your own answers!

Ask a Question