Is there a safe and simple way to estimate a standard deviation for a next subset?

Data Science Asked by zina on June 8, 2021

I receive only a standard deviation of a value $v$ (which is, by the way, normally distributed) from a sensor every 4 minutes, but I need to provide a standard deviation $\sigma$ for each 15-minute interval. Is there a safe way to do this?

There are two things that came into my mind:

1) One safe way is to take the mean, generate possible values using the standard deviation of the 4-minute interval over the 15-minute period (15*60 values), and calculate $\sigma$ for this period.
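The first method can be sketched as a small Monte Carlo simulation. The mean and standard deviation below are illustrative assumptions (the question only specifies that the sensor reports a standard deviation), and one draw per second over 15 minutes gives the 15*60 = 900 values mentioned above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sensor readings for one 4-minute interval
# (both numbers are assumptions for illustration).
mean_4min = 10.0
sigma_4min = 2.0

# Simulate one value per second over 15 minutes (900 draws)
# from N(mean, sigma), then take the sample standard deviation.
samples = rng.normal(mean_4min, sigma_4min, size=15 * 60)
sigma_15min = samples.std(ddof=1)
```

With 900 draws the simulated $\sigma$ lands close to the input $\sigma$, which is exactly why the answer below notes that this method reproduces the historical value.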

2) Alternatively, one can naively estimate the value of $\sigma$ for the next time interval based on the two previous values. For example, use the standard deviations $\sigma_{20:04:00}$ and $\sigma_{20:08:00}$ to estimate $\sigma_{20:12:00}$.

If the standard deviation was increasing/decreasing between the previous values $\sigma_1$ and $\sigma_2$, it is assumed to increase/decrease in the next time interval by the absolute amount $|\sigma_2 - \sigma_1|$.
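The second method reduces to a one-line linear extrapolation. The two input values below are hypothetical; the rule simply carries the last observed change forward:

```python
# Naive extrapolation from the two most recent standard deviations
# (the numeric values are illustrative assumptions).
sigma_1 = 1.8   # e.g. the value at 20:04:00
sigma_2 = 2.1   # e.g. the value at 20:08:00

# Assume the next interval changes by the same amount as the last step.
sigma_next = sigma_2 + (sigma_2 - sigma_1)
```

This is cheap to compute but, as noted below, it is an estimation from very limited data.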

The first method can be time-consuming/computationally expensive compared to the second method, though the second method may suffer in precision.

Edit 16.04: Since I'm limited in the amount of data, I would preferably use only the last standard deviation and no mean data.

Edit 23.04: There is one more way that brings me to a result very close to the 1st way of solving the problem.

Let's say $\sigma_i$ is based on $n$ observations while $\sigma_{i+1}$ is based on $k$ observations and $k > n$. Then
$\sigma^2_{i+1} = \frac{(n-1) \cdot \sigma^2_i \cdot \frac{k}{n}}{k-1}$

The benefit in this case is that you are not dealing with a mean value. I suppose this solution works well only with normally distributed values.
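The rescaling formula from the edit can be written directly as a small function. The observation counts below are assumptions (e.g. one reading per second: $n = 240$ for a 4-minute window, $k = 900$ for a 15-minute window):

```python
def rescale_variance(sigma_i: float, n: int, k: int) -> float:
    """Estimate sigma^2 for k observations from sigma^2 based on n
    observations, using the formula from the question's 23.04 edit."""
    return ((n - 1) * sigma_i**2 * (k / n)) / (k - 1)

# Hypothetical example: scale a 4-minute sigma (n = 240 readings)
# up to a 15-minute window (k = 900 readings).
var_15 = rescale_variance(2.0, n=240, k=900)
sigma_15 = var_15 ** 0.5
```

Note that for large $n$ and $k$ the correction factors $\frac{n-1}{n}$ and $\frac{k}{k-1}$ are close to 1, so the rescaled $\sigma$ stays close to the input $\sigma$, consistent with the claim that this matches the first method.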

One Answer

Basically, the two methods you are proposing are the same.

The first one is computationally more expensive, but they are equivalent.

In the first method you are calculating $\sigma$ by generating possible values of a random variable with the same $\sigma$ you already have historically. This is the same as calculating $\sigma$ with all the historical data you have.

In the second method you are doing an estimation with limited data; this is the correct way unless you have a sufficient amount of data to estimate a GARCH model.

A GARCH model is a statistical model for time series data that describes the variance of the current error term or innovation as a function of the actual sizes of the previous time periods' error terms.

Meaning: $\sigma_t^2 = \omega + \alpha_1\epsilon_{t-1}^2 + \dots + \alpha_q\epsilon_{t-q}^2 + \beta_1\sigma_{t-1}^2 + \dots + \beta_p\sigma_{t-p}^2$

This model requires a sufficient amount of data and knowledge of time series analysis. Of the two options you posted, I would use the second with as much data as possible.
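The GARCH recursion above is straightforward to implement for the common GARCH(1, 1) case. The parameter values here are illustrative assumptions, not fitted estimates (in practice one would fit them by maximum likelihood, e.g. with the `arch` package):

```python
import numpy as np

def garch_variance(residuals, omega=0.1, alpha=0.2, beta=0.7):
    """Conditional variance series sigma_t^2 for a GARCH(1, 1) model
    with illustrative (not fitted) parameters."""
    sigma2 = np.empty(len(residuals))
    # Start at the unconditional variance omega / (1 - alpha - beta).
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(residuals)):
        sigma2[t] = omega + alpha * residuals[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical residual series standing in for the sensor innovations.
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, size=500)
sigma2 = garch_variance(eps)
```

The stationarity condition $\alpha + \beta < 1$ keeps the recursion from diverging, which is part of why a sufficient amount of data is needed to estimate these parameters reliably.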

Answered by Juan Esteban de la Calle on June 8, 2021
