How to average CDFs of one variable across years

Question

I have wealth-to-income data for 10 years. I computed the cdf of this variable in each year.
Now I'm trying to average the cdfs across years. In each year, the number of observations is different.
Does anyone know how to do it?
Thank you in advance.
Susan

Bayesian · Answer

Why don't you just take a weighted average?
Suppose you have ten years $t \in \{1,\dots,10\}$ and year $t$ has $N_t$ observations such that in total you have $\sum_t N_t = N$ observations. Let the year-$t$ CDF be $F_t$ with support $[\underline w_t, \overline w_t]$.
You can then define a weighted average CDF as
$$\overline F(w) = \sum_t \frac{N_t}{N} F_t(w).$$
This gives you a cdf, an increasing right-continuous function ranging over $[0,1]$ with support $\cup_t [\underline w_t, \overline w_t]$. However, you have to pay attention to the individual supports, i.e., $F_t(w) = 1 \ \forall w > \overline w_t$ and $F_t(w) = 0 \ \forall w < \underline w_t$.
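
As a rough illustration of the weighted average above, here is a minimal Python sketch using only the standard library. The helper names (`ecdf`, `weighted_average_cdf`) and the three small samples are made up for the example, and each $F_t$ is assumed to be the empirical CDF of that year's sample. Because each empirical CDF is a step function defined for every $w$, it already returns 0 below the year's smallest observation and 1 above its largest, which takes care of the differing supports.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of `sample` as a function w -> share of points <= w."""
    xs = sorted(sample)
    n = len(xs)
    # bisect_right counts how many sorted points are <= w
    return lambda w: bisect_right(xs, w) / n

def weighted_average_cdf(samples_by_year):
    """Average the per-year empirical CDFs with weights N_t / N."""
    total = sum(len(s) for s in samples_by_year)
    parts = [(len(s) / total, ecdf(s)) for s in samples_by_year]
    return lambda w: sum(weight * F(w) for weight, F in parts)

# Hypothetical wealth-to-income ratios for three years with unequal sample sizes.
years = [
    [0.5, 1.0, 2.0, 4.0],
    [1.0, 3.0],
    [0.2, 0.8, 1.5, 2.5, 6.0, 9.0],
]

F_bar = weighted_average_cdf(years)

# Evaluate the averaged CDF at a few points of interest.
for w in (0.0, 1.0, 3.0, 9.0):
    print(w, F_bar(w))
```

With real data you would pass one list of observations per year; the averaged function can then be evaluated on any grid of $w$ values for plotting.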

Alecos Papadopoulos · Answer

The answer by @Bayesian proposes computing a weighted average of the per-period empirical distribution functions $EDF_t(w)$, where $w$ is a value in the support of a random variable $W$ at which we evaluate the $EDF_t$ of $W$. Let's see what that may mean.
The $EDF_t(w)$ expression is, for each value $w$ in the support,
$$EDF_t(w) = \frac{1}{N_t} \sum_i I\{w_{t,i} \leq w\}.$$
Here $w_{t,i}$ is a data point from the sample in the $t$-th period. The proposed weighted average is
$$\overline{EDF}(w) = \sum_t \frac{N_t}{N} EDF_t(w) = \sum_t \frac{N_t}{N} \frac{1}{N_t} \sum_i I\{w_{t,i} \leq w\} = \frac{1}{N} \sum_t \sum_i I\{w_{t,i} \leq w\},$$
which is just the pooled average over all data available and across time periods.
In other words, taking the weighted average in this case proves to be equivalent to taking a pooled (unweighted) average over all time-period samples. For that to be meaningful for inference (rather than a purely descriptive statistic for the specific sample, devoid of economic/causal/structural meaning), it must rely on the assumption that the distribution functions are identical period by period. "Taking the weighted average" appears to allow for different distributions, but it does not if, again, one is interested in economic inference.
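
The equivalence between the $N_t/N$-weighted average and the pooled EDF can be checked numerically. The following is only a sketch on made-up data, with hypothetical helper names (`edf_value`, `weighted_average`, `pooled_edf`); it compares the two quantities at every observed data point, which is where a step function can change.

```python
from bisect import bisect_right

def edf_value(sample, w):
    """Share of points in `sample` that are <= w."""
    xs = sorted(sample)
    return bisect_right(xs, w) / len(xs)

# Hypothetical samples for three periods with different sizes N_t.
samples = [
    [0.5, 1.0, 2.0, 4.0],
    [1.0, 3.0],
    [0.2, 0.8, 1.5, 2.5, 6.0, 9.0],
]

# Stack all periods into one pooled sample of size N.
pooled = [x for s in samples for x in s]
N = len(pooled)

def weighted_average(w):
    # sum_t (N_t / N) * EDF_t(w)
    return sum(len(s) / N * edf_value(s, w) for s in samples)

def pooled_edf(w):
    # (1 / N) * sum_t sum_i I{w_{t,i} <= w}
    return edf_value(pooled, w)

# Compare the two at every distinct observed value.
grid = sorted(set(pooled))
max_gap = max(abs(weighted_average(w) - pooled_edf(w)) for w in grid)
print("largest absolute difference:", max_gap)
```

Any difference reported here should be at the level of floating-point rounding, since the two expressions are algebraically identical.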
What would be really interesting is to model this estimation task as a sequential Bayesian one.
