Understand the equations of quantile regression forest (Meinshausen)?

Asked by kakarotto on Data Science, February 23, 2021

I am trying to implement a quantile regression forest (https://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf).

However, I have some difficulty understanding how the quantiles are computed. I will summarize the relevant part of the paper and then explain exactly what I don't understand.

Let $(X_i, Y_i)$, $i = 1, \dots, n$, be independent observations. A tree $T$ parametrized with a realization $\theta$ of a random variable $\Theta$ is denoted by $T(\theta)$.

  • Grow $k$ trees $T(\theta_t)$, $t = 1, \dots, k$, as in random forests. However, for every leaf of every tree, take note of all observations in this leaf, not just their average.
  • For a given $X = x$, drop $x$ down all trees. Compute the weight $\omega_i(x, \theta_t)$ of observation $i \in \{1, \dots, n\}$ for every tree as in (4). Compute the weight $\omega_i(x)$ for every observation $i \in \{1, \dots, n\}$ as an average over $\omega_i(x, \theta_t)$, $t = 1, \dots, k$, as in (5).
  • Compute the estimate of the distribution function as in (6) for all $y \in \mathbb{R}$.

Equations (4), (5), and (6) are as follows:

$$ \omega_i(x, \theta_t) = \frac{\mathbf{1}\{X_i \in R(x, \theta_t)\}}{\#\{j : X_j \in R(x, \theta_t)\}} \qquad (4) $$

$$ \omega_i(x) = k^{-1} \sum_{t=1}^{k} \omega_i(x, \theta_t) \qquad (5) $$

$$ \hat{F}(y \mid X = x) = \sum_{i=1}^{n} \omega_i(x) \, \mathbf{1}\{Y_i \leq y\} \qquad (6) $$

where $R(x, \theta_t)$ denotes the rectangular area corresponding to the unique leaf of the tree $T(\theta_t)$ that $x$ belongs to.
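For concreteness, here is roughly how I compute (4) and (5): a sketch assuming a fitted scikit-learn `RandomForestRegressor`, whose `apply` method returns the leaf index of each sample in each tree (the function name `weights` is mine):

```python
import numpy as np

def weights(forest, X_train, x):
    """Equations (4) and (5): per-tree leaf weights, averaged over trees."""
    train_leaves = forest.apply(X_train)          # shape (n, k): leaf of each X_i in each tree
    x_leaves = forest.apply(x.reshape(1, -1))[0]  # shape (k,): leaf of x in each tree

    n, k = train_leaves.shape
    w = np.zeros(n)
    for t in range(k):
        in_leaf = train_leaves[:, t] == x_leaves[t]  # 1{X_i in R(x, theta_t)}
        w += in_leaf / in_leaf.sum()                 # equation (4)
    return w / k                                     # equation (5)
```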

I can compute (4) and (5) (roughly as in the sketch above), but I don't understand how to compute (6) and then estimate quantiles from it. I would also add that I don't see where "all observations in this leaf" (first step of the algorithm) are used.

Can someone give me some pointers to understanding this algorithm? Any help would be appreciated.

One Answer

It's a good idea to remember what you're trying to predict, and that is $\mathbb{E}[Y \mid X = x]$. The simplest estimate is a single tree:

$$ \hat{\mu}(x) = \sum_{i \leq n} w_i(x, \theta) \, Y_i $$

with:

  • $w_i(x, \theta)$: the single-tree weights (equation 4 in the paper)
  • $Y_i$: your observations

As we all know, this is not a great estimate (high variance, among other issues), so he defines a better estimator (the random forest) as:

$$ \hat{\mu}(x) = \sum_{i \leq n} w_i(x) \, Y_i $$

where the $w_i(x)$ are averages over the multiple trees (equation 5). So what does that mean? Your best estimator for $\mathbb{E}[Y \mid X = x]$ is $\hat{\mu}(x)$. What do you do if you want an estimate of a function $\phi$ of $Y$? You just transform your observations of $Y$ and use the same formula, i.e.:

$$ \hat{\mu}_{\phi}(x) = \sum_{i \leq n} w_i(x) \, \phi(Y_i) $$

which would then be an estimator of $\mathbb{E}[\phi(Y) \mid X = x]$. If you define $\phi(Y) = \mathbb{1}_{\{Y \leq y\}}$ then you get that:

$$ \hat{\mu}_{y}(x) = \sum_{i \leq n} w_i(x) \, \mathbb{1}_{\{Y_i \leq y\}} $$

is an estimator of $\mathbb{E}[\mathbb{1}_{\{Y \leq y\}} \mid X = x] = F(y \mid X = x)$, which is exactly equation (6) in the paper.
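As for the quantiles themselves: $\hat{F}(\cdot \mid X = x)$ is a step function that jumps by $w_i(x)$ at each $Y_i$, so the $\alpha$-quantile estimate is obtained by inverting it, $\hat{Q}_\alpha(x) = \inf\{y : \hat{F}(y \mid X = x) \geq \alpha\}$, as in the paper. A minimal Python sketch (the function name is mine), given the weights $w_i(x)$ and the training responses:

```python
import numpy as np

def quantile(w, y_train, alpha):
    """inf{y : F_hat(y | X=x) >= alpha}: the smallest observed Y_i
    whose cumulative weight reaches alpha."""
    order = np.argsort(y_train)          # sort the observations by Y_i
    cum_w = np.cumsum(w[order])          # F_hat evaluated at each sorted Y_i
    idx = np.searchsorted(cum_w, alpha)  # first index where F_hat >= alpha
    return y_train[order][min(idx, len(y_train) - 1)]
```

This is also where "all observations in this leaf" from the first step of the algorithm enter: the weights in (4) are nonzero exactly for the training points sharing a leaf with $x$, and (6) uses their individual $Y_i$ values rather than the leaf mean.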

I avoided a lot of low-level details here, but to be clear, this relates to the delta method, i.e. the convergence of a function of an estimator. This kind of convergence is not trivial to prove (but I guess that's why they wrote a paper about it).
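Putting it together, a toy end-to-end example on synthetic data (hypothetical; `weights` and `quantile` are the helper functions sketched above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(500, 3))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=10)
forest.fit(X_train, y_train)

x_new = np.array([0.5, 0.5, 0.5])
w = weights(forest, X_train, x_new)    # equations (4)-(5)
q10 = quantile(w, y_train, 0.1)        # 10% conditional quantile
q90 = quantile(w, y_train, 0.9)        # 90% conditional quantile
print(q10, q90)                        # an 80% prediction interval for Y given X = x_new
```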

Correct answer by RonsenbergVI on February 23, 2021
