Understand the equations of quantile regression forest (Meinshausen)?

Asked by kakarotto on Data Science, February 23, 2021

I am trying to implement a quantile regression forest (https://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf).

However, I have some difficulty understanding how the quantiles are computed. I will summarize the relevant part of the paper and then explain exactly what I don't understand.

Let $(X_i, Y_i)$, $i = 1, \dots, n$, be independent observations. A tree $T$ parametrized with a realization $\theta$ of a random variable $\Theta$ is denoted by $T(\theta)$.

  • Grow $k$ trees $T(\theta_t)$, $t = 1, \dots, k$, as in random forests. However, for every leaf of every tree, take note of all observations in this leaf, not just their average.
  • For a given $X = x$, drop $x$ down all trees. Compute the weight $\omega_i(x, \theta_t)$ of observation $i \in \{1, \dots, n\}$ for every tree as in (4). Compute the weight $\omega_i(x)$ for every observation $i \in \{1, \dots, n\}$ as an average over $\omega_i(x, \theta_t)$, $t = 1, \dots, k$, as in (5).
  • Compute the estimate of the distribution function as in (6) for all $y \in \mathbb{R}$.

Equations (4), (5), and (6) are as follows:

$$ \omega_i(x, \theta_t) = \frac{\mathbf{1}\{X_i \in R(x, \theta_t)\}}{\#\{j : X_j \in R(x, \theta_t)\}} \qquad (4) $$

$$ \omega_i(x) = k^{-1} \sum_{t=1}^{k} \omega_i(x, \theta_t) \qquad (5) $$

$$ \hat{F}(y \mid X = x) = \sum_{i=1}^{n} \omega_i(x) \, \mathbf{1}\{Y_i \leq y\} \qquad (6) $$

where $R(x, \theta_t)$ denotes the rectangular area corresponding to the unique leaf of the tree $T(\theta_t)$ that $x$ belongs to.
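For concreteness, here is roughly how I compute (4) and (5): a sketch assuming a fitted scikit-learn `RandomForestRegressor`, whose `apply` method returns the leaf index of each sample in each tree (the function name `weights` is mine):

```python
import numpy as np

def weights(forest, X_train, x):
    """Equations (4) and (5): per-tree leaf weights, averaged over trees."""
    train_leaves = forest.apply(X_train)          # shape (n, k): leaf of each X_i in each tree
    x_leaves = forest.apply(x.reshape(1, -1))[0]  # shape (k,): leaf of x in each tree

    n, k = train_leaves.shape
    w = np.zeros(n)
    for t in range(k):
        in_leaf = train_leaves[:, t] == x_leaves[t]  # 1{X_i in R(x, theta_t)}
        w += in_leaf / in_leaf.sum()                 # equation (4)
    return w / k                                     # equation (5)
```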

I can compute (4) and (5) (roughly as in the sketch above), but I don't understand how to compute (6) and then estimate quantiles from it. I would also add that I don't see where "all observations in this leaf" (first step of the algorithm) are used.

Can someone give me some pointers to understanding this algorithm? Any help would be appreciated.

One Answer

It's a good idea to remember what you're trying to predict, and that is $\mathbb{E}[Y \mid X = x]$. The simplest estimate is a single tree:

$$ \hat{\mu}(x) = \sum_{i \leq n} w_i(x, \theta) \, Y_i $$

with:

  • $w_i(x, \theta)$: the single-tree weights (equation 4 in the paper)
  • $Y_i$: your observations

As we all know, this is not a great estimate (high variance, among other issues), so he defines a better estimator (the random forest) as:

$$ \hat{\mu}(x) = \sum_{i \leq n} w_i(x) \, Y_i $$

where the $w_i(x)$ are averages over the multiple trees (equation 5). So what does that mean? Your best estimator for $\mathbb{E}[Y \mid X = x]$ is $\hat{\mu}(x)$. What do you do if you want an estimate of a function $\phi$ of $Y$? You just transform your observations of $Y$ and use the same formula, i.e.:

$$ \hat{\mu}_{\phi}(x) = \sum_{i \leq n} w_i(x) \, \phi(Y_i) $$

which would then be an estimator of $\mathbb{E}[\phi(Y) \mid X = x]$. If you define $\phi(Y) = \mathbb{1}_{\{Y \leq y\}}$ then you get that:

$$ \hat{\mu}_{y}(x) = \sum_{i \leq n} w_i(x) \, \mathbb{1}_{\{Y_i \leq y\}} $$

is an estimator of $\mathbb{E}[\mathbb{1}_{\{Y \leq y\}} \mid X = x] = F(y \mid X = x)$, which is exactly equation (6) in the paper.
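As for the quantiles themselves: $\hat{F}(\cdot \mid X = x)$ is a step function that jumps by $w_i(x)$ at each $Y_i$, so the $\alpha$-quantile estimate is obtained by inverting it, $\hat{Q}_\alpha(x) = \inf\{y : \hat{F}(y \mid X = x) \geq \alpha\}$, as in the paper. A minimal Python sketch (the function name is mine), given the weights $w_i(x)$ and the training responses:

```python
import numpy as np

def quantile(w, y_train, alpha):
    """inf{y : F_hat(y | X=x) >= alpha}: the smallest observed Y_i
    whose cumulative weight reaches alpha."""
    order = np.argsort(y_train)          # sort the observations by Y_i
    cum_w = np.cumsum(w[order])          # F_hat evaluated at each sorted Y_i
    idx = np.searchsorted(cum_w, alpha)  # first index where F_hat >= alpha
    return y_train[order][min(idx, len(y_train) - 1)]
```

This is also where "all observations in this leaf" from the first step of the algorithm enter: the weights in (4) are nonzero exactly for the training points sharing a leaf with $x$, and (6) uses their individual $Y_i$ values rather than the leaf mean.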

I avoided a lot of low-level details here, but to be clear, this relates to the delta method, i.e. the convergence of a function of an estimator. This kind of convergence is not trivial to prove (but I guess that's why they wrote a paper about it).
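Putting it together, a toy end-to-end example on synthetic data (hypothetical; `weights` and `quantile` are the helper functions sketched above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(500, 3))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=10)
forest.fit(X_train, y_train)

x_new = np.array([0.5, 0.5, 0.5])
w = weights(forest, X_train, x_new)    # equations (4)-(5)
q10 = quantile(w, y_train, 0.1)        # 10% conditional quantile
q90 = quantile(w, y_train, 0.9)        # 90% conditional quantile
print(q10, q90)                        # an 80% prediction interval for Y given X = x_new
```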

Correct answer by RonsenbergVI on February 23, 2021
