Cross Validated Asked by Durin on November 28, 2020
I am learning about splines from the book “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” by Hastie et al. On page 145 I found that natural cubic splines are linear beyond the boundary knots. There are $K$ knots, $\xi_1, \xi_2, \ldots, \xi_K$, in the spline, and the following is given about such a spline in the book.
Question 1: How are 4 degrees of freedom freed up? I don’t get this part.
Question 2: In the definition of $d_k(X)$, when $k=K$ we get $d_K(X) = \frac{0}{0}$. What is the author trying to do in this formula? How does this help ensure that the splines are linear beyond the boundary knots?
Let's start by considering ordinary cubic splines. They're cubic between every pair of knots and cubic outside the boundary knots. We start with 4 df for the first cubic (left of the first boundary knot). Each knot then adds one new parameter: the cubic to its right brings four coefficients, but continuity of the spline and of its first and second derivatives at the knot imposes three constraints, leaving one free parameter. That makes a total of $K+4$ parameters for $K$ knots.
A natural cubic spline is linear at both ends. This constrains the cubic and quadratic parts there to 0, each reducing the df by 1. That's 2 df at each of two ends of the curve, reducing $K+4$ to $K$.
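The count above can be checked numerically. The sketch below is not from the book; it assumes the ordinary cubic spline is written in the truncated power basis $1, X, X^2, X^3, (X-\xi_k)_+^3$ with coefficients $\beta_0,\dots,\beta_3,\theta_1,\dots,\theta_K$, and the function name `natural_spline_df` is made up for illustration. Linearity to the left of $\xi_1$ requires $\beta_2=\beta_3=0$; linearity to the right of $\xi_K$ requires the $X^3$ and $X^2$ coefficients of the expanded polynomial to vanish. That gives four linear constraints, and the remaining df is the dimension of their null space.

```python
import numpy as np

def natural_spline_df(knots):
    """Return (df of the ordinary cubic spline, df left after the 4 linearity constraints).

    Coefficient order (an assumption of this sketch):
    [beta0, beta1, beta2, beta3, theta_1, ..., theta_K]
    for the basis 1, x, x^2, x^3, (x - xi_k)_+^3.
    """
    knots = np.asarray(knots, dtype=float)
    K = knots.size
    n_params = K + 4
    C = np.zeros((4, n_params))
    # Left of the first knot the truncated terms vanish, so linearity
    # there means beta2 = 0 and beta3 = 0.
    C[0, 2] = 1.0
    C[1, 3] = 1.0
    # Right of the last knot every (x - xi_k)^3 is active.
    # Coefficient of x^3: beta3 + sum_k theta_k = 0.
    C[2, 3] = 1.0
    C[2, 4:] = 1.0
    # Coefficient of x^2: beta2 - 3 * sum_k theta_k * xi_k = 0.
    C[3, 2] = 1.0
    C[3, 4:] = -3.0 * knots
    rank = np.linalg.matrix_rank(C)
    return n_params, int(n_params - rank)

for knots in ([0.0, 1.0], [0.0, 0.3, 0.7, 1.0], np.linspace(0.0, 1.0, 7)):
    full_df, natural_df = natural_spline_df(knots)
    print(len(knots), "knots:", full_df, "->", natural_df)
```

For distinct knots with $K \ge 2$ the four constraints are linearly independent, so the function returns $K+4$ and $K$.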
Imagine you decide you can spend some total number of degrees of freedom ($p$, say) on your non-parametric curve estimate. Since imposing a natural spline uses 4 fewer degrees of freedom than an ordinary cubic spline (for the same number of knots), with those $p$ parameters you can have 4 more knots (and so 4 more parameters) to model the curve between the boundary knots.
Note that the definition for $N_{k+2}$ is for $k=1,2,\ldots,K-2$ (since there are $K$ basis functions in all). So the last basis function in that list is $N_{K}=d_{K-2}-d_{K-1}$, and the highest $k$ needed in the definition of $d_k$ is $k=K-1$. (That is, we don't need to try to figure out what some $d_K$ might do, since we don't use it.)
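To see how this construction gives linearity beyond the boundary knots, here is a minimal sketch. It assumes the book's basis is $N_1(X)=1$, $N_2(X)=X$, $N_{k+2}(X)=d_k(X)-d_{K-1}(X)$ with $d_k(X)=\frac{(X-\xi_k)_+^3-(X-\xi_K)_+^3}{\xi_K-\xi_k}$; the helper names `d` and `natural_basis` are made up for illustration. To the right of $\xi_K$, each $d_k$ expands to $3X^2 - 3(\xi_k+\xi_K)X + \text{const}$, so the quadratic terms cancel in every difference $d_k - d_{K-1}$; to the left of $\xi_1$, every $d_k$ is zero.

```python
import numpy as np

def d(x, knots, k):
    """d_k(x) for 1-based k in 1..K-1 (k = K would be 0/0 and is never used)."""
    xi_k, xi_K = knots[k - 1], knots[-1]
    cube_plus = lambda u: np.maximum(u, 0.0) ** 3  # (u)_+^3
    return (cube_plus(x - xi_k) - cube_plus(x - xi_K)) / (xi_K - xi_k)

def natural_basis(x, knots):
    """Columns N_1, ..., N_K evaluated at x, under the assumed definition above."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = knots.size
    cols = [np.ones_like(x), x]
    d_last = d(x, knots, K - 1)
    for k in range(1, K - 1):  # k = 1, ..., K-2
        cols.append(d(x, knots, k) - d_last)
    return np.column_stack(cols)

knots = np.array([0.0, 0.25, 0.5, 1.0])
right = np.linspace(1.5, 3.0, 20)   # equally spaced points beyond xi_K
left = np.linspace(-3.0, -1.0, 20)  # equally spaced points before xi_1
# Second differences on an equally spaced grid vanish for linear functions.
print(np.abs(np.diff(natural_basis(right, knots), n=2, axis=0)).max())
print(np.abs(np.diff(natural_basis(left, knots), n=2, axis=0)).max())
```

Both printed values should be zero up to floating-point rounding, which is what "linear beyond the boundary knots" means for every basis function and hence for any linear combination of them.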
Correct answer by Glen_b on November 28, 2020
I detail the assertion "This frees up four degrees of freedom (two constraints each in both boundary regions)" with an example using $2$ knots $\xi_1, \xi_2$. The related intervals are $]-\infty, \xi_1[$, $]\xi_1, \xi_2[$ and $]\xi_2, +\infty[$ (so there are $|I|=3$ intervals and $|I|-1=2$ knots).
For (common) cubic splines
Without regularity constraints, we have $4|I|=12$ basis functions, each with one free coefficient:
$$\mathbf{1}(X < \xi_1)~~;~~\mathbf{1}(X < \xi_1)X~~;~~\mathbf{1}(X < \xi_1)X^2~~;~~\mathbf{1}(X < \xi_1)X^3~~;$$ $$\mathbf{1}(\xi_1 \leq X < \xi_2)~~;~~\mathbf{1}(\xi_1 \leq X < \xi_2)X~~;~~\mathbf{1}(\xi_1 \leq X < \xi_2)X^2~~;~~\mathbf{1}(\xi_1 \leq X < \xi_2)X^3~~;$$ $$\mathbf{1}(\xi_2 \leq X)~~;~~\mathbf{1}(\xi_2 \leq X)X~~;~~\mathbf{1}(\xi_2 \leq X)X^2~~;~~\mathbf{1}(\xi_2 \leq X)X^3.$$
To impose regularity (cubic splines assume $\mathcal{C}^r$ regularity with $r=2$), we need to add $(r+1)\times(|I|-1) = 3\times(|I|-1) = 6$ constraints on the linear coefficients.
We end up with $12-6=6$ degrees of freedom.
For natural cubic splines
"A natural cubic spline adds additional constraints, namely that the function is linear beyond the boundary knots."
Without regularity constraints, we have $4|I|-4=12-4=8$ basis functions (we have removed $4$ of them, $2$ in each boundary region, because they involve quadratic and cubic polynomials):
$$\mathbf{1}(X < \xi_1)~~;~~\mathbf{1}(X < \xi_1)X~~;$$ $$\mathbf{1}(\xi_1 \leq X < \xi_2)~~;~~\mathbf{1}(\xi_1 \leq X < \xi_2)X~~;~~\mathbf{1}(\xi_1 \leq X < \xi_2)X^2~~;~~\mathbf{1}(\xi_1 \leq X < \xi_2)X^3~~;$$ $$\mathbf{1}(\xi_2 \leq X)~~;~~\mathbf{1}(\xi_2 \leq X)X.$$
The constraints are the same as before, so we still need to add $3\times(|I|-1) = 6$ constraints on the linear coefficients.
We end up with $8-6=2$ degrees of freedom.
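Both counts ($12-6=6$ and $8-6=2$) can be verified by writing the continuity constraints as a matrix and computing its rank. The sketch below is only an illustration: the function names `deriv_row` and `free_parameters` and the knot values $1$ and $2$ are arbitrary choices, not part of the argument above. Each interval gets its own polynomial, and at each knot the value, first derivative and second derivative of the left piece minus those of the right piece must be zero.

```python
import numpy as np

def deriv_row(x, degree, order):
    """Coefficients of the order-th derivative of sum_j a_j x^j (j = 0..degree) at x."""
    row = np.zeros(degree + 1)
    for j in range(order, degree + 1):
        coef = 1.0
        for m in range(order):
            coef *= (j - m)  # j * (j-1) * ... * (j-order+1)
        row[j] = coef * x ** (j - order)
    return row

def free_parameters(knots, degrees, smoothness=2):
    """Return (number of coefficients, rank of constraints, remaining df).

    degrees[i] is the polynomial degree on the i-th interval,
    so len(degrees) must equal len(knots) + 1.
    """
    offsets = [0]
    for deg in degrees:
        offsets.append(offsets[-1] + deg + 1)
    n_params = offsets[-1]
    rows = []
    for i, xi in enumerate(knots):
        for order in range(smoothness + 1):
            row = np.zeros(n_params)
            # Left piece minus right piece must vanish at the knot.
            row[offsets[i]:offsets[i + 1]] = deriv_row(xi, degrees[i], order)
            row[offsets[i + 1]:offsets[i + 2]] = -deriv_row(xi, degrees[i + 1], order)
            rows.append(row)
    rank = int(np.linalg.matrix_rank(np.array(rows)))
    return n_params, rank, n_params - rank

print(free_parameters([1.0, 2.0], [3, 3, 3]))  # ordinary cubic spline
print(free_parameters([1.0, 2.0], [1, 3, 1]))  # natural cubic spline
```

In the natural case the two second-derivative constraints force the middle cubic to be linear as well, so with only $2$ knots the spline reduces to a single straight line, which matches the $2$ remaining degrees of freedom.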
Answered by ahstat on November 28, 2020