[R] natural splines

Frank E Harrell Jr fharrell at virginia.edu
Thu May 8 16:59:45 CEST 2003


On Thu, 8 May 2003 15:07:56 +0100 (BST)
iwhite at staffmail.ed.ac.uk wrote:

> Apologies if this is too obscure for R-help.
> 
> In package splines, ns(x,,knots,intercept=TRUE) produces an n by K+2
> matrix N, the values of K+2 basis functions for the natural splines with K
> (internal) knots, evaluated at x.  It does this by first generating an
> n by K+4 matrix B of unconstrained splines, then postmultiplying B by
> H, a K+4 by K+2 representation of the nullspace of C (2 by K+4), which
> contains the 2nd derivatives of the unconstrained splines evaluated at
> the boundary knots.  E.g. see Hastie and Tibshirani, Generalized Additive
> Models, exercise 2.5, p36.  The QR decomposition is used to get H.
> 
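
The construction described above can be sketched in R roughly as follows. This is a minimal sketch rather than the package source: the grid x, the knot values, and the names inner, bk, aknots, B, C, H, N are illustrative assumptions, and the final check only confirms that the result spans the same space as the output of ns().

```r
library(splines)

x      <- seq(0, 1, length.out = 101)
inner  <- c(0.25, 0.5, 0.75)            # K = 3 internal knots (assumed values)
bk     <- range(x)                      # boundary knots
aknots <- sort(c(rep(bk, 4), inner))    # knot sequence for cubic B-splines

# n by (K + 4) matrix of unconstrained cubic B-splines
B <- splineDesign(aknots, x, ord = 4)

# 2 by (K + 4) matrix of second derivatives at the two boundary knots
C <- splineDesign(aknots, bk, ord = 4, derivs = c(2, 2))

# The last K + 2 columns of the full Q factor of t(C) span the null space of C
H <- qr.Q(qr(t(C)), complete = TRUE)[, -(1:2)]
N <- B %*% H                            # n by (K + 2) natural spline basis

# Compare with ns(): regress its columns on N and look at the residuals
N_ns  <- ns(x, knots = inner, Boundary.knots = bk, intercept = TRUE)
N_ref <- matrix(as.vector(N_ns), nrow = nrow(N_ns))
max_resid <- max(abs(qr.resid(qr(N), N_ref)))
```

If max_resid is at rounding-error level, the columns of N and the columns returned by ns() span the same space.
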
> This can produce basis functions which, while technically correct (they
> span the K+2 dim space of natural splines), can be counterintuitive.
> E.g. equally spaced knots symmetrically placed between the data extremes
> can produce very asymmetric arrangements, with N(K+2) not the mirror image
> of N(1), for example, and considerable loss of sparseness.
> 
> This approach works for any basis B, but for B-splines, the second
> derivatives at each boundary knot are zero for all the unconstrained
> basis functions apart from the 3 nearest that end. All that is required
> is to combine these 3 so that
> the contributions to the 2nd derivatives cancel. In other words, we
> only need to find the null space of two 2 by 3 matrices, rather than a
> 2 by K+4. If the left-most internal knots are t(1) and t(2), and the
> left-hand boundary knot is t(0), we can replace B(1...3) with
> 
> B(1)+B(2)+B(3) and [t(2)-t(0)]*B(2) + [t(1)+t(2)-2*t(0)]*B(1),
> 
> (for example), and similarly at the right-hand end.
> 
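
The local combination proposed above can be tried out with a short R sketch. It assumes intercept=TRUE, at least two internal knots, and that the right-hand end is the mirror image of the left-hand formula; the names h1, h2, g1, g2, H_local and N_local are illustrative only.

```r
library(splines)

x      <- seq(0, 1, length.out = 101)
inner  <- c(0.25, 0.5, 0.75)            # t(1), ..., t(K), assumed values
bk     <- range(x)                      # t(0) and the right-hand boundary knot
aknots <- sort(c(rep(bk, 4), inner))
K      <- length(inner)
stopifnot(K >= 2)

B <- splineDesign(aknots, x, ord = 4)   # n by (K + 4)
C <- splineDesign(aknots, bk, ord = 4, derivs = c(2, 2))
p <- ncol(B)

h1 <- inner[1] - bk[1]; h2 <- inner[2] - bk[1]          # t(1)-t(0), t(2)-t(0)
g1 <- bk[2] - inner[K]; g2 <- bk[2] - inner[K - 1]      # mirror image at the right

# Sparse (K + 4) by (K + 2) matrix replacing the QR-based H
H_local <- matrix(0, p, K + 2)
H_local[1:3, 1] <- 1                                    # B(1)+B(2)+B(3)
H_local[1:2, 2] <- c(h1 + h2, h2)                       # [t(1)+t(2)-2t(0)]B(1) + [t(2)-t(0)]B(2)
if (K > 2) H_local[cbind(4:(p - 3), 3:K)] <- 1          # interior B-splines unchanged
H_local[(p - 1):p, K + 1] <- c(g2, g1 + g2)             # mirrored pair at the right end
H_local[(p - 2):p, K + 2] <- 1                          # sum of the last three B-splines

N_local <- B %*% H_local

# Second derivatives of the new basis at both boundary knots
bd <- C %*% H_local

# Same space as ns()? Regress the ns() columns on N_local.
N_ns  <- ns(x, knots = inner, Boundary.knots = bk, intercept = TRUE)
N_ref <- matrix(as.vector(N_ns), nrow = nrow(N_ns))
max_resid <- max(abs(qr.resid(qr(N_local), N_ref)))

# With equally spaced knots on a symmetric grid, the first column mirrors the last
mirror_gap <- max(abs(N_local[, 1] - rev(N_local[, K + 2])))
```

Here bd should vanish up to rounding, and H_local has at most three nonzero entries per column, so the sparseness of the B-spline basis is largely preserved.
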
> This seems simpler and more elegant than brute force QR on the full
> matrix of derivatives. But I may have missed some reason why it can't be
> used. Perhaps it doesn't work when intercept=FALSE?
> 
> ======================================
> I.White
> ICAPB, University of Edinburgh
> Ashworth Laboratories, West Mains Road
> Edinburgh EH9 3JT
> Fax: 0131 650 6564  Tel: 0131 650 5490
> E-mail: iwhite at staffmail.ed.ac.uk
> 

This is also a vote for using the truncated power basis, which is extremely simple and exceptionally fast for large datasets (see the rcspline.eval function in the Hmisc package).  With modern matrix arithmetic (as in S), the collinearity of the bases produced by these simple regression splines is a moot point.
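
For illustration, a truncated power basis for a cubic spline constrained to be linear beyond the outer knots can be written in a few lines of base R. This is only a sketch: rcs_basis is a hypothetical helper, not the Hmisc implementation, it applies no normalization, and the knot values are assumptions.

```r
library(splines)

# Hypothetical helper: linear term plus k - 2 truncated-power columns,
# constructed so that the cubic terms cancel beyond the last knot.
rcs_basis <- function(x, knots) {
  k <- length(knots)
  stopifnot(k >= 3)
  pos3 <- function(u) pmax(u, 0)^3
  d <- knots[k] - knots[k - 1]
  nonlinear <- sapply(seq_len(k - 2), function(j) {
    pos3(x - knots[j]) -
      pos3(x - knots[k - 1]) * (knots[k] - knots[j]) / d +
      pos3(x - knots[k]) * (knots[k - 1] - knots[j]) / d
  })
  cbind(x, nonlinear)
}

x     <- seq(0, 1, length.out = 101)
knots <- c(0, 0.25, 0.5, 0.75, 1)       # outer knots included (assumed values)
X     <- cbind(1, rcs_basis(x, knots))  # intercept plus k - 1 columns

# Check that X spans the same space as the ns() basis on the same knots
N_ns  <- ns(x, knots = knots[2:4], Boundary.knots = knots[c(1, 5)],
            intercept = TRUE)
N_ref <- matrix(as.vector(N_ns), nrow = nrow(N_ns))
max_resid <- max(abs(qr.resid(qr(X), N_ref)))
```

The columns of X are far from orthogonal, but a QR-based least-squares fit (as in lm) handles them without difficulty on a grid like this one.
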
---
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
