[R] Formula for whether hat value is influential?

Gavin Simpson gavin.simpson at ucl.ac.uk
Sun Mar 9 10:53:46 CET 2008


On Sat, 2008-03-08 at 19:38 -0800, Paul Lynch wrote:
> I was wondering if someone might be able to tell me what formula R's
> influence.measures function uses for determining whether the hat value
> it computes is influential (i.e., the true/false value in the "hat"
> column of the returned is.inf data frame).  The reason I'm asking is
> that its results disagree with what I've just learned in my statistics
> class, namely that a point should be considered influential if h_ii >
> 2(k+1)/n, where k+1 is the number of parameters in the model and n is
> the number of data points.  My 2(k+1)/n value would mark at least one
> more point influential than influence.measures does for the data set
> I'm looking at.

R is open source, so you have access to all of the source code. Type
influence.measures (without the parentheses) at the prompt to see a
version of the function with the comments stripped.
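
A rough sketch of looking at the source from the prompt follows. It
assumes a reasonably recent R; the code printed by 2.4.1 may differ in
detail, and the name helper_lines is just for illustration.

```r
# Typing the function name without () prints its definition.
influence.measures

# Capture the same definition as text and keep only the lines that
# mention the internal helper is.influential().
src <- deparse(influence.measures)
helper_lines <- grep("is.influential", src, fixed = TRUE, value = TRUE)
print(helper_lines)
```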

In the internal function is.influential(), defined inside
influence.measures, you'll find the critical levels used. The hat values
are in infmat[, k + 4], which is the last column (where k is the number
of terms in the model, inc. the intercept if present). The relevant part
of is.influential is:

infmat[, k + 4] > (3 * k)/n

So R is using (3*(k+1)) / n in your notation (in the R code k is the
number of terms in the model, *including* the intercept if present in
the model).
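
To see the two cut-offs side by side, here is a minimal sketch. The
model and the built-in mtcars data are assumptions made for
illustration only, not the data from the original question, and the
variable names are mine. It computes the hat values, applies both the
2k/n and 3k/n rules (k counted as in the R code, i.e. including the
intercept), and lines them up against the "hat" column of is.inf.

```r
# Illustrative model on a built-in data set (an assumption for this
# sketch, not the data from the original question).
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)            # leverages h_ii
k <- length(coef(fit))         # coefficients incl. intercept (k in R's code)
n <- nrow(model.frame(fit))    # number of observations used in the fit

cut2 <- 2 * k / n              # textbook rule: twice the average hat value
cut3 <- 3 * k / n              # rule used inside influence.measures()

flag2 <- h > cut2
flag3 <- h > cut3

# The "hat" column of is.inf as reported by influence.measures().
im <- influence.measures(fit)
r_flag <- im$is.inf[, "hat"]

# Side-by-side comparison for points flagged by either rule.
cmp <- data.frame(h = h, over2 = flag2, over3 = flag3, R = r_flag)
print(cmp[flag2 | flag3, ])
```

Any point over the 3k/n line is also over the 2k/n line, so the
textbook rule can only flag the same points or more, which matches
what you are seeing.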

The function was originally in John Fox's car package, the support
software for his book Companion to Applied Regression. In that book,
IIRC, Fox uses two cut-offs for hat values, 2 or 3 times the average
hat value (the average is k/n in R's notation), as indicating
influential observations. R is using the upper level here. I would check
out some of the references cited in the References section of
?influence.measures to see why this has been chosen.

HTH

G

> 
> I am using R 2.4.1 under Windows.  (Upgrading is difficult due to
> rather severe security policies.)
> 
> Thanks,
> 
> --Paul
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


