[R] outlier

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 17 18:51:32 CEST 2003


On Tue, 17 Jun 2003, kan Liu wrote:

>  I want to calculate the R-squared between two variables. Can you advise
> me how to identify and remove the outliers before performing R-squared
> calculation?

Easy: you don't.  It makes no sense to consider R^2 after arbitrary outlier 
removal: if I remove all but two points I get R^2 = 1!
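
To make the two-point remark concrete, here is a minimal sketch on simulated 
data (the seed, the sample size and the variable names are arbitrary choices 
for illustration, not anything from the original question):

```r
# Simulated data: x and y are generated independently of each other.
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)

# Squared Pearson correlation on all 50 points.
r2_all <- cor(x, y)^2

# "Remove outliers" until only two points are left.
keep <- c(1, 2)
r2_two <- cor(x[keep], y[keep])^2

r2_all   # well below 1
r2_two   # 1 up to rounding: two distinct points always lie on a line
```

Any two distinct points fit a straight line exactly, so the value after such 
"cleaning" says nothing about the relationship in the data.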

R^2 is normally used to measure the success of a multiple regression, but 
as you mention two variables, did you just mean the Pearson 
product-moment correlation?  It makes more sense to use a robust measure 
of correlation, as in cov.rob (package lqs) or even Spearman or Kendall 
measures (cor.test in package ctest).
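
As a rough sketch of those alternatives, on simulated data with a few gross 
outliers planted by hand (the data and object names are made up for 
illustration).  It assumes a current R installation, where cov.rob is in 
package MASS and cor.test is in package stats:

```r
library(MASS)   # assumption: cov.rob is found in MASS in current R

set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)   # strong positive linear relationship

# Plant five gross outliers that pull against the main trend.
x[1:5] <- x[1:5] + 10
y[1:5] <- y[1:5] - 10

pearson  <- cor(x, y)                                # distorted by the outliers
robust   <- cov.rob(cbind(x, y), cor = TRUE)$cor[1, 2]
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")

c(pearson = pearson, robust = robust, spearman = spearman, kendall = kendall)

# cor.test adds a test of zero association for the rank-based measures.
cor.test(x, y, method = "spearman")
```

cov.rob uses random subsampling, so the seed matters for reproducibility.  
No points are removed by hand here; the robust estimator downweights them 
itself.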

If you intended to do this for a multiple regression, you need to do some 
sort of robust regression and use a robust measure of fit.
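
One possible sketch of that, on simulated data again.  The choice of rlm from 
package MASS (a Huber M-estimator by default) and of its robust residual 
scale as the measure of fit are assumptions made for illustration; other 
robust fits and other measures would serve equally well:

```r
library(MASS)   # rlm: robust linear model fitting by M-estimation

set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.5)

# Contaminate a few responses with large errors.
y[1:5] <- y[1:5] + 20

ols <- lm(y ~ x1 + x2)    # least squares, for comparison
rob <- rlm(y ~ x1 + x2)   # robust fit on all the data, nothing removed

rbind(ols = coef(ols), rob = coef(rob))

ols_sigma <- summary(ols)$sigma   # residual standard error, inflated here
rob_scale <- rob$s                # robust estimate of the residual scale

c(ols_sigma = ols_sigma, rob_scale = rob_scale)
```

The robust scale should stay near the error standard deviation used in the 
simulation (0.5), while the least-squares residual standard error is driven 
by the five contaminated responses.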

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



