[R] outlier

kan Liu kan_liu1 at yahoo.com
Tue Jun 17 23:24:15 CEST 2003


Hi, many thanks for your advice; I appreciate it very
much. Maybe I can make the question clearer: I want
to evaluate the correlation between two variables: one
is the actual output of a system, the other is the
output predicted for that system by a neural
network. When I make scatterplots in Excel, I
can get the linear equation and the corresponding
R-squared. The bottom of the page
http://www.statsoftinc.com/textbook/stathome.html
mentions that outliers can sometimes bias the
correlation coefficient. So I thought it might
be worth removing outliers before calculating
R-squared in R. According to your comments, that seems
to be a bad idea. Can you now comment on how to
evaluate the performance of the neural network model
in predicting the actual outputs?

Kan 

--- Spencer Graves <spencer.graves at PDF.COM> wrote:
> It is also wise to make scatterplots, as shown by the famous
> example of 4 scatterplots with the same R^2, where the first shows
> the standard ellipsoid pattern implied by the assumptions while the
> other three indicate very clearly that the assumptions are
> incorrect.  See Anscombe (1973) "Graphs in Statistical Analysis",
> The American Statistician, 27: 17-22, reproduced in, e.g., du Toit,
> Steyn and Stumpf (1986) Graphical Exploratory Data Analysis
> (Springer).
> 
> hth.  spencer graves
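
The point about the four scatterplots can be checked directly, since
Anscombe's data ship with base R as the data frame `anscombe`. The
following is only a minimal sketch: the helper vector name `r2` and the
panel titles are my own choices, not part of the original message.

```r
# Anscombe's quartet is a built-in dataset (columns x1..x4 and y1..y4).
data(anscombe)

# Squared Pearson correlation, i.e. the R^2 of a simple linear fit,
# computed separately for each of the four (x, y) pairs.
r2 <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])^2
})
names(r2) <- paste0("set", 1:4)
print(round(r2, 2))   # all four round to 0.67

# Draw the four scatterplots, each with its least-squares line.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i))
  abline(lm(y ~ x))
}
par(op)
```

The four R^2 values agree to two decimals, yet only the first panel
looks like the cloud that a linear model assumes.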
> 
> Prof Brian Ripley wrote:
> > On Tue, 17 Jun 2003, kan Liu wrote:
> > 
> > 
> >> I want to calculate the R-squared between two variables. Can you
> >> advise me how to identify and remove the outliers before
> >> performing the R-squared calculation?
> > 
> > 
> > Easy: you don't.  It makes no sense to consider R^2 after arbitrary
> > outlier removal: if I remove all but two points I get R^2 = 1!
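
The two-point remark is easy to reproduce. A minimal sketch on
simulated data follows; the seed, the sample size and the variable
names (`r2_all`, `r2_two`, `keep`) are arbitrary assumptions made for
illustration.

```r
set.seed(1)        # arbitrary seed, only so the run is repeatable
x <- rnorm(50)
y <- rnorm(50)     # generated independently of x

# R^2 on the full sample: typically small, since x and y are unrelated.
r2_all <- cor(x, y)^2

# "Remove all but two points": any two distinct points lie exactly on
# a line, so the squared correlation is 1 up to rounding error.
keep <- c(1, 2)
r2_two <- cor(x[keep], y[keep])^2

print(c(all = r2_all, two = r2_two))
```

Which two points are kept does not matter, which is why an R^2 reported
after ad hoc deletion says little about the data.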
> > 
> > R^2 is normally used to measure the success of a multiple
> > regression, but as you mention two variables, did you just mean the
> > Pearson product-moment correlation?  It makes more sense to use a
> > robust measure of correlation, as in cov.rob (package lqs) or even
> > Spearman or Kendall measures (cor.test in package ctest).
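
A hedged sketch of those alternatives, on simulated data rather than
the real network output: `actual` and `predicted` are hypothetical
stand-ins, and the single planted outlier is my own assumption. In
current R, `cov.rob` lives in MASS and `cor.test` in stats, because the
lqs and ctest packages were later merged into them.

```r
library(MASS)   # provides cov.rob

set.seed(42)    # arbitrary seed
actual    <- rnorm(100)
predicted <- actual + rnorm(100, sd = 0.3)

# Plant one gross outlier: a wildly wrong prediction at the largest
# actual value.
predicted[which.max(actual)] <- -25

pearson  <- cor(actual, predicted)                       # badly distorted
spearman <- cor(actual, predicted, method = "spearman")  # rank based
kendall  <- cor(actual, predicted, method = "kendall")   # rank based

# cor.test returns the same estimate together with a test of zero
# association.
sp_test <- cor.test(actual, predicted, method = "spearman")

# Robust covariance estimate; cor = TRUE also returns the implied
# correlation matrix.
rob <- cov.rob(cbind(actual, predicted), cor = TRUE)

print(c(pearson = pearson, spearman = spearman, kendall = kendall,
        robust = rob$cor[1, 2]))
```

The rank-based and robust estimates are driven by the bulk of the data,
while the Pearson coefficient is dominated by the single bad point.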
> > 
> > If you intended to do this for a multiple regression, you need to
> > do some sort of robust regression and use a robust measure of fit.
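
One possible reading of that advice, sketched with `lqs` from MASS
(least trimmed squares). The data are simulated, the outlier is
planted, and treating the robust residual scale as the "measure of fit"
is my own assumption; the message does not name a specific method.

```r
library(MASS)   # provides lqs

set.seed(7)     # arbitrary seed
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.2)   # true slope is 0.5
y[20] <- -10    # one gross outlier at a high-leverage x value

fit_ls  <- lm(y ~ x)                    # ordinary least squares
fit_lts <- lqs(y ~ x, method = "lts")   # least trimmed squares

# The least-squares slope is pulled down by the outlier; the LTS slope
# stays close to the slope used to simulate the bulk of the data.
print(rbind(ls = coef(fit_ls), lts = coef(fit_lts)))

# Robust estimates of the residual scale, usable as a measure of fit
# that is not inflated by the outlier.
print(fit_lts$scale)
```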
> > 
> 
> 





