[R] question about linear regression and leverage

George Markomanolis george at markomanolis.com
Tue Jun 21 09:49:14 CEST 2011


Dear all,

I am new to this field and I have a question about linear regression.
I have a dataset of about 31,000 points and I want to fit a linear
regression to it. The R-squared is 0.9; however, when I check the
diagnostic plots I can see that about 250 points have high leverage.
As far as I know, high-leverage points can strongly influence the fit.
If I remove these points in order to check their influence, the
R-squared for the remaining points is 0.71. So I removed less than 1%
of my data and the R-squared dropped noticeably. Could you please give
me any advice about this? Is it right to leave these 250 points in my
dataset or not? Is there something else I could do? The data were
measured in an experiment, so even these 250 points are real values.
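
To make the procedure concrete, here is a minimal sketch of the check
described above. It uses simulated data, not my measurements: the
variable names, the simulated values, the single predictor, and the
2p/n leverage cutoff (a common rule of thumb) are all assumptions made
for illustration.

```r
# Simulated stand-in data (hypothetical; not the real measurements).
set.seed(1)
n <- 1000
x <- c(rnorm(n - 10), rnorm(10, mean = 15))  # last 10 points lie far out in x
y <- 2 + 0.5 * x + rnorm(n)
d <- data.frame(x = x, y = y)

# Fit on all points.
fit_all <- lm(y ~ x, data = d)

# Leverage is the diagonal of the hat matrix; it depends on x only.
h <- hatvalues(fit_all)
p <- length(coef(fit_all))
high_lev <- h > 2 * p / n   # rule-of-thumb cutoff, assumed here

# Cook's distance combines leverage with residual size.
cd <- cooks.distance(fit_all)

# Refit without the flagged points.
fit_sub <- lm(y ~ x, data = d[!high_lev, ])

# Compare R-squared and coefficients of the two fits.
r2_all <- summary(fit_all)$r.squared
r2_sub <- summary(fit_sub)$r.squared
print(c(all = r2_all, subset = r2_sub))
print(rbind(all = coef(fit_all), subset = coef(fit_sub)))
```

Comparing the coefficients of the two fits, and not only the
R-squared, shows how much the flagged points change the fitted line
itself.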

Thanks a lot,
George


