I'm working on some fairly standard regression models (linear, logistic, and Poisson). Unfortunately, the data is rather messy.
A visual inspection, using either a histogram or a density plot, indicates some significant outliers. Summary statistics of the data indicate the same thing.
If I fit a linear regression in R using the "lm" command, I can then plot the model to look at residuals, etc.
I'm interested in re-fitting the model with N% of the highest-leverage points removed. (It's a large data set, and I want to fit "most" of the data.)
Is there a computational way to get the leverage for each data point? That way I can subset the data, skipping the N% of points with the highest leverage.
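
To make the question concrete, here is a rough sketch of the workflow I have in mind. The data frame `dat`, the formula, and N = 5 are made up for illustration. I'm assuming `hatvalues()` from the stats package (the diagonal of the hat matrix) is the right measure of leverage here, so treat this as a starting point rather than a settled approach.

```r
# Hypothetical example data: mostly well-behaved x values plus a few extreme ones.
set.seed(1)
dat <- data.frame(x = c(rnorm(95), rnorm(5, mean = 10)))
dat$y <- 2 + 3 * dat$x + rnorm(100)

# Initial fit on all of the data.
fit <- lm(y ~ x, data = dat)

# Leverage: the diagonal of the hat matrix, one value per observation.
lev <- hatvalues(fit)

# Drop the N% of observations with the highest leverage (N = 5 is arbitrary).
N <- 5
cutoff <- quantile(lev, probs = 1 - N / 100)
keep <- lev <= cutoff

# Re-fit on the remaining observations.
fit_trimmed <- lm(y ~ x, data = dat[keep, ])
```

As far as I can tell, `hatvalues()` also accepts `glm` fits, so the same pattern should carry over to the logistic and Poisson models, though I haven't checked how well it behaves there.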
