[R] Re: Re: Find Closest 5 Cases?

dsheuman@rogers.com dsheuman at rogers.com
Fri Feb 13 21:35:23 CET 2004

Art (and group),

I'm doing this as a form of missing value analysis.  Approximately 30% of the cases are missing data for one variable.  To impute values for those cases, I'd like to match each case that is missing the variable against all other cases, find the closest ones, and then average their values to fill in the gap.

I realize there are many methods for imputing data.  I'm not well versed in any in particular (except regression and cluster analysis).  That said, given that I have an extensive data set already with most variables populated, I can find the closest observations in N-dimensional space and impute the value that way - by focusing on the best matches.
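
To make the idea concrete, here is a rough sketch of that nearest-neighbour imputation. It is written in Python purely for illustration and is not tied to any particular package. The function name knn_impute, the use of None to mark a missing value, the z-score scaling, and the plain Euclidean distance are all assumptions made for this example. It also assumes the other variables are complete for every case.

```python
import math

def knn_impute(rows, target, k=5):
    """Fill missing (None) values in column `target` with the mean of the
    k nearest complete rows, measured on the remaining columns.
    Hypothetical helper: assumes every non-target column is fully populated."""
    predictors = [j for j in range(len(rows[0])) if j != target]

    # Standardise each predictor to z-scores so that no single variable
    # dominates the distance just because of its scale.
    means, sds = {}, {}
    for j in predictors:
        col = [r[j] for r in rows]
        m = sum(col) / len(col)
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        means[j], sds[j] = m, (sd or 1.0)  # constant column: avoid dividing by 0

    def z(row):
        return [(row[j] - means[j]) / sds[j] for j in predictors]

    def sq_dist(a, b):
        # Squared Euclidean distance; the ordering matches the true distance.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Donors are the cases where the target variable is observed.
    donors = [(z(r), r[target]) for r in rows if r[target] is not None]

    filled = []
    for r in rows:
        new_row = list(r)
        if r[target] is None:
            zr = z(r)
            # Brute-force search: rank every donor, keep the k closest.
            nearest = sorted(donors, key=lambda d: sq_dist(zr, d[0]))[:k]
            new_row[target] = sum(v for _, v in nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Toy data: two predictors plus a target column with one missing value.
cases = [
    [1.0, 1.0, 10.0],
    [1.1, 0.9, 12.0],
    [5.0, 5.0, 50.0],
    [5.2, 4.8, 54.0],
    [1.05, 1.0, None],
]
print(knn_impute(cases, target=2, k=2)[4])
```

The brute-force search compares every incomplete case with every donor, so it is only meant to show the logic. At 50000 cases by 50 variables a vectorised or compiled implementation would be the practical choice.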

If there are any other thoughts on how to do this (relatively easily), I'm open to suggestions and being educated.



> From: Art Kendall <Art at DrKendall.org>
> Date: 2004/02/13 Fri PM 02:47:00 EST
> To: Danny Heuman <a0079454 at airnews.net>
> Subject: Re: Find Closest 5 Cases?
> This would be extremely compute intensive.
> Why are you trying to do this?
> Do the 5 percentages sum to a constant total?
> If you tell us more about the problem and its context perhaps we can make some suggestions.
> E.g., if you could live with groups of any size that are close
> you might try transforming the percentages to z's and applying a TWOSTEP
> procedure.
> If you really, really need 5, cluster membership variables
> and distances from cluster centers could be used to limit searches, but
> I wouldn't want to try to work it out without more info, especially since
> I do not presently have SPSS on my system to verify my
> recommendations.
> Hope this helps.
> Art
> Art at DrKendall.org
> Social Research Consultants
> University Park, MD USA
> (301) 864-5570
> Danny Heuman wrote:
> > I have a need to identify for each CASE the closest (or most similar) 5 
> > other CASES (not including itself as it is automatically the closest).  I 
> > have a fairly large matrix (50000 cases by 50 vars). 
