[R] Re: Re: Find Closest 5 Cases?

Sean Davis sdavis2 at mail.nih.gov
Fri Feb 13 22:45:52 CET 2004


Danny,

In the Bioconductor suite (www.bioconductor.org), the pamr package has a
function called pamr.knnimpute that will probably do something at least close
to what you would like to do.
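
To make the idea concrete, here is a minimal sketch of nearest-neighbour
imputation in plain Python. It is not the algorithm pamr.knnimpute uses
internally; the function name knn_impute, the toy data, and the choice of
unweighted Euclidean distance are all assumptions made for illustration.

```python
import math

def knn_impute(rows, target, k=5):
    """Fill None entries in column `target` with the mean of that column
    over the k complete rows that are nearest in the remaining columns.
    Hypothetical helper for illustration only."""
    donors = [r for r in rows if r[target] is not None]
    if not donors:
        raise ValueError("no complete cases to borrow from")

    def distance(a, b):
        # Euclidean distance over every column except the one being imputed.
        return math.sqrt(sum((x - y) ** 2
                             for i, (x, y) in enumerate(zip(a, b))
                             if i != target))

    filled = []
    for r in rows:
        r = list(r)  # work on a copy so the input is left untouched
        if r[target] is None:
            nearest = sorted(donors, key=lambda d: distance(r, d))[:k]
            r[target] = sum(d[target] for d in nearest) / len(nearest)
        filled.append(r)
    return filled

# Toy data (made up): two fully observed variables plus one with gaps.
cases = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 40.0],
    [5.2, 6.1, None],
    [0.9, 1.9, None],
]
result = knn_impute(cases, target=2, k=2)
```

This assumes the other variables are fully observed and on comparable
scales; standardizing them first is usually sensible. The brute-force search
compares every incomplete case with every complete one, so with 50000 cases
it would be slow without some way of limiting the search.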

Sean
-- 
Sean Davis, M.D., Ph.D.

Clinical Fellow
National Institutes of Health
National Cancer Institute
National Human Genome Research Institute

Clinical Fellow, Johns Hopkins
Department of Pediatric Oncology
-- 



On 2/13/04 3:35 PM, "dsheuman at rogers.com" <dsheuman at rogers.com> wrote:

> Art (and group),
> 
> I'm doing this as a form of missing value analysis.  Approximately 30% of the
> cases are missing data for one variable.  To impute values for those cases,
> I'd like to match those cases that are missing the variable to all other cases
> and then take an average of those to infill.
> 
> I realize there are many methods for imputing data.  I'm not well versed in
> any in particular (except regression and cluster analysis).  That said, given
> that I have an extensive data set already with most variables populated, I can
> find the closest observations in N-dimensional space and impute the value that
> way - by focusing on the best matches.
> 
> If there are any other thoughts on how to do this (relatively easily), I'm
> open to suggestions and being educated.
> 
> Thanks,
> 
> Danny
> 
>> From: Art Kendall <Art at DrKendall.org>
>> Date: 2004/02/13 Fri PM 02:47:00 EST
>> To: Danny Heuman <a0079454 at airnews.net>
>> Subject: Re: Find Closest 5 Cases?
>> 
>> This would be extremely compute intensive.
>> Why are you trying to do this?
>> Do the 5 percentages sum to a constant total?
>> 
>> If you tell us more about the problem and its context perhaps we can make
>> some suggestions.
>> 
>> E.g., if you could live with groups of any size that are close
>> you might try transforming the percentages to z's and applying a TWOSTEP
>> procedure.
>> 
>> If you really, really need 5, cluster membership variables and distances
>> from cluster centers could be used to limit searches, but I wouldn't want
>> to try to work it out without more info, especially since I do not
>> presently have SPSS on my system and so cannot verify my
>> recommendations.
>> 
>> Hope this helps.
>> 
>> Art
>> Art at DrKendall.org
>> Social Research Consultants
>> University Park, MD USA
>> (301) 864-5570
>> 
>> 
>> Danny Heuman wrote:
>> 
>>> I have a need to identify for each CASE the closest (or most similar) 5
>>> other CASES (not including itself as it is automatically the closest).  I
>>> have a fairly large matrix (50000 cases by 50 vars).