[R] Normalization and missing values

Jonathan Baron baron at psych.upenn.edu
Wed Apr 13 19:37:57 CEST 2005


On 04/13/05 11:36, Chris Bergstresser wrote:
> Hi all --
>
>     I've got a large dataset which consists of a bunch of different
> scales, and I'm preparing to perform a cluster analysis.  I need to
> normalize the data so I can calculate the difference matrix.
>     First, I didn't see a function in R which does normalization -- did
> I miss it?  What's the best way to do it?

Look at scale().  By default it centers each column on its mean and
divides by its standard deviation, which might be what you mean.
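
A minimal sketch of what scale() does, using made-up data (the data
frame `d` and its column names are hypothetical, not from the original
question).  It assumes "normalize" means converting each column to
z-scores, and it leaves the missing value as NA rather than filling it.

```r
# Toy data: three variables on very different scales, one missing value.
d <- data.frame(
  height_cm = c(150, 160, 170, 180, 190),
  income    = c(20000, 35000, NA, 50000, 65000),
  score     = c(1, 2, 3, 4, 5)
)

# scale() centers each column on its mean and divides by its standard
# deviation.  NAs are skipped when computing both and stay NA in the result.
z <- scale(d)

col_means <- colMeans(z, na.rm = TRUE)       # approximately 0 per column
col_sds   <- apply(z, 2, sd, na.rm = TRUE)   # 1 per column

# dist() accepts NAs: a pair of rows is compared on the columns where both
# are observed, and the sum is scaled up for the columns that were dropped.
dm <- dist(z)
```

Whether that rescaling inside dist() is acceptable for a given cluster
analysis is a judgment call; it is only one of several options.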

>     Second, what's the best way to deal with missing values?  Obviously,
> I could just set them to 0 (the mean of the normalized scales), but I'm
> not sure that's the best way.

There are lots of ways to deal with missing data.  The ones I've found
most helpful are in the Hmisc package, particularly transcan() and
aregImpute().  See
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html#SECTION000715000000000000000
for an example of the latter.  But, in general, the "right" way
to deal with missing data depends on the assumptions you make.
As a novice, I found the following article to be helpful:

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of 
the state of the art. Psychological Methods, 7, 147-177.
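
To illustrate why the choice matters, here is a small sketch with
simulated data (the variable names and the 30% missingness rate are
assumptions chosen for illustration).  It shows one known side effect of
setting missing standardized values to 0: the filled variable has less
spread than the observed values do.  It is not a substitute for the
imputation functions mentioned above.

```r
set.seed(1)
x <- rnorm(200)
x[sample(200, 60)] <- NA            # remove 30% of the values at random

z      <- as.vector(scale(x))       # standardized; NAs preserved
z_fill <- ifelse(is.na(z), 0, z)    # the "set them to 0" strategy

sd_observed <- sd(z, na.rm = TRUE)  # 1 by construction
sd_filled   <- sd(z_fill)           # smaller than 1: the zeros add no spread
```

Because every filled value sits exactly at the mean, distances computed
from the filled data will tend to understate how far apart cases with
missing values really are.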

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/



