[R] standardization of values before call to pam() or clara()

Dylan Beaudette dylan.beaudette at gmail.com
Tue May 23 02:33:47 CEST 2006


Greetings,

I am experimenting with the cluster package and am starting to scratch my head 
over the *best* way to standardize my data. Both functions can pre-standardize 
the columns of a data frame. According to the manual:

Measurements are standardized for each variable (column), by subtracting the 
variable's mean value and dividing by the variable's mean absolute deviation. 

This works well when the input variables are all in the same units. When I 
include new variables with a different intrinsic range, the ones with the 
largest relative values tend to be _weighted_ more heavily. This is certainly 
not surprising, but it complicates things. 
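
For reference, the standardization quoted from the manual can be sketched by 
hand as below. The toy data frame and the helper name std_mad are made up for 
illustration; the only package calls are pam() and its stand argument. Done by 
hand, it should agree with what stand = TRUE does internally, but treat it as 
a sketch rather than the package's actual code.

```r
library(cluster)

# Hypothetical toy data: two variables on very different scales
set.seed(1)
x <- data.frame(a = rnorm(20, mean = 10, sd = 2),
                b = rnorm(20, mean = 1000, sd = 300))

# Hand-written version of the standardization described in the manual:
# subtract the column mean, then divide by the mean absolute deviation
std_mad <- function(v) {
  centered <- v - mean(v)
  centered / mean(abs(centered))
}
x_std <- as.data.frame(lapply(x, std_mad))

# Let pam() standardize internally, and compare with the manual version
fit_auto   <- pam(x, k = 2, stand = TRUE)
fit_manual <- pam(x_std, k = 2)

# TRUE when both fits assign every observation to the same cluster
print(all(fit_auto$clustering == fit_manual$clustering))
```

After this step every column has mean 0 and mean absolute deviation 1, 
whatever its original units were.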

Is there a robust technique for re-scaling each of the variables, regardless 
of its intrinsic range, to some fixed range, say [0,1]?

I have tried dividing each variable by its maximum value, but I am not sure 
whether this is statistically sound. 
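
To make the two re-scalings concrete, here is a minimal sketch comparing 
range (min-max) scaling with the divide-by-maximum approach I tried. The data 
frame and the helper name range01 are hypothetical, and only base R is used. 
Both rely on the column extremes, so a single outlier can compress the rest of 
a column; neither is robust in the statistical sense.

```r
# Hypothetical toy data frame with columns on different scales
x <- data.frame(a = c(2, 4, 6, 8),
                b = c(100, 250, 400, 1000))

# Range (min-max) scaling: maps each column onto [0, 1].
# Assumes the column is not constant, otherwise this divides by zero.
range01 <- function(v) (v - min(v)) / (max(v) - min(v))
x_range <- as.data.frame(lapply(x, range01))

# Dividing by the maximum: the upper bound is 1, but the lower bound is
# min(v) / max(v), which is 0 only when the column minimum is 0
x_max <- as.data.frame(lapply(x, function(v) v / max(v)))

print(sapply(x_range, range))  # every column spans 0 to 1
print(sapply(x_max, range))    # lower bounds differ between columns
```

The re-scaled data frame would then be passed to pam() or clara() with 
stand = FALSE, so that it is not standardized a second time.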

Any ideas, thoughts would be greatly appreciated.

Cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341


