[R] Winsorizing Multiple Variables

Karl Healey karl at psych.utoronto.ca
Fri Jan 16 21:50:57 CET 2009


Hi All,

I want to take a matrix (or data frame) and winsorize each variable so
that I can, for example, correlate the winsorized variables.

The code below will winsorize a single vector, but when applied to  
several vectors, each ends up sorted independently in ascending order  
so that a given observation is no longer on the same row for each  
vector.

So I need to winsorize each variable and then return it to its original
order. Alternatively, I'd welcome any other solution that takes a data
frame, winsorizes each variable, and returns a new data frame with all
the variables in their original order.

Thanks for any help!

-Karl


#The function I'm working from

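win<-function(x,tr=.2,na.rm=FALSE){
    # optionally drop missing values before winsorizing
    if(na.rm)x<-x[!is.na(x)]
    # sorted copy of the data; note that this sorted copy is what gets returned
    y<-sort(x)
    n<-length(x)
    # positions (in the sorted data) of the lower and upper cutoffs
    ibot<-floor(tr*n)+1
    itop<-length(x)-ibot+1
    xbot<-y[ibot]
    xtop<-y[itop]
    # pull values in each tail in to the nearest cutoff
    y<-ifelse(y<=xbot,xbot,y)
    y<-ifelse(y>=xtop,xtop,y)
    win<-y
    win
}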

#Produces an example data frame: ss is the observation id, and var1 to var4
#are the variables I want to winsorize.

ss=c(1:5)
var1=rnorm(5)
var2=rnorm(5)
var3=rnorm(5)
var4=rnorm(5)
data<-as.data.frame(cbind(ss,var1,var2,var3,var4))
data

#Winsorizes each variable, but sorts each one independently, so the
#observations no longer line up.

sapply(data,win)
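The second option, a winsorizer that never reorders its input, can be sketched by finding the cutoffs exactly as win() does and then clamping the unsorted x with pmin()/pmax(). This is an illustrative sketch, not the poster's code. `win_keep_order`, `wdata`, and `vars` are hypothetical names, and the fixed seed and the use of data.frame() in place of cbind() are assumptions made only to get a reproducible example. Unlike the sapply() call above, it also leaves the id column ss untouched. Like win(), it assumes no missing values: with na.rm = TRUE the NAs are dropped, which would misalign rows again.

```r
# Hypothetical order-preserving variant of win(): the sorted copy is used
# only to locate the cutoffs, which are then applied to x in its own order.
win_keep_order <- function(x, tr = .2, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  y <- sort(x)
  n <- length(x)
  ibot <- floor(tr * n) + 1
  itop <- n - ibot + 1
  xbot <- y[ibot]
  xtop <- y[itop]
  # raise values below xbot to xbot and lower values above xtop to xtop
  pmin(pmax(x, xbot), xtop)
}

set.seed(1)  # assumption: fixed seed for a reproducible example
data <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5),
                   var3 = rnorm(5), var4 = rnorm(5))

# winsorize every column except the id column ss, keeping rows aligned
vars <- setdiff(names(data), "ss")
wdata <- data
wdata[vars] <- lapply(data[vars], win_keep_order)

wdata
cor(wdata[vars])  # correlations of the winsorized variables
```

Because pmin() and pmax() work elementwise on x, no step to restore the original order is needed.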


___________________________
M. Karl Healey
Ph.D. Student

Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON
M5S 3G3

karl at psych.utoronto.ca
