[R] handling of missing values in aov/lm

R.J.V. Bertin rjvbertin at vip.webmails.com
Fri Jun 28 17:02:27 CEST 2002

R provides a few ways of handling missing values, among others in the context
of an anova (aov): two types of exclusion (na.omit and na.exclude), and
failure (na.fail).
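
For illustration, a minimal sketch of how those three settings behave, shown with lm (aov accepts the same na.action argument). The data frame d and its columns y and g are made up for the example and are not part of the original question.

```r
## Toy data (hypothetical): a response with one missing value, one factor.
d <- data.frame(y = c(1.0, 2.0, NA, 4.0, 5.0, 6.0),
                g = factor(c("a", "a", "a", "b", "b", "b")))

fit.omit    <- lm(y ~ g, data = d, na.action = na.omit)
fit.exclude <- lm(y ~ g, data = d, na.action = na.exclude)

## na.omit drops the incomplete row, so only 5 residuals come back ...
n.omit <- length(residuals(fit.omit))
## ... whereas na.exclude pads the result back to 6 rows, with NA in row 3.
n.exclude <- length(residuals(fit.exclude))

## na.fail refuses to fit at all as soon as any NA is present.
attempt <- try(lm(y ~ g, data = d, na.action = na.fail), silent = TRUE)
failed  <- inherits(attempt, "try-error")
```

Both exclusion variants fit the same model; they differ only in whether residuals and fitted values are padded back to the length of the original data.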

In some situations, I personally like to have missing values replaced by 
the mean (or the median) for the given combination of factors.
A routine that does that is something like the code included below. It 
works, but is (of course) rather slow. It would be much quicker if sapply() 
could be used -- and I imagine that somewhere in the "innards" of aov or lm 
the data will have been broken up by factors such that sapply could be 
applied. Is there a good (statistical or other) reason why there is no such 
option? And alternatively, is there a more efficient solution than my code?
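
One possible vectorised approach, sketched here under assumptions, is to let ave() split a column by the factor combination and fill each cell's NAs with that cell's mean. The helper name fill.cell.mean and the toy values are hypothetical; the column names Type and Modality follow the ones used in the routine below.

```r
## Toy data (made-up values) with two grouping factors and one response.
d <- data.frame(Type     = factor(c("A", "A", "A", "B", "B", "B")),
                Modality = factor(c("v", "v", "v", "v", "v", "v")),
                y        = c(1, NA, 3, 10, 20, NA))

## Hypothetical helper: replace the NAs in x by the mean of the non-missing
## values that share the same combination of the grouping factors in `...`.
fill.cell.mean <- function(x, ...) {
     ave(x, ..., FUN = function(v) {
          v[is.na(v)] <- mean(v, na.rm = TRUE)
          v
     })
}

d$y.filled <- fill.cell.mean(d$y, d$Type, d$Modality)
```

Swapping mean for median inside the helper gives the median variant. A cell with no observed values at all ends up as NaN, so such cells still need separate handling.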

Thanks again,

RJV Bertin

NB: return address not valid; use  r j v b e r t i n  at  h o t m a i l  
dot  c o m

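df.Missing.Mean.VV1 <- function(df, verbose=FALSE)
{ ## replace missing values in the dataframe df by the mean of the
  ## corresponding column for each combination of the factors that interest us
  ## have to find a more elegant fashion to find the factor columns!
  ## NOTE: the archived message lost the lines defining the factor codes and
  ## level counts; the four statements below are reconstructed from the names
  ## used further down and are an assumption.
     nSS <- as.integer(factor(df$Snr)) ; nT <- as.integer(factor(df$Type))
     nS <- as.integer(factor(df$size)) ; nM <- as.integer(factor(df$Modality))
     Subjects <- max(nSS) ; Types <- max(nT)
     sizes <- max(nS) ; Modalities <- max(nM)
  ## construct an array to receive the means for each combination of
  ## the relevant factors:
     replval <- array(NA, dim=c(Subjects, Types, sizes, Modalities))
     for( i in 1:ncol(df) ){
          ## assumed test: the archived condition read !is.na(m), which
          ## uses m before it is defined; only numeric columns with NAs need work
          if( is.numeric(df[,i]) && any(is.na(df[,i])) ){
               for( T in 1:Types ){
                    for( S in 1:sizes ){
                         for( M in 1:Modalities ){
                              ## na.rm=TRUE, not T: T is the loop index here
                              m <- mean( df[,i][ nT==T & nS==S & nM==M ],
                                         na.rm=TRUE )
                                ## subject-dependency should be redundant!
                              for( SS in 1:Subjects ){
                                   replval[SS,T,S,M] <- m
                              }
                         }
                    }
               }
               for( j in 1:length(df[,i]) ){
                    if( is.na(df[,i][j]) ){
                         SS<-nSS[j] ; T<-nT[j] ; S<-nS[j] ; M<-nM[j]
                         if( verbose ){
                              print( paste( "df[,", i, ",", j, "] == NA <- ",
                                        "mean(Snr=", df$Snr[j], ",T=",
                                        df$Type[j], ",S=", df$size[j],
                                        ",M=", df$Modality[j], ")==",
                                        replval[SS,T,S,M], sep="" ))
                         }
                         ## the assignment and return are also reconstructed
                         df[j,i] <- replval[SS,T,S,M]
                    }
               }
          }
     }
     df
}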
