[R] use rowSums or colSums instead of apply!

Tim Hesterberg timh at insightful.com
Wed Feb 20 00:50:43 CET 2008


There were two queries recently regarding removing
rows or columns that have all NAs.

Three respondents suggested combinations of apply() with
any() or all().

I cringe when I see apply() used unnecessarily.
Using rowSums() or colSums() is much faster (roughly 40 times faster
in the small example below) and gives more readable code.
(Two respondents did suggest colSums() for the second query.)
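
For the original queries (dropping rows or columns that are entirely
NA), a minimal sketch of the idea might look like the following. The
data frame d and its column names are made up for illustration; they
are not from the earlier threads.

```r
# Hypothetical example data: rows 2 and 4 are all NA, column "c" is all NA.
d <- data.frame(a = c(1, NA, 3, NA),
                b = c(2, NA, NA, NA),
                c = NA_real_)

# is.na(d) is a logical matrix; rowSums() counts the NAs in each row.
# A row is kept if it has fewer NAs than there are columns.
keep_rows <- rowSums(is.na(d)) < ncol(d)

# Same idea for columns: keep a column if it has fewer NAs than rows.
keep_cols <- colSums(is.na(d)) < nrow(d)

# drop = FALSE keeps the result a data frame even if one column survives.
d_clean <- d[keep_rows, keep_cols, drop = FALSE]
print(d_clean)
```

Comparing against ncol(d) and nrow(d), rather than a hard-coded count,
means the same two lines keep working if the data frame changes shape.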

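# original small data frame (3 columns; rows 4 and 5 are all NA)
df <- data.frame(col1=c(1:3,NA,NA,4),col2=c(7:9,NA,NA,NA),col3=c(2:4,NA,NA,4))
# TRUE for rows with at least one non-NA value (fewer than 3 NAs)
system.time( for(i in 1:10^4) temp <- rowSums(is.na(df)) < 3)
# .078
# same result using apply()
system.time( for(i in 1:10^4) temp <- apply(df,1,function(x)any(!is.na(x))))
# 3.33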

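# larger data frame: 1000 rows, 100 columns, about 99% NA
x <- matrix(runif(10^5), 10^3)
x[ runif(10^5) < .99 ] <-  NA
df2 <- data.frame(x)
system.time( for(i in 1:100) temp <- rowSums(is.na(df2)) < 100)
# .34
system.time( for(i in 1:100) temp <- apply(df2,1,function(x)any(!is.na(x))))
# (no timing recorded: the 3.34 posted here came from a call that reused
#  df with 10^4 iterations, i.e. a repeat of the small-data test above)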
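
Before swapping one form for the other, it may be worth checking that
the two give the same answer. The sketch below does that on a smaller
random data frame; the names m, d2, by_rowsums and by_apply, the seed,
and the sizes are arbitrary choices for illustration.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
m <- matrix(runif(2000), nrow = 100) # 100 rows, 20 columns
m[runif(2000) < 0.9] <- NA           # make about 90% of the entries NA
m[1:5, ] <- NA                       # force the first five rows to be all NA
d2 <- data.frame(m)

# Vectorized version: a row is kept if it has fewer NAs than columns.
by_rowsums <- rowSums(is.na(d2)) < ncol(d2)

# apply() version: a row is kept if any entry is non-NA.
by_apply <- apply(d2, 1, function(r) any(!is.na(r)))

# Compare values only, ignoring any names attached to the vectors.
stopifnot(all(unname(by_rowsums) == unname(by_apply)))
```

If the stopifnot() call returns silently, the two approaches agree on
which rows to keep for this data.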

Tim Hesterberg


