[R] Maximizing values in subsetted dataframe

Tim Clark mudiver1200 at yahoo.com
Wed Jul 29 02:56:44 CEST 2009


Dear List,

I am trying to sub-sample some data by taking one data point every x minutes. The data contain missing values, and I would like to take the sub-sample that maximizes the number of valid points, i.e. minimizes the number of NAs in the sample.

For example, given the following:

da<-seq(Sys.time(),by=1,length.out=10)
x<-c(1,2,NA,4,NA,6,NA,8,9,10)
mydata<-data.frame(da,x)

If I wanted to take a subsample every 2 seconds, I would have the following two possible answers, depending on whether I start at the second or the first row:

answer1: 2,4,6,8,10
answer2: 1,NA,NA,NA,9

I would like a function that would choose between these and obtain the one with the fewest missing values.
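
One way such a function might look, as a minimal sketch rather than a tested solution: try every possible starting row from 1 to k, count the NAs in each every-k-th subsample, and keep the start with the fewest. The name best_subsample and its arguments are made up for illustration, and ties go to the earliest start.

```r
# Hypothetical helper (not from any package): return the every-k-th-row
# subsample of `df` whose column `col` contains the fewest NAs.
best_subsample <- function(df, k, col = "x") {
  stopifnot(nrow(df) > 0, k >= 1)
  # Candidate starting rows: 1..k (capped at the number of rows).
  offsets <- seq_len(min(k, nrow(df)))
  # Count missing values in the subsample produced by each start.
  na_counts <- vapply(offsets, function(o) {
    idx <- seq(o, nrow(df), by = k)
    sum(is.na(df[[col]][idx]))
  }, integer(1))
  # which.min() returns the first minimum, so ties go to the earliest start.
  best <- offsets[which.min(na_counts)]
  df[seq(best, nrow(df), by = k), , drop = FALSE]
}

da <- seq(Sys.time(), by = 1, length.out = 10)
x <- c(1, 2, NA, 4, NA, 6, NA, 8, 9, 10)
mydata <- data.frame(da, x)

res <- best_subsample(mydata, 2)
res$x
```

With the example data this picks the subsample starting at the second row, which has no missing values.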

In my real dataset I have multiple variables collected every second and I would like to subsample it every 5, 10, and 15 minutes.
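
For the multi-variable case, the same idea could be extended by counting NAs across all variables of interest and repeating the search for each interval. The sketch below assumes one row per second with no gaps in the time column, so that 5, 10, and 15 minutes correspond to steps of 300, 600, and 900 rows. The helper best_offset, the column names depth and temp, and the simulated data are all invented for illustration.

```r
# Hypothetical helper: starting row (1..k) whose every-k-th-row subsample
# has the fewest NAs, summed over the columns named in `cols`.
best_offset <- function(df, k, cols) {
  stopifnot(nrow(df) > 0, k >= 1)
  offsets <- seq_len(min(k, nrow(df)))
  na_counts <- vapply(offsets, function(o) {
    idx <- seq(o, nrow(df), by = k)
    sum(is.na(df[idx, cols, drop = FALSE]))
  }, integer(1))
  offsets[which.min(na_counts)]
}

# Made-up data: one hour at one row per second, two variables, ~30% missing.
n <- 3600
set.seed(1)
dat <- data.frame(
  da    = seq(as.POSIXct("2009-07-29 00:00:00", tz = "UTC"), by = 1, length.out = n),
  depth = ifelse(runif(n) < 0.3, NA, rnorm(n)),
  temp  = ifelse(runif(n) < 0.3, NA, rnorm(n))
)

# Steps in rows (= seconds here) for 5, 10, and 15 minutes.
steps <- c(5, 10, 15) * 60
subsamples <- lapply(steps, function(k) {
  start <- best_offset(dat, k, c("depth", "temp"))
  dat[seq(start, nrow(dat), by = k), , drop = FALSE]
})
names(subsamples) <- paste0(steps / 60, "min")
```

If the timestamps are irregular, row positions no longer correspond to fixed time steps, and the subsampling would need to be done on the time column instead.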

I appreciate your help.

Tim

Tim Clark
Department of Zoology 
University of Hawaii



