[R] to match samples by minute

Zhang Weiwu zhangweiwu at realss.com
Thu Aug 15 18:31:04 CEST 2013


Perhaps this is simple and common, but it took me quite a while to admit I 
cannot solve it in a simple way.

The data frame `df` has the following columns:

    unixtime, value, factor

Now I need a matrix of:

    unixtime, value-difference-between-factor1-and-factor2

The naive solution is:

    df[df$factor == "factor1",] - df[df$factor == "factor2",]

This does not work, because factor1 has 1000 valid samples and factor2 has 
1400, so the rows do not line up. Invalid samples are dropped on-site, i.e. 
removed before the data is piped into R.

To solve it, I have two ideas.

1. Create a new data.frame with 24*60 records, one for each minute of the 
day, because sampling is done once per minute. Then fit every record into 
its 'slot' by nearest minute.

2. Pair each record with another that has a similar unixtime but a 
different factor.
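
For illustration, the first idea can probably be written without an explicit loop by rounding `unixtime` to the nearest minute and joining the two groups on that key. This is only a sketch: the sample values and the names `minute`, `f1`, `f2`, `paired` and `result` are made up here, and it assumes each factor has at most one sample per minute (otherwise `merge` would produce every combination within that minute).

```r
# Hypothetical sample data: column names follow the post, values are invented.
df <- data.frame(
  unixtime = c(1376582400, 1376582461, 1376582519, 1376582402, 1376582522),
  value    = c(10, 11, 12, 7, 9),
  factor   = c("factor1", "factor1", "factor1", "factor2", "factor2")
)

# Snap every sample to its nearest whole minute (a 60-second slot).
df$minute <- round(df$unixtime / 60) * 60

f1 <- df[df$factor == "factor1", c("minute", "value")]
f2 <- df[df$factor == "factor2", c("minute", "value")]

# Inner join on the minute slot; minutes missing from either group drop out.
paired <- merge(f1, f2, by = "minute", suffixes = c(".f1", ".f2"))

# One row per minute that has a sample from both factors.
result <- data.frame(
  unixtime = paired$minute,
  diff     = paired$value.f1 - paired$value.f2
)
print(result)
```

Because the join is an inner join, a minute that lacks a valid sample from either factor simply does not appear in `result`, so no placeholder rows for dropped samples are needed.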

Both ideas require a for loop over individual records. It feels too C-like 
to write a program that way. Is there a professional way to do it in R? If 
not, I'd even prefer to rewrite the sampler (in C) so that it does not 
discard invalid samples on-site, rather than mangle R.
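
As a rough sketch of the second idea (pairing by similar unixtime), `findInterval` from base R can locate the neighbouring factor2 sample for every factor1 sample in one vectorised call. The sample data, the helper names (`lo`, `hi`, `nearest`, `gap`, `keep`) and the 30-second tolerance are all assumptions made for illustration, not part of the original data.

```r
# Hypothetical sample data: column names follow the post, values are invented.
df <- data.frame(
  unixtime = c(1376582400, 1376582461, 1376582519, 1376582402, 1376582522),
  value    = c(10, 11, 12, 7, 9),
  factor   = c("factor1", "factor1", "factor1", "factor2", "factor2")
)

f1 <- df[df$factor == "factor1", ]
f2 <- df[df$factor == "factor2", ]
f2 <- f2[order(f2$unixtime), ]   # findInterval needs sorted breakpoints

# Index of the last factor2 sample at or before each factor1 sample (0 if none).
lo <- findInterval(f1$unixtime, f2$unixtime)
hi <- pmin(lo + 1, nrow(f2))     # the following factor2 sample, clamped to range
lo <- pmax(lo, 1)                # clamp the preceding index to range as well

# Pick whichever neighbour is closer in time.
nearest <- ifelse(abs(f2$unixtime[hi] - f1$unixtime) <
                  abs(f1$unixtime - f2$unixtime[lo]), hi, lo)
gap <- abs(f2$unixtime[nearest] - f1$unixtime)

# Assumed tolerance: only accept pairs at most 30 seconds apart.
keep <- gap <= 30

result <- data.frame(
  unixtime = f1$unixtime[keep],
  diff     = f1$value[keep] - f2$value[nearest[keep]]
)
print(result)
```

Unlike rounding to a fixed minute grid, this pairs samples that straddle a minute boundary, at the cost of having to choose a tolerance.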

Thanks.


