[R] Help with aggregate and cor

James Marca jmarca at translab.its.uci.edu
Wed Mar 10 03:36:58 CET 2010


Hello,

I do not understand the correct way to approach the following problem
in R.

I have observations of pairs of variables, v1, o1, v2, o2, etc.,
observed every 30 seconds.  What I would like to do is compute the
correlation matrix, but not over all my data, just over chunks of,
say, 5 minutes or 1 hour.

In SQL, what I would say is

    select id, date_trunc('hour'::text, ts) as tshour, corr(n1,o1) as corr1
    from raw30s 
    where id = 1201087  and 
          (ts between 'Mar 1, 2007' and 'Apr 1, 2007')
    group by id,tshour order by id,tshour;


I've pulled the data from PostgreSQL into R, and have a data frame
containing a timestamp column plus v and o (both numeric).

I created a grouping index for every 5 minutes along these lines:

    ## round each timestamp down to the start of its 5-minute bucket
    obsfivemin <- trunc(obsts, units = "hours") +
                  floor(obsts$min / 5) * 5 * 60

(where obsts is the SQL timestamp converted into a POSIXlt date-time
object)
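
For concreteness, here is a self-contained toy version of that step
(the dates and lengths are made up purely for illustration, and I'm
assuming obsts is a POSIXlt vector):

    ## made-up 30-second timestamps spanning 10 minutes
    obsts <- as.POSIXlt(seq(as.POSIXct("2007-03-01 00:00:00"),
                            by = 30, length.out = 20))
    ## start time of the 5-minute bucket each observation falls in
    obsfivemin <- trunc(obsts, units = "hours") +
                  floor(obsts$min / 5) * 5 * 60
    table(format(obsfivemin, "%H:%M"))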

Then I tried aggregate(df,by=obsfivemin,cor), but that seemed to pass
just a single column at a time to cor, not the entire data frame.  It
worked for mean and sum, but not cor.
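
Continuing the toy example above (with random filler standing in for
the real v and o), the call was something like this; I may not have
the by= argument exactly right, but with FUN = mean it gives one value
per column per bucket, while FUN = cor errors because cor is handed a
single column:

    ## random stand-ins for the real v and o columns
    df <- data.frame(v = rnorm(20), o = rnorm(20))
    aggregate(df, by = list(fivemin = format(obsfivemin, "%H:%M")),
              FUN = mean)
    ## replacing FUN = mean with FUN = cor fails, since cor() wants
    ## the whole sub-data-frame, not one column at a time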

In desperation, I tried looping over the different 5-minute levels and
computing cor, but I'm so R-clueless I couldn't even figure out how to
assign to a variable inside that loop!

Code such as

    for (f in fivemin) {
        output[f] <- cor(df[grouper == f, ])
    }

failed, as I couldn't figure out how to initialize output so that
output[f] would accept the matrix that cor returns.
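
My best guess is that output has to be pre-allocated as a list so that
each element can hold a whole correlation matrix; something like this
untested sketch, with grouper standing in for the obsfivemin index
above:

    ## pre-allocate a named list, one slot per 5-minute bucket
    grouper <- format(obsfivemin, "%H:%M")
    fivemin <- unique(grouper)
    output  <- vector("list", length(fivemin))
    names(output) <- fivemin
    for (f in fivemin) {
        output[[f]] <- cor(df[grouper == f, ])
    }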

Any help or steering towards the proper R-way would be appreciated.

Regards,

James Marca
