[R] Reducing arrays for comparison with each other...

Mark Knecht markknecht at gmail.com
Mon Jul 13 22:11:23 CEST 2009


Hi,
   First up, thanks again for all the help I'm getting on this list.
I'm making great headway analyzing my experimental data on an
experiment-by-experiment basis; there's no way I could have done this
so quickly without your help.

   This email is partially a question about R but is also soliciting
overall guidance if anyone wants to give it. I somehow managed to get
an electrical engineering degree decades ago from the University of
California system without ever being required to take a statistics
class, so I'm lacking background in that area. Maybe there's a good
introductory text someone could point me toward. (Keep it simple, though!)

   OK, I've now got dozens of experiments sitting in individual
data.frames. In most cases the data.frames look more or less like the
example below - Experiment Start time down the left side, Experiment
Completion time across the top and results in the cells. The following
data started at 830 and is 3-minute data, so the first measurement
came at 833, then 836, etc.

    EnTime  836  842   845  848   851  854  900   903
1      833 -386    0  -772 -938  -772    0 -386
2      836    0 -386     0    0     0 -246  714  -632
3      839    0    0     0    0  -386    0    0  -772
4      842    0    0  -386    0     0    0 -682     0
5      845    0    0     0    0     0    0    0     0
6      848    0    0     0    0  -386    0    0     0
7      851    0    0     0    0     0    0    0     0
8      854    0    0     0    0     0    0 -386     0
9      857    0    0     0    0     0    0    0     0
10     900    0    0     0    0     0    0    0     0
11     903    0    0     0    0     0    0

   The issue I'm thinking about now is how to compare experiments when
the time increment for each is different. This one is 3 minutes;
others are 1 minute, 5 minutes, 13 minutes, etc. My initial thought is
to try to roll things up into common sizes, like 30-minute or
60-minute chunks. Granted, I won't get the same number of measurements
in each chunk, but it's a start, and it would yield a data.frame
that's easy to view in the Rgui console, so I'd be happy with that.
But is that a good method in terms of the statistics of the problem?
Should I be paying attention to the number of experiments, maybe as a
percentage of the whole, or something else? I don't know.

   Anyway, I'd like to start by rolling this data up by time
increment, and I'm wondering if there are any easy ways to do that.
Taking 15-minute increments, I would like to get something like

   EnTime    845     900     915
1     845    SUM     SUM     SUM
2     900    SUM     SUM     SUM
3     915    SUM     SUM     SUM

where the SUM is the sum of the pieces making up this new cell.
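One straightforward way to do the rollup (a sketch, assuming the times are HHMM-style integers like 833 and 900; bin15 is a made-up helper name) is to convert each time to minutes since midnight, round up to its 15-minute bin boundary, and then sum rows that land in the same bin with base R's rowsum():

```r
# Hypothetical helper: round an HHMM-style integer time (e.g. 833, 900)
# up to the end of its 15-minute bin, returning HHMM again.
bin15 <- function(t) {
  m <- (t %/% 100) * 60 + t %% 100   # minutes since midnight
  m <- ceiling(m / 15) * 15          # end of the 15-minute bin
  (m %/% 60) * 100 + m %% 60
}

bin15(833)   # 845
bin15(845)   # 845
bin15(851)   # 900

# Summing rows of a small made-up results matrix into the new bins:
res <- matrix(c(-386, 0, -772, 0, -386, 0), nrow = 3, byrow = TRUE)
rownames(res) <- c(833, 836, 851)
rowsum(res, group = bin15(as.numeric(rownames(res))))
```

The same helper can be applied to the columns (and ExTime) before summing in the other direction.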

    I tried using subset(), but apparently it doesn't work on data
like this, telling me that EnTime >= 830 isn't valid for factors. I
can certainly find ways to munge the data set, changing the EnTime
values, but that seems like something I shouldn't be doing. That said,
it's probably easy enough to do (modulo 15, etc.).
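That factor error usually means the time column was read in as a factor rather than a number. A minimal sketch of converting it back before subsetting (df and Result are made-up names):

```r
df <- data.frame(EnTime = factor(c("833", "836", "900")),
                 Result = c(-386, 0, 714))

# subset(df, EnTime >= 830) fails: '>=' is not meaningful for factors.
# Convert factor -> character -> numeric (going straight to as.numeric()
# would return the internal level codes, not the actual times):
df$EnTime <- as.numeric(as.character(df$EnTime))
subset(df, EnTime >= 836)
```

After that conversion, subset() and ordinary numeric comparisons work as expected.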

   Is there a way to use cast() to grab time increments? Instead of
cast(Results, EnTime ~ ExTime, ...) returning one row and column per
exact value, can cast() collect values together in groups?
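As far as I know, cast() groups on the exact values of the variables in its formula, but you can add binned columns to the long-form data first and cast on those instead. A sketch, assuming the reshape package and long-form data with columns EnTime, ExTime, and value (molten, bin15, EnBin, and ExBin are made-up names):

```r
library(reshape)   # for cast()

# Hypothetical helper: round an HHMM time up to the end of its
# 15-minute bin, returning HHMM again.
bin15 <- function(t) {
  m <- ceiling(((t %/% 100) * 60 + t %% 100) / 15) * 15
  (m %/% 60) * 100 + m %% 60
}

molten <- data.frame(EnTime = c(833, 836, 851),
                     ExTime = c(836, 848, 900),
                     value  = c(-386, -938, 714))
molten$EnBin <- bin15(molten$EnTime)
molten$ExBin <- bin15(molten$ExTime)

# Sum everything that falls into the same (EnBin, ExBin) cell:
cast(molten, EnBin ~ ExBin, sum)
```

Base R's xtabs(value ~ EnBin + ExBin, data = molten) would give a similar summed table without the reshape dependency.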

   Any ideas would be appreciated before I start banging away at my
less interesting methods.

Thanks,
Mark
