[R] Fwd: question about using _apply and/or aggregate functions

Clifford Long gnolffilc at gmail.com
Mon Jun 22 18:18:02 CEST 2009


Resending, as am not sure about the original "To:" address.  Sorry for
any redundancy.
- Cliff


---------- Forwarded message ----------
From: Clifford Long <gnolffilc at gmail.com>
Date: Mon, Jun 22, 2009 at 11:04 AM
Subject: question about using _apply and/or aggregate functions
To: R-help at lists.r-project.org


Hi R-list,

I'll apologize in advance for (1) the wordiness of my note (not sure
how to avoid it) and (2) any deficiencies on my part that lead to my
difficulties.

I have an application with several stages that is meant to simulate
and explore different scenarios with respect to product sales (in
units sold per month).  My session info is at the bottom of this note.

The steps include (1) an initial linear regression model, (2) building
an ARIMA model, (3) creating 12 months of simulated 'real' data - for
various values of projected changes in slope from the linear
regression - from the ARIMA model using arima.sim with subsequent
undifferencing and appending to historical data, (3) one-step-ahead
forecasting for 12 months using the 'fixed' arima model and simulated
data, (4) taking the residuals from the forecasting (simulated minus
forecast for each of the 12 months) and applying QC charting, and (5)
counting the number (if any) of runs-rules violations detected.

The simulation-aspect calculates 100 simulations for each of 50 values of slope.

All of this seems to work fine (although I'm sure that the coding
could be improved through functions, vectorization, etc. ... ).
However, the problem I'm having is at the end where I had hoped that
things would be easier.  I want to summarize and graph the probability
of detecting a runs-rule violation vs. the amount of the shift in
slope (of logunit sales).

The output data array passed from the qcc section at the end includes:
 - the adjustment made to the slope (a multiplier)
 - the actual value of the slope
 - the iteration number of the simulation loop (within each value of slope)
 - the count of QC charting limits violations
 - the count of QC charting runs rules violations


My code is in the attached file ("generic_code.R) and my initial sales
data needed to "prime" the simulation is in the other attached file
("generic_data.csv").  The relevant section of my code is at the
bottom of the .R file after the end of the outer loop.  I've tried to
provide meaningful comments.

I've read the help files for _apply, aggregate, arrays and data types,
and have also consulted with several texts (Maindonald and Braun;
Spector; Venebles and Ripley for S-plus).  Somehow I still don't get
it.  My attempts usually result in a message like the following:

> agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

But when I check the length of the arguments, they appear to match. (??)

> length(simarray.part[,3])
[1] 5000
> length(simarray.part[,4])
[1] 5000
> length(list4)
[1] 5000


I would have rather passed along a subset of the simulation/loop
output dataset, but was unsure how to save it to preserve whatever
properties that I may have missed that are causing my difficulties.
If anyone still wants to help at this point, I believe that you'll
need to load the .csv data and run the simulation (maybe reducing the
number of iterations).

Many thanks to anyone who can shed some light on my difficulties
(whether self-induced or otherwise).

Cliff



I'm using a pre-compiled binary version of R for Windows.

Session info:

> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] qcc_1.3         forecast_1.24   tseries_0.10-18 zoo_1.5-5
[5] quadprog_1.4-11

loaded via a namespace (and not attached):
[1] grid_2.9.0      lattice_0.17-22


> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"


More information about the R-help mailing list