[R] 2^k*r (with replications) experimental design question

Mon Nov 14 02:48:41 CET 2011

Hi Dennis,

Thank you again :) What exactly do you mean by "blocking factor"? That it will be treated like the other factors? I'd prefer not to treat the replicates as random but rather to account for the experimental error using the replicates.

Ahhh I see what you mean, so the experimental error will show up as the SS of my new variable "Replicate" ... great!
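
For the archive, a minimal sketch of that idea. The data are simulated, not my measurements: the helper one_run(), the choice of three replicates, and the effect sizes are all assumptions made only for illustration. Each replicate of the 2^4 design gets a "Replicate" label, the runs are stacked, and the label goes into aov as a fixed blocking factor.

```r
set.seed(1)

# Hypothetical helper: one simulated run of the 2^4 design.
# Factor names follow the thread; the Throughput values are made up.
one_run <- function() {
  d <- expand.grid(
    No_databases   = factor(c(1, 4)),
    Partitioning   = factor(c("sharding", "replication")),
    No_middlewares = factor(c(2, 4)),
    Queue_size     = factor(c(40, 100))
  )
  d$Throughput <- 40 + 5 * (d$No_databases == "4") +
    2 * (d$No_middlewares == "4") + rnorm(nrow(d))
  d
}

# Stack three replicates and tag each row with its replicate number.
runs <- lapply(1:3, function(i) cbind(Replicate = i, one_run()))
stacked <- do.call(rbind, runs)
stacked$Replicate <- factor(stacked$Replicate)  # a factor, not a numeric covariate

# Replicate enters the model as a fixed blocking factor.
fit <- aov(Throughput ~ Replicate + No_databases + Partitioning +
             No_middlewares + Queue_size, data = stacked)
summary(fit)
```

In a fit like this the Replicate row collects the run-to-run differences, while the Residuals row remains the error term that the F ratios are computed against.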

Thank you!
Best regards,
Giovanni

On Nov 14, 2011, at 2:38 AM, Dennis Murphy wrote:

> I'm guessing you have nine replicates of a 2^5 factorial design with a
> couple of missing values. If so, define a variable to designate the
> replicates and use it as a blocking factor in the ANOVA. If you want
> to treat the replicates as a random rather than a fixed factor, then
> look into the nlme or lme4 packages.
> 
> HTH,
> Dennis
> 
> On Sun, Nov 13, 2011 at 4:33 PM, Giovanni Azua <bravegag at gmail.com> wrote:
>> Hello,
>> 
>> I have one replication (r = 1 of the 2^k*r) of a 2^k experimental design in the context of performance analysis, i.e. my response variables are Throughput and Response Time. I use the "aov" function and the results look OK:
>> 
>>> str(throughput)
>> 'data.frame':   286 obs. of  7 variables:
>>  $ Time          : int  6 7 8 9 10 11 12 13 14 15 ...
>>  $ Throughput    : int  42 44 33 41 43 40 37 40 42 37 ...
>>  $ No_databases  : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
>>  $ Partitioning  : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
>>  $ No_middlewares: Factor w/ 2 levels "2","4": 1 1 1 1 1 1 1 1 1 1 ...
>>  $ Queue_size    : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
>>  $ No_clients    : Factor w/ 1 level "128": 1 1 1 1 1 1 1 1 1 1 ...
>>> head(throughput)
>>  Time Throughput No_databases Partitioning No_middlewares Queue_size
>> 1    6         42            1     sharding              2         40
>> 2    7         44            1     sharding              2         40
>> 3    8         33            1     sharding              2         40
>> 4    9         41            1     sharding              2         40
>> 5   10         43            1     sharding              2         40
>> 6   11         40            1     sharding              2         40
>>> 
>>> throughput.aov <- aov(Throughput~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
>>> summary(throughput.aov)
>>                  Df    Sum Sq  Mean Sq F value    Pr(>F)
>> No_databases      1  28488651 28488651 53.4981 2.713e-12 ***
>> Partitioning      1     71687    71687  0.1346  0.713966
>> No_middlewares    1   5624454  5624454 10.5620  0.001295 **
>> Queue_size        1     50892    50892  0.0956  0.757443
>> Residuals       281 149637226   532517
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>> 
>> 
>> This is more or less what I expected, and I am happy with it: it says that Throughput is significantly affected first by the number of database instances and second by the number of middleware instances.
>> 
>> The problem is that I need to integrate multiple replications of this same 2^k design so that I can also account for experimental error, i.e. the _r_ of 2^k*r, but I can't see how to integrate the _r_ term into the data and into the aov function parameters. Can anyone advise?
>> 
>> TIA,
>> Best regards,
>> Giovanni