[R] Tukey Kramer with ANOVA (glm)

Fri Jun 15 06:03:03 CEST 2012

On Jun 14, 2012, at 2:17 PM, Alaska_Man wrote:

>
> Dr. Winsemius,
> Really quick, a BACI is a Before-After-Control-Impact approach.  I  
> have a long time series of sea cucumber density estimates, which are  
> taken at the same location(s) through time.  Some are in areas  
> Impacted by sea otters and some are in areas Not Impacted by sea  
> otters (two levels).  Each estimate is also coded with a "B" if it  
> is Before the time sea otters showed up at the impacted sites or "A"  
> if after (Impact and Control sites are both coded with BA).

So you probably need a mixed-effects analysis because you have  
repeated measures (how many?) in the same location. (How many  
locations?)

>  BACI analyses suggest an impact if the ANOVA interaction term  
> (BA*Otter) is significant; i.e., changes in sea cucumber density  
> from before to after depend on whether sea otters are present.  I  
> log transformed the response to help normalize the data, as it has  
> many zeros.

I'm not sure that makes good sense. I don't think those are structural  
zeros. Knowing how difficult it is to find sea critters, I think you  
still have a significant possibility that one or more sea cumcumbers  
was missed in those sites with measured zero values. For one thing the  
log of 0 is not a well defined value. For another thing I think it  
inflates the impact of small numbers on the inferential statistic, And  
for a third thing, the interpretation of effects gets all confused.  
You are not really interested in a ratio effect measure, at least I  
wouldn't.  Far preferable would be to use some type of robust  
statistic to handle the inference issues and keep the estimates on a  
linear scale, perhaps using bootstrap methods. Davison and Hinkley  
have pre-post designs in their "practicals".

>  While shapiro-wilks does not suggest normality, it is a very large  
> data set and it is "approximately" normal based on graphical  
> examination. Again, the data is unbalanced, as there are many more  
> estimates at the control sites and before period.

So you should not be using aov, since that method assumes balance.  
Regression methods should be appropriate.

> With that said... I would like to perform the following pairwise  
> comparisons;  B:With Otters  v. A:With Otters,  B:Without Otters  v.  
> A:Without Otters
> I am performing other ANOVAs with different data and no interaction,  
> where I would like to perform multiple pairwise comparisons between  
> the fivelevels of a single factor.  I used the code I provided  
> previously and still received error messages.  If I can get this  
> BACI interaction problem figured out, I should manage to adjust it  
> to other models.  I recently came accross Dunnett's Modified Tukey  
> Kramer (DTK.test) and it appears to address the same issue of  
> unbalanced data and has a very straightforward script (although I am  
> not sure it lends itself to interactions?).  Is this test an  
> appropriate substitute for the glht method?
> You wrote, "When your factors are both binary, the effect estimates  
> fit nicely into a 2 x 2 table and the consideration of the single  
> contrast added by the interaction is fairly  
> simple"                           wool=='A'                   
> wool=='B'   tension=='L'       3.7179                       
> -0.4356+3.7179   tension=='M'     -0.6012+3.7179           
> 0.6281+3.7179
> This output seems to be what I am looking for; assuming that if the  
> value range for a comparison includes zero, then there is no  
> significant difference?

Those are the predicted levels of log(breaks) at various combinations  
of wool and tension. You can pretty much always create such a table  
from the coefficients in a linear model. (Since you used glm() without  
a family argument you got a linear link.)

>  Where did those values come from?

I just read them off the output of print(model) and added the  
appropriate contrasts to the baseline "Intercept" which applies to the  
wool=="A" and tension=="L" category . With R's default treatment  
contrasts, all coefficients are referenced to the Intercept, and so  
you need to add back each of the coefficients to get estimates for the  
separate groups.

> I hope this helps clear up my problem.  If you have concerns about  
> pitfalls with this approach, then I would love to hear them and I  
> can research them outside of this thread.

I would think this should be discussed with your advisor. If s/he  
thinks its appropriate to get further Internet-mediated advice, then  
you should go either to stats.stackexchange or the R-SIG-ME mailing  
list where they have better minds than mine to bring to bear on  
designs that are hierarchal.

>  This is part of a masters thesis and needs to be sound.

> Thank you very much for your time.
> Sean
>
> Date: Thu, 14 Jun 2012 10:26:58 -0700
> From: ml-node+s789695n4633417h88 at n4.nabble.com
> To: seanlarson5 at hotmail.com
> Subject: Re: Tukey Kramer with ANOVA (glm)
>
>
>
> 	
> On Jun 13, 2012, at 7:36 PM, Alaska_Man wrote:
>
>
>> Hello,
>
>>
>
>> I am performing a BACI analysis with ANOVA using the following glm:
>
>
> I admit I had no idea what a "BACI analysis" might be. Looking it up
>
> it appears to be a cross-over design and my statistical betters have
>
> sternly warned me about this regression briar patch in the past. I'm
>
> especially suspicious of the lack of any statements about the balance
>
> in the sampling in your presentation. (And for that matter the
>
> extremely sketchy statement of design.)
>
>
>>
>
>> fit1<-glm(log(Cucs_m+1)~(BA*Otter)+BA+Otter+ID+Primary, data=b1)
>
>
> I'm guessing you do not understand that BA*Otter in an R formula
>
> expands to BA + Otter + BA:Otter
>
>
>> The summary(aov(fit1)) shows significance in the interaction;
>
>> however, now I
>
>> would like to determine what combinations of BA and Otter are
>
>> significantly
>
>> different (each factor has two levels).  ID and PRIMARY substrates  
>> are
>
>> categorical and included in the model to help explain some of the
>
>> variation
>
>> in the data.  The data is unbalanced so I plan on using Tukey Kramer
>
>> post
>
>> hoc analysis.  Here is how my data is laid out, it is a fairly
>
>> substantial
>
>> data set:
>
> Editing done on original (although it proved unrevealing.)
>
>> Subdistrict  T   Year  Cucs_m  Primary Persistence   Otter
>
>> Fishing    BA         ID
>
>> 109-41,42    9  2010   0.00     sil           3      1
>
>> 1        A   109-41,42
>
>> 109-41,42   13  2010   2.75     rck           3      1
>
>> 1        A   109-41,42
>
>> 109-41,42   16  2010   2.00     rck           3      0
>
>> 1        A   109-41,42
>
>> 109-41,42   18  2010   8.25     rck           3      0
>
>> 0        B   109-41,42
>
>>
>
>> I am assuming this is an appropriate pairwise comparison analysis
>
>> and I
>
>> cannot get the code to work with my data.
>
> What does it mean to be doing "pairwise comparisons" on two-level
>
> factor variables?)
>
>
>> I am *unclear how to code it to
>
>> work with the interaction*; however, even when I attempt to use it
>
>> only for
>
>> a single factor, it does not work (see below).
>
>>
>
>> x<-aov(glm(Cucs_m~as.factor(BA),data=cuc))
>
>> glht(x, linfct=mcp(BA="Tukey"))
>
>> ....................................
>
>> Error in mcp2matrix(model, linfct = linfct) :
>
>> Variable(s) ‘BA’ have been specified in ‘linfct’ but cannot be
>
>> found in
>
>> ‘model’!
>
> I suspect the glht() function is looking for 'as.factor(BA)` in the
>
> model matrix and not finding it. If BA is not already a factor, then
>
> it would make sense to do:
>
>
> cuc$BA <- factor(cuc$BA)
>
>
> .... before any analysis. Notice that you get a warning that
>
> performing contrasts in the presence of interactions is something to
>
> be warned about. If you do not know what you are doing here (and your
>
> proposed analysis hints at that possibility), I may have set a trap
>
> for you by solving a syntactic problem but not solving a conceptual
>
> problem.
>
>
>> mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,
>
> tension %in% c("L","M")))
>
>> glht(mod, linfct=mcp(tension="Tukey"))
>
>
>         General Linear Hypotheses
>
> Multiple Comparisons of Means: Tukey Contrasts
>
>
> Linear Hypotheses:
>
>            Estimate
>
> M - L == 0  -0.6012
>
>
> Warning message:
>
> In mcp2matrix(model, linfct = linfct) :
>
>   covariate interactions found -- default contrast might be
>
> inappropriate
>
> ---------------
>
> Looking at the mod-object you see that the "Estimate" above is
>
> actually NOT what you had interest in. You were presumably more
>
> interested in the contrast woolB:tensionM  whose coefficient was  
> 0.6281.
>
> ----
>
>
> Coefficients:
>
>    (Intercept)           woolB        tensionM  woolB:tensionM
>
>         3.7179         -0.4356         -0.6012          0.6281
>
> ----------
>
>
> I would have instead done something like this:
>
>
>> mod <- glm(log(breaks) ~ wool*tension, data=subset(warpbreaks,
>
> tension %in% c("L","M")))
>
>> mod2 <- glm(log(breaks) ~ wool+tension, data=subset(warpbreaks,
>
> tension %in% c("L","M")))
>
>
>> anova(mod,mod2)
>
> Analysis of Deviance Table
>
>
> Model 1: log(breaks) ~ wool * tension
>
> Model 2: log(breaks) ~ wool + tension
>
>   Resid. Df Resid. Dev Df Deviance
>
> 1        32     4.6235
>
> 2        33     5.5113 -1 -0.88777
>
>
> Now I can say that the addition of an interaction term resulted in a
>
> non-significant improvement in model fit at least when measured on the
>
> log(breaks) scale. (Note: This is quite a different result than one
>
> sees on the untransformed scale where the interaction is highly
>
> significant.) When your factors are both binary, the effect estimates
>
> fit nicely into a 2 x 2 table and the consideration of the single
>
> contrast added by the interaction is fairly simple.
>
>
>                          wool=='A'                  wool=='B'
>
>  tension=='L'         3.7179                      -0.4356+3.7179
>
>  tension=='M'     -0.6012+3.7179          0.6281+3.7179
>
>
>> Can anyone off suggestions on potential problems with my approach
>
>> and/or
>
>> script issues?
>
>
> Why was the log transformation being done? Is the desired outcome a
>
> statement about ratios?
>
>
> -- 
>
>
> David Winsemius, MD
>
> West Hartford, CT
>
>
> ______________________________________________
>
> [hidden email] mailing list
>
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> 	
> 	
>
> 	
>
> 	
> 	
> 		If you reply to this email, your message will be added to the  
> discussion below:
> 		http://r.789695.n4.nabble.com/Tukey-Kramer-with-ANOVA-glm-tp4633314p4633417.html
> 	
> 	
> 		
> 		To unsubscribe from Tukey Kramer with ANOVA (glm), click here.
>
> 		NAML
> 	 		 	   		
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Tukey-Kramer-with-ANOVA-glm-tp4633314p4633435.html
> Sent from the R help mailing list archive at Nabble.com.
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT