[R] ANCOVA when you don't know factor levels

Liaw, Andy andy_liaw at merck.com
Wed Mar 17 14:06:11 CET 2004


This sounds like a good case for mixture regression, for which there's Fritz
Leisch's `flexmix' package.  However, I don't think flexmix has facility for
testing whether the mixture has one vs. two components.  Others on the list
surely would know more than I do.

HTH,
Andy

> From: Rob Knell
> 
> Hello people
> 
> I am doing some thinking about how to analyse data on 
> dimorphic animals  
> - where different individuals of the same species have rather 
> different  
> morphology. An example of this is that some male beetles have large  
> horns and small wings, and rely on beating the other guys up to get  
> access to mates, whereas others have smaller horns and larger wings,  
> and rely on mobility to find mates before the guys with the 
> big horns  
> turn up and beat them up. Normally what we do to see if this is  
> happening is to plot a scatter plot of horn length vs body size. If  
> it's a simple straight line relationship then no dimorphism, but if  
> it's either a line with an obvious change of slope or two separate  
> lines then you've got a dimorphic animal.
> 
> If you've just got a change of slope then it's relatively 
> easy to test  
> for significance by breakpoint regression, which will also 
> tell you the  
> body size at which the beetles switch from 'big wings, small horns'  
> strategy to the other. What is more problematic, however, is if you  
> have a dataset that is essentially two linear regressions 
> that overlap  
> in both X and Y. Here you can't do a breakpoint regression. The  
> solution that I have come up with is to put a straight line 
> between the  
> two clouds of data and to use that line to define a two-level factor  
> and fit a model with body size, the factor and the 
> interaction to the  
> horn size data. The line can then be moved up and down, and 
> the slope  
> varied, and the fit of the model for each intercept/slope 
> combination  
> compared. The best fit model can then be compared with a straight  
> linear model with an F-test, which gives you a significance 
> test, and  
> the line allows you to decide which morph a particular beetle 
> conforms  
> to. I've written some R code to do this (see below). What i 
> would like  
> to know is a) if this problem has been dealt with before 
> elsewhere, b)  
> if there's a better way to do it and c) if my R function could be  
> improved at all - it's my first attempt at such a thing, so 
> any input  
> would be really helpful. If anyone wants a sample data set 
> then I can  
> email you one.
> 
> Thanks for any input
> 
> Rob Knell
> 
> School of Biological Sciences
> Queen Mary, University of London
> 
> 'Phone +44 (0)20 7882 7720
> http://www.qmw.ac.uk/~ugbt794
> http://www.mopane.org
> 
> Here's the code:
> 
> switchline<- 
> function(X,Y,Intmin,Intmax,Intgap,Slopemin,Slopemax,Slopegap)
> 
>  
> 
> {
> 
>  
> 
> #   X       = unimodal variable (e.g. filename$bodysize)
> 
>  
> 
> #   Y       = bimodal variable (e.g. filename$hornsize)
> 
>  
> 
> #    Intmin       = Minimum switchline intercept
> 
>  
> 
> #    Slopemin = Minimum switchline slope
> 
> #    Intmax       = Maximum switchline intercept
> 
> #    Slopemax = Maximum switchline slope
> 
> #    Intgap       = Interval between each intercept
> 
> #    Slopegap = Interval between each slope
> 
>  
> 
>  
> 
>  
> 
> #   This function written by Rob Knell 2003. It is an attempt to  
> produce an
> 
> #    objective analysis for datasets of apparently dimorphic 
> characters  
> in which
> 
> #   there is overlap in both the X and Y variable. The 
> routine divides  
> the
> 
> #    individuals into one of two classes based on whether they are  
> above or below
> 
> #   a line, and fits a linear model to the Y variable with the X  
> variable as a
> 
> #    continuous explanatory variable and a single factor, "Morph",  
> which is based
> 
> #   on whether they were above the line or below it, plus an  
> interaction term.
> 
> #   The line is then adjusted, the individuals reclassified and the  
> process
> 
> #    repeated. The output is of the form "Int" - intercept of 
> the line  
> dividing
> 
> #   one morph from another, "Slp" - the slope and "Rsqr", 
> which gives  
> the R-
> 
> #    squared value for the fitted model when the division 
> into factors  
> is based on
> 
> #   that particular line.
> 
>  
> 
> #   An error message probably means that one of your lines 
> has missed  
> the data
> 
> #    altogether - in other words, all of your data points are  
> classified into a
> 
> #    single factor.
> 
> #
> 
>  
> 
>  
> 
>  
> 
> Lines.grid<-  
> expand.grid(Intercept=seq(Intmin,Intmax,Intgap),Slope=seq(Slop
> emin,Slope 
> max,Slopegap))
> 
>  
> 
> bandwidth<-length(Lines.grid$Intercept)
> 
>  
> 
>  
> 
> Lines<-matrix(nrow=bandwidth,ncol=3,data=NA)
> 
> dimnames(Lines)<-list(seq(1,bandwidth),c("Int","Slp","Rsqr"))
> 
>  
> 
> Lines[,1]<-Lines.grid$Intercept
> 
> Lines[,2]<-Lines.grid$Slope
> 
>  
> 
> for (i in 1:bandwidth)
> 
>  
> 
>     {
> 
>     temp.data<-matrix(nrow=length(X),ncol=4,data=NA)
> 
>     colnames(temp.data)<-c("X1","Y1","switchvalue","Morph")
> 
>     temp.data[,1]<-X
> 
>     temp.data[,2]<-Y
> 
>         for(j in 1:length(X))
> 
>         {
> 
>         temp.data[j,3]<-(Lines[i,1]+Lines[i,2]*temp.data[j,1])
> 
>         ifelse(temp.data[j,2]>=temp.data[j,3], temp.data[j,4]<-1,  
>         temp.data[j,4]<-2)
> 
>         }
> 
>     
> 
>     temp.data<-data.frame(temp.data)
> 
>     temp.data$Morph<-factor(temp.data$Morph)
> 
>     model<-lm(temp.data$Y1~temp.data$X1*temp.data$Morph)
> 
>     Lines[i,3]<-round(summary(model)$r.squared,6)
> 
>     }
> 
>  
> 
> print(Lines)
> 
>  
> 
> Lines<-data.frame(Lines)
> 
> Max<-max(Lines$Rsqr)
> 
> MaxI<-Lines$Int[Lines$Rsqr==Max]
> 
> MaxS<-Lines$Slp[Lines$Rsqr==Max]
> 
> Maxmat<-cbind(MaxI,MaxS)
> 
>  
> 
> print(Maxmat)
> 
> }
> 
>                        
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 


------------------------------------------------------------------------------
Notice:  This e-mail message, together with any attachments,...{{dropped}}




More information about the R-help mailing list