[R] question about capscale (vegan)

Jari Oksanen jarioksa at sun3.oulu.fi
Fri Nov 17 15:09:39 CET 2006


On Fri, 2006-11-17 at 12:26 +0000, Gavin Simpson wrote:
> On Fri, 2006-11-17 at 12:18 +0100, Alicia Amadoz wrote:
> > Hello,
> > 
> > Thank you for your help. I have tried to perform the analysis I wanted
> > with example data, not the real data, because I can't provide it
> > here. This is what I tried:
> 
> Hi Alicia,
> 
> It would have been more helpful if you'd included the actual commands to
> generate each object, but thanks for including an example.
> 
> library(vegan)
> dat <- matrix(c(0.00,0.13,0.59,0.13,0.00,0.55,0.59,0.55,0.00), ncol = 3)
> dist.mat <- as.dist(dat)
> dist.mat
>      1    2
> 2 0.13
> 3 0.59 0.55
> time <- as.factor(c(2006, 2005, 2005))
> region <- as.factor(c("europe", "africa", "europe"))
> city <- as.factor(c("london", "nairobi", "paris"))
> factors.frame <- data.frame(time, region, city)
> 
> my.cap <- capscale(dist.mat ~ time + region + time:region +
> region:city + time:region:city, factors.frame)
> 
> my.cap
> 
> So, stop here. Look at the output. You can extract 2 constrained axes
> that explain 100% of the variance in your data. This causes my.cap$CA to
> be NULL, which is why when you do:
> 
> anova(my.cap)
> 
> You get this error message:
> 
> Error in `names<-.default`(`*tmp*`, value = "Residual") :
>         attempt to set an attribute on NULL
> 
> The error has nothing to do with providing "comm" or not (I think) as I
> don't see how this would alter my.cap$CA, and anyway, "comm" is used to
> generate "species" scores and if you look at summary(my.cap) you will
> see that you have species scores (though their meaning may be hard to
> understand if no "comm" provided - see ?capscale)
> 
> I hesitate to call this a bug in capscale() or permutest.cca() (this is
> where the error comes from by the way:
> 
> > traceback()
> 5: `names<-.default`(`*tmp*`, value = "Residual")
> 4: `names<-`(`*tmp*`, value = "Residual")
> 3: permutest.cca(object, step, ...)
> 2: anova.cca(my.cap)
> 1: anova(my.cap)
> 
> ), but anova.cca doesn't seem to handle situations where there isn't an
> unconstrained component. I've CC'd Jari Oksanen, the author of vegan to
> ensure he sees this.
> 
Dear y'all,

I agree with this analysis: you have no residual (unconstrained)
variation and this means that you cannot have a significance test. I
have always known this, but I haven't cared about this issue: you ask
for an impossible analysis and get an error message. The only thing that
could be called a bug is the text of the error message, and I may
change that. Even then you still cannot perform anova when there is no
residual variation; only the error message would change.
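
One way to avoid running into the error is to check for an unconstrained
component before asking for a test. The following is only a sketch:
has_residual() is a hypothetical helper, not part of vegan, and it assumes
that the fitted object keeps its unconstrained part in $CA (NULL in the
case Gavin describes; the rank check is an extra assumption for versions
that might store an empty component instead).

```r
# Hypothetical helper (not part of vegan): TRUE if a fitted constrained
# ordination still has an unconstrained (residual) component.
has_residual <- function(ord) {
  ca <- ord$CA
  # No CA component at all: the constraints explain everything
  if (is.null(ca)) return(FALSE)
  # Assumption: if a rank is stored, rank zero also means no residual
  if (!is.null(ca$rank) && ca$rank == 0) return(FALSE)
  TRUE
}

# Intended use with the model from this thread (requires vegan):
#   if (has_residual(my.cap)) anova(my.cap) else
#     message("No residual variation: a permutation test is not possible")
```

The helper only inspects the object; it does not make the test possible
when the model is saturated.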

You have two options if you still want to have an analysis like
this:
1. As Gavin suggested, reduce the number of constraints so that
your model has an unconstrained component, and you will be able to run
the tests.
2. Perform an unconstrained analysis (cmdscale, prcomp, princomp, or rda
in this case), fit the environmental variables to this solution, and
analyse the significance of the fitted vectors. All of this is doable
using the envfit() function in vegan.
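
The second option could look roughly like the sketch below. It uses the
varespec and varechem example data from Gavin's earlier message rather
than Alicia's data, and the choice of cmdscale() with two axes, the
variables N, P and K, and 999 permutations are illustrative assumptions
only.

```r
library(vegan)

data(varespec)   # example community data shipped with vegan
data(varechem)   # matching environmental variables

# Unconstrained analysis: principal coordinates of Bray-Curtis dissimilarities
dist.mat <- vegdist(varespec, method = "bray")
pcoa <- cmdscale(dist.mat, k = 2)

# Fit environmental variables to the two PCoA axes and test by permutation
set.seed(42)  # permutation p-values vary between runs without a seed
fit <- envfit(pcoa, varechem[, c("N", "P", "K")], permutations = 999)
fit
```

Factors such as time, region and city can be passed in the same data
frame; envfit() fits them as class centroids rather than vectors.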

Cheers, jari oksanen

> This error is related to the specific dummy problem you sent - do you
> get this error when you run the analysis on your full data set? If so,
> you might want to consider removing some constraints as your model isn't
> really constrained anymore. As the number of constraints approaches the
> number of sites, the constraint on the ordination drops away and you are
> back to a Principal Coordinates Analysis (IIRC) of your dissimilarity
> matrix.
> 
> > > anova(my.cap)
> > Error in `names<-.default`(`*tmp*`, value = "Residual") :
> >         attempt to set an attribute on NULL
> > 
> > I am still concerned about the 'comm' argument, since I don't
> > understand how important it could be for my type of data or what it
> > refers to in my data. Also, what I am really interested in is
> > performing a factorial anova with another factor nested (the model I
> > have provided above), and as you can see R gives an error that I don't
> > understand either.
> 
> As for your original data - by the looks of it, you wouldn't be able to
> use that as the argument to "comm". It would need to be numeric and
> recoded etc. before you could use it, and how to do that in the best way
> I'm not sure.
> 
> But in this instance, if you are interested in the samples and how they
> relate to one another, constrained by your factors.frame, then you don't
> need "comm" and you can proceed without it, and not bother displaying
> species scores.
> 
> If you are interested in how the samples relate to one another and how
> the nucleic acids relate to one another and the samples, constrained by
> your factors.frame, then you will need to recode that example matrix
> into something numeric, and even then it may not be possible with the
> way capscale is written.
> 
> Hope this helps,
> 
> G
> 
> > 
> > Thank you for your help in advance. 
> > Regards,
> > Alicia
> > 
> > 
> > > On Thu, 2006-11-16 at 17:25 +0100, Alicia Amadoz wrote:
> > > > Hello,
> > > > 
> > > > I am interested in using the capscale function of the vegan package
> > > > for R. I already have a dissimilarity matrix and I intend to use it
> > > > as the 'distance' argument. But I don't know what kind of data must
> > > > go in the 'comm' argument. I don't understand what type of data
> > > > 'species scores' and 'community data frame' refer to, since my data
> > > > are nucleic distances between different sequences.
> > > 
> > > No, that is all wrong. Read ?capscale more closely! It says that you
> > > need to use the formula to describe the model. "distance" is used to
> > > tell capscale which distance coefficient to use if the LHS of the model
> > > formula is a community matrix.
> > > 
> > > Argument "comm" is used to tell capscale where to find the species
> > > matrix that will be used to determine species scores in the analysis,
> > > *if* the LHS of the formula is a distance matrix. "comm" isn't used if
> > > the LHS is a data frame, and "distance" is ignored if the LHS is a
> > > distance matrix.
> > > 
> > > As you don't provide a reproducible example of your problem, I will use
> > > the inbuilt example from ?capscale
> > > 
> > > ## load vegan and some data
> > > library(vegan)
> > > data(varespec)
> > > data(varechem)
> > > 
> > > Now if you want to fit a capscale model using the raw species data, then
> > > you would describe the model as so:
> > > 
> > > vare.cap <- capscale(varespec ~ N + P + K + Condition(Al), 
> > >                      data = varechem,
> > >                      distance = "bray")
> > > vare.cap
> > > 
> > > In the above, LHS of formula is a data frame so capscale looks to
> > > argument "distance" for the name of the coefficient to turn it into a
> > > distance matrix. The terms on the RHS of the formula are variables
> > > looked up in the object assigned to the "data" argument.
> > > 
> > > Now let's alter this to start with a dissimilarity/distance matrix
> > > instead. The exact complement of the above would be:
> > > 
> > > dist.mat <- vegdist(varespec, method = "bray")
> > > vare.cap2 <- capscale(dist.mat ~ N + P + K + Condition(Al), 
> > >                      data = varechem,
> > >                      comm = varespec)
> > > vare.cap2
> > > 
> > > To explain the above example: first create the Bray-Curtis distance
> > > matrix (dist.mat). Then use this on the LHS of the formula. When
> > > capscale now wants to calculate the species scores of the analysis it
> > > will look to argument "comm" to use in the calculation; which in this
> > > case we specify is the original species matrix varespec.
> > > 
> > > As for what species scores are, well, this is a throwback to the origins
> > > of the package and the methods included - all of this is related to
> > > ecology and mainly vegetation analysis (hence vegan).
> > > 
> > > For species scores, read variable scores. The distance matrix (however
> > > calculated) describes how similar your individual sites (read samples)
> > > are to one another. You can also display information about the variables
> > > used to determine those distances/similarities, and this is what is
> > > meant by species scores. Whatever you used to generate the distance
> > > matrix, the columns represent the info used to generate the "species
> > > scores".
> > > 
> > > If some of this still isn't clear, email the list with the commands used
> > > to generate your distance matrix in R and I'll have a go at explaining
> > > this with reference to your data/example.
> > > 
> > > > 
> > > > I would be very grateful if you could help me with this in any
> > > > way. Thank you in advance for your help.
> > > > 
> > > > Regards,
> > > > Alicia
> > > 
> > > HTH
> > > 
> > > G
> > > 
> > > -- 
> > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > >  Gavin Simpson                 [t] +44 (0)20 7679 0522
> > >  ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
> > >  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> > >  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> > >  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > 
> > > 
> > >