[BioC] 4 main questions about making complicated model matrix and fitting the model

Gordon K Smyth smyth at wehi.EDU.AU
Sun Dec 12 04:40:37 CET 2010


Dear Mingkwan,

I'll make some short responses below.  I think though that your design is 
getting too complex to give help by email like this.  This is the sort of 
data problem for which it is usually beneficial to establish a 
collaborative relationship with a statistical bioinformatics specialist, 
at your own institution or nearby, rather than trying to do it all 
yourself, unless you are a statistician yourself.

> Date: Tue, 7 Dec 2010 19:05:45 +0100
> From: Mingkwan Nipitwattanaphon <Mingkwan.Nipitwattanaphon at unil.ch>
> To: bioconductor at r-project.org
> Subject: [BioC] 4 main questions about making complicated model matrix
> 	and	fitting the model
> Message-ID: <F3479687-F88D-41F2-B00E-E18295C5FD7B at unil.ch>
> Content-Type: text/plain
>
> Dear BioC users,
>
> I am using  2 color-spotted microarrays. My samples are queen ants
> with different genotypes (BB=D, Bb=H, bb=R), social forms (Monogyne,
> Polygyne) and ages (young virgin queen=2d, mature virgin queen=11d,
> and mated/reproductive/ mother queen=mom).  They are hybridized to the
> reference RNA.
>
> I would like to analyze my data by making model matrix  like
> y~ Age + Genotype *nested within* Social form + Age : Genotype *nested
> within* Social form + fixed factor (Batch)
>
> I have tried to analyze my data as show below and I got many questions.
>
> 1. Differences between different commands of model.matrix. From the
> examples below, I know that ~0 or ~1 is just whether I want the
> intercept or not but when there is an intercept, one of the groups is
> disappeared from the model (in this case, batchI or M11dD is
> disappeared depending on which factor comes first). Would this affect
> the result?  Is the order of the factors in the model important?

Yes, it does affect the result.  Yes, it is important.

You shouldn't normally be trying to remove the intercept here.  Doing so 
doesn't change the model that is fitted, so the total number of 
coefficients doesn't change, but it does change the parametrization.

To see how this works, try experimenting on your own with a oneway anova 
type data set, to get familiar with how model.matrix works.

> 2. Although I have 3 different genotypes, 2 social forms and 3 time
> points, some combinations are biologically impossible, e.g. M11dH,
> M2dH, MmomH, M11dR, M2dR, MmomR, PmomD, PmomR. When I use the command
> below, limma gives me all of the combinations but only some do exist.
> So, the ones with do not exist, have only 0 value in all rows.  These
> groups will result in NA values for all the spots after the model fit.
> Is it OK to have NA values or it is better to remove  these groups
> from the model matrix before fitting the model?

I don't know what your data looks like, but this suggests to me that the 
linear model formula is not appropriate for your data.  It will therefore 
be difficult to interpret the resulting coefficients.  You probably need a 
different approach.

> 3. Problem with model fit
> After fitting the model. I got NA values for all spots of all the
> groups that do not exist. This could be  normal, but the group "P11dR"
> do exist. Why did I also get NAs from this group?
>
> isGene <- MA$genes$Status == "cDNA"
> > fitDesign1 <- lmFit(MA[isGene,], design1)
> Coefficients not estimable: M11dH M2dH MmomH M11dR M2dR MmomR PmomD
> P11dR P2dR PmomR

Your linear model contains more terms than can be estimated from your 
data.  You probably need a different approach.

Best to show your data to a statistician, who can work with you, to 
consider how your data should be analysed.

> 4. After fitting the model, do I need to make contrasts between 
> interesting comparisons?  Although all slides are compared to reference, 
> with this complicated linear model, I am not sure that if I do not make 
> contrast, the limma will give me the coefficient of each group compared 
> to reference or something else.

The fact that not everything can be estimated makes the interpretation of 
your model difficult.

Best wishes
Gordon

> I truly appreciate all the answers in advance!
>
> best regards,
>
> Mingkwan Nipitwattanaphon

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list