[BioC] limma warning: Coefficients not estimable

Gordon K Smyth smyth at wehi.EDU.AU
Sat Feb 13 04:07:40 CET 2010


Dear Karl,

The block argument of duplicateCorrelation() gives you a way to allow for 
correlation between repeat arrays on the same animal.  It is a direct 
extension of the usual linear model.  You found the correlation to be 
about 0.24, which is typical for data in which each block is an 
independent organism.

Yes, I wouldn't have suggested it if I didn't think it was worth trying. 
Indeed, if you want to examine Tissue and Pperiod effects in the same 
analysis, you must do it like this.

Best wishes
Gordon

On Fri, 12 Feb 2010, Karl Brand wrote:

> Gordon,
>
> Many thanks for your follow up to James' help*.
>
> My limited understanding of duplicateCorrelation lead me to believe that it's 
> purpose was to specify in a design, measurements/samples expected to be 
> similar, if not highly similar to each other, as is the case with technical 
> and biological replicates, and attribute the differences between such 
> specified replicates being due to technical/biological 'noise' via a modeling 
> system different to the linear ones used for DE gene identification. Do i 
> understand correctly?
>
> In our experiment, although Tissue "R" & "C" are expected to be more similar 
> than different on the whole genome level;
>
> BTW-
>> dupfit$consensus
> [1] 0.237422
>
> our interest is in the *differences* as is probably obvious from the model.
>
> So im just trying to get my biologists understanding c.clear that applying 
> duplicateCorrelation to the Animal blocking as you suggested, is atleast a 
> *worthwhile thing to try*. I guess you wouldn't provide the example if you 
> didn't think so, but right now is a good time for emphatic clarity.
>
> Of course i did try, and the results are logical to me and thus worth using. 
> In short, the no. of DE genes for the contrasts between factor=Pperiod 
> changed little (<5%), whilst the contrasts between factor=Tissue for "R" & 
> "C" changed alot- DE genes increased by ~20% by inclusion of 
> duplicateCorrelation.
>
> Again, sincere cheers,
>
> Karl
>
>
> on 2/12/2010 12:09 AM Gordon K Smyth said the following:
>> Dear Karl,
>> 
>> You are trying to do the impossible.
>> 
>> The essential thing to appreciate is that your experiment has two levels of 
>> variability, within and between animals.
>> 
>> You have two samples from each animal, tissues R and C.  Therefore tissue 
>> is your within-animal factor.  Your other two factors, Rperiod and Time are 
>> between-animal factors.
>> 
>> If you include Animal as a term in your design matrix formula, then you are 
>> comparing treatments within animal, and you can only test within-animal 
>> hypotheses.  Therefore you cannot include Pperiod or Time in your model.
>> 
>> You have two choices.  One is to do all your analysis within animal. 
>> Therefore you fit the model (~Tissue + Animal) and test only the tissue 
>> effect.  This fully adjusts for any Period or Time effects, but does not 
>> allow you to test for them.
>> 
>> If you wish to recover information about Period and Time from the 
>> between-animal error strata, then you need to treat Animal as a random 
>> effect.  This you do by:
>>
>>    design <- model.matrix(~Tissue * Pperiod + Time)
>>    dupfit <- duplicateCorrelation(rma.pp, design, ndups=1, block=Animal)
>>    fit <- lmFit(rma.pp, design, correlation=dupfit$consensus,
>>           block=Animal)
>> 
>> remembering to check that dupfit$consensus is positive.
>> 
>> James has given you good advice about consulting a good biostatistician. 
>> (There are plenty of questions that could be asked.  Do you really need to 
>> adjust for a Time variable with 16 levels etc.)
>> 
>> Best wishes
>> Gordon
>> 
>>> Date: Wed, 10 Feb 2010 22:37:27 +0100
>>> From: Karl Brand <k.brand at erasmusmc.nl>
>>> To: "James W. MacDonald" <jmacdon at med.umich.edu>
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] limma warning: Coefficients not estimable
>>> Message-ID: <4B732717.5080708 at erasmusmc.nl>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>> 
>>> Cheers Jim,
>>> 
>>> On 2/10/2010 5:57 PM, James W. MacDonald wrote:
>>>> Hi Karl,
>>>> 
>>>> Karl Brand wrote:
>>>>> Dear BioC,
>>>>> 
>>>>> Using limma, when fitting the model:
>>>>> model.matrix(~Tissue * Pperiod + Time + Animal)
>>>>> 
>>>>> I get this warning:
>>>>>> fit <- lmFit(rma.pp, design)
>>>>> Coefficients not estimable: Animal32 Animal33 Animal34 Animal35
>>>>> Animal36 Animal37 Animal38 Animal39 Animal40 Animal41 Animal42
>>>>> Animal43 Animal44 Animal45 Animal46 Animal47 Animal48
>>>>> Warning message:
>>>>> Partial NA coefficients for 45101 probe(s)
>>>>> 
>>>>> In addition, the reuslting number or DE genes for my contrasts of
>>>>> interest (which are different than the 'not estimable' ones listed in
>>>>> teh warning above) are mcuh lower than expected; & furthermore, the
>>>>> contrast-coefficents (log2FCs) and simply wrong.
>>>>> 
>>>>> When fitting a similar model, merely lacking the 'pairing' factor
>>>>> ("Animal"):
>>>>> model.matrix(~Tissue * Pperiod + Time)
>>>>> 
>>>>> I don't get this error. My question:
>>>>> 
>>>>> Is it me? Or am i attempting the impossible, ie., by including a
>>>>> factor for pairing (Animal) trying to fit more factors than my
>>>>> measurements can support and this is limma's way of telling me? Raw
>>>>> script and targets file below.
>>>> 
>>>> You may be attempting the impossible, or you may just be doing something
>>>> incorrectly. You are certainly trying to estimate more parameters than
>>>> you have data with which to do so.
>>> 
>>> Right. This helps me alot, some confirmation of what i can and cant
>>> achieve with my data.
>>>> 
>>>> It looks like you have a fairly complex experimental design, so I would
>>>> recommend finding a local statistician who can help you with the 
>>>> analysis.
>>>> 
>>>>> 
>>>>> I really hope an experienced limma user can enlighten me on this, or
>>>>> point me to a resource suitable for a biologists level of understanding.
>>>> 
>>>> Pretty much any basic linear modeling textbook would be helpful.
>>>> However, it looks like you might have a timecourse experiment with
>>>> perhaps repeated measures, which may require a non-trivial analysis
>>>> method. As a Biologist, you might have jumped into the deep end of the
>>>> pool, so finding somebody local to help is not a bad idea.
>>> 
>>> Unfortunately learning to swim has been faster....along with your (and
>>> several other non-local statisticians) 'flotation aids'.
>>> 
>>> sincere thanks for your thoughts,
>>> 
>>> Karl
>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Jim
>>>> 
>>>> 
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Karl
>>>>> 
>>>>> 
>>>>>> targets <- read.delim("RNA_Targets.txt")
>>>>> 
>>>>>> Tissue <- factor(targets$Tissue, levels = c("R", "C"))
>>>>> 
>>>>>> Pperiod <- factor(targets$Pperiod, levels = c("E", "L", "S"))
>>>>> 
>>>>>> Time <- factor(targets$Time, levels = c("1", "2", "3", "4",
>>>>> + "5", "6", "7", "8",
>>>>> + .... [TRUNCATED]
>>>>> 
>>>>>> Animal <- factor(targets$Animal, levels = c("1", "2", "3", "4",
>>>>> + "5", "6", "7", "8",
>>>>> + .... [TRUNCATED]
>>>>> 
>>>>>> design <- model.matrix(~Tissue * Pperiod + Time + Animal)
>>>>> 
>>>>>> colnames(design)
>>>>> [1] "(Intercept)" "TissueC" "PperiodL" "PperiodS" "Time2" "Time3"
>>>>> "Time4" "Time5" "Time6" "Time7"
>>>>> [11] "Time8" "Time9" "Time10" "Time11" "Time12" "Time13" "Time14"
>>>>> "Time15" "Time16" "Animal2"
>>>>> [21] "Animal3" "Animal4" "Animal5" "Animal6" "Animal7" "Animal8"
>>>>> "Animal9" "Animal10" "Animal11" "Animal12"
>>>>> [31] "Animal13" "Animal14" "Animal15" "Animal16" "Animal17" "Animal18"
>>>>> "Animal19" "Animal20" "Animal21" "Animal22"
>>>>> [41] "Animal23" "Animal24" "Animal25" "Animal26" "Animal27" "Animal28"
>>>>> "Animal29" "Animal30" "Animal31" "Animal32"
>>>>> [51] "Animal33" "Animal34" "Animal35" "Animal36" "Animal37" "Animal38"
>>>>> "Animal39" "Animal40" "Animal41" "Animal42"
>>>>> [61] "Animal43" "Animal44" "Animal45" "Animal46" "Animal47" "Animal48"
>>>>> "TissueC:PperiodL" "TissueC:PperiodS"
>>>>>> source(.trPaths[5], echo=TRUE, max.deparse.length=150)
>>>>> 
>>>>>> fit <- lmFit(rma.pp, design)
>>>>> Coefficients not estimable: Animal32 Animal33 Animal34 Animal35
>>>>> Animal36 Animal37 Animal38 Animal39 Animal40 Animal41 Animal42
>>>>> Animal43 Animal44 Animal45 Animal46 Animal47 Animal48
>>>>> Warning message:
>>>>> Partial NA coefficients for 45101 probe(s)
>>>>>> 
>>>>> 
>>>>> 
>>>>> FileName Tissue Pperiod Time Animal
>>>>> 01-PPL3-sample02.CEL R S 1 1
>>>>> 02-PPL3-sample03.CEL C S 1 1
>>>>> 03-PPL5-sample02.CEL R S 2 2
>>>>> 04-PPL5-sample03.CEL C S 2 2
>>>>> 05-PPL3-sample04.CEL R S 3 3
>>>>> 06-PPL3-sample05.CEL C S 3 3
>>>>> 07-PPL5-sample04.CEL R S 4 4
>>>>> 08-PPL5-sample05.CEL C S 4 4
>>>>> 09-PPL3-sample06.CEL R S 5 5
>>>>> 10-PPL3-sample07.CEL C S 5 5
>>>>> 11-PPL5-sample06.CEL R S 6 6
>>>>> 12-PPL5-sample07.CEL C S 6 6
>>>>> 13-PPL3-sample08.CEL R S 7 7
>>>>> 14-PPL3-sample09.CEL C S 7 7
>>>>> 15-PPL5-sample08.CEL R S 8 8
>>>>> 16-PPL5-sample09.CEL C S 8 8
>>>>> 17-PPL3-sample10.CEL R S 9 9
>>>>> 18-PPL3-sample11.CEL C S 9 9
>>>>> 19-PPL5-sample10.CEL R S 10 10
>>>>> 20-PPL5-sample11.CEL C S 10 10
>>>>> 21-PPL3-sample12.CEL R S 11 11
>>>>> 22-PPL3-sample13.CEL C S 11 11
>>>>> 23-PPL5-sample12.CEL R S 12 12
>>>>> 24-PPL5-sample13.CEL C S 12 12
>>>>> 25-PPL3-sample14.CEL R S 13 13
>>>>> 26-PPL3-sample15.CEL C S 13 13
>>>>> 27-PPL5-sample14.CEL R S 14 14
>>>>> 28-PPL5-sample15.CEL C S 14 14
>>>>> 29-PPL3-sample16.CEL R S 15 15
>>>>> 30-PPL3-sample17.CEL C S 15 15
>>>>> 31-PPL5-sample16.CEL R S 16 16
>>>>> 32-PPL5-sample17.CEL C S 16 16
>>>>> 33-PPL1-sample02.CEL R E 1 17
>>>>> 34-PPL1-sample03.CEL C E 1 17
>>>>> 35-PPL6-sample02.CEL R E 2 18
>>>>> 36-PPL6-sample03.CEL C E 2 18
>>>>> 37-PPL1-sample04.CEL R E 3 19
>>>>> 38-PPL1-sample05.CEL C E 3 19
>>>>> 39-PPL6-sample04.CEL R E 4 20
>>>>> 40-PPL6-sample05.CEL C E 4 20
>>>>> 41-PPL1-sample06.CEL R E 5 21
>>>>> 42-PPL1-sample07.CEL C E 5 21
>>>>> 43-PPL6-sample06.CEL R E 6 22
>>>>> 44-PPL6-sample07.CEL C E 6 22
>>>>> 45-PPL1-sample08.CEL R E 7 23
>>>>> 46-PPL1-sample09.CEL C E 7 23
>>>>> 47-PPL6-sample08.CEL R E 8 24
>>>>> 48-PPL6-sample09.CEL C E 8 24
>>>>> 49-PPL1-sample10.CEL R E 9 25
>>>>> 50-PPL1-sample11.CEL C E 9 25
>>>>> 51-PPL6-sample10.CEL R E 10 26
>>>>> 52-PPL6-sample11.CEL C E 10 26
>>>>> 53-PPL1-sample12.CEL R E 11 27
>>>>> 54-PPL1-sample13.CEL C E 11 27
>>>>> 55-PPL6-sample12.CEL R E 12 28
>>>>> 56-PPL6-sample13.CEL C E 12 28
>>>>> 57-PPL1-sample14.CEL R E 13 29
>>>>> 58-PPL1-sample15.CEL C E 13 29
>>>>> 59-PPL6-sample14.CEL R E 14 30
>>>>> 60-PPL6-sample15.CEL C E 14 30
>>>>> 61-PPL1-sample16.CEL R E 15 31
>>>>> 62-PPL1-sample17.CEL C E 15 31
>>>>> 63-PPL6-sample16.CEL R E 16 32
>>>>> 64-PPL6-sample17.CEL C E 16 32
>>>>> 65-PPL2-sample02.CEL R L 1 33
>>>>> 66-PPL2-sample03.CEL C L 1 33
>>>>> 67-PPL4-sample02.CEL R L 2 34
>>>>> 68-PPL4-sample03.CEL C L 2 34
>>>>> 69-PPL2-sample04.CEL R L 3 35
>>>>> 70-PPL2-sample05.CEL C L 3 35
>>>>> 71-PPL4-sample04.CEL R L 4 36
>>>>> 72-PPL4-sample05.CEL C L 4 36
>>>>> 73-PPL2-sample06.CEL R L 5 37
>>>>> 74-PPL2-sample07.CEL C L 5 37
>>>>> 75-PPL4-sample06.CEL R L 6 38
>>>>> 76-PPL4-sample07.CEL C L 6 38
>>>>> 77-PPL2-sample08.CEL R L 7 39
>>>>> 78-PPL2-sample09.CEL C L 7 39
>>>>> 79-PPL4-sample08.CEL R L 8 40
>>>>> 80-PPL4-sample09.CEL C L 8 40
>>>>> 81-PPL2-sample10.CEL R L 9 41
>>>>> 82-PPL2-sample11.CEL C L 9 41
>>>>> 83-PPL4-sample10.CEL R L 10 42
>>>>> 84-PPL4-sample11.CEL C L 10 42
>>>>> 85-PPL2-sample12.CEL R L 11 43
>>>>> 86-PPL2-sample13.CEL C L 11 43
>>>>> 87-PPL4-sample12.CEL R L 12 44
>>>>> 88-PPL4-sample13.CEL C L 12 44
>>>>> 89-PPL2-sample14.CEL R L 13 45
>>>>> 90-PPL2-sample15.CEL C L 13 45
>>>>> 91-PPL4-sample14.CEL R L 14 46
>>>>> 92-PPL4-sample15.CEL C L 14 46
>>>>> 93-PPL2-sample16.CEL R L 15 47
>>>>> 94-PPL2-sample17.CEL C L 15 47
>>>>> 95-PPL4-sample16.CEL R L 16 48
>>>>> 96-PPL4-sample17.CEL C L 16 48
>>>>> 
>>>> **********************************************************
>>>> Electronic Mail is not secure, may not be read every day, and should not
>>>> be used for urgent or sensitive issues
>>> 
>>> -- 
>>> Karl Brand <k.brand-asperand-erasmusmc.nl>
>>> Department of Genetics
>>> Erasmus MC
>>> Dr Molewaterplein 50
>>> 3015 GE Rotterdam
>>> lab +31 (0)10 704 3409 fax +31 (0)10 704 4743 mob +31 (0)642 777 268
>> 
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the 
>> addressee.
>> You must not disclose, forward, print or use it without the permission of 
>> the sender.
>> ______________________________________________________________________
>
> -- 
> Karl Brand <k.brand-asperand-erasmusmc.nl>
> Department of Genetics
> Erasmus MC
> Dr Molewaterplein 50
> 3015 GE Rotterdam
> lab +31 (0)10 704 3409 fax +31 (0)10 704 4743 mob +31 (0)642 777 268
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list