[BioC] limma: coefficient not estimable when dataset reduced by a few samples

Fri Aug 3 16:12:55 CEST 2012

Hi Natasha,

On 8/3/2012 8:11 AM, Natasha Sahgal wrote:
> Dear List,
>
> I have array data for 42 tumorBT - tumorAT patients. Initial analysis
> (adding patient and batch covariates to the model) I managed to obtain a
> list of DE genes.
>
> However now, I need to reduce this set of 42 to 33 patients pairs and
> reanalyse the data. This time though it seems that it can't estimate the
> coefficient of one patient, would this not affect my results as NA
> coefficients are produced?
>
> What I do not understand is what has changed for this to happen now? How
> do I rectify it?

You don't give any information about the design matrix with all 42 
patients, but I wonder how you were able to fit a paired analysis with a 
batch effect in that case either. When you fit a paired analysis by 
estimating a per-patient block effect, you by definition are already 
estimating the maximum number of coefficients that you can with the 
number of samples in hand. As an example:

 > treat <- factor(rep(1:2, 42))
 > batch <- factor(rep(1:2, each = 42))
 > patient <- factor(rep(1:42, each= 2))
 > is.fullrank(model.matrix(~treat+patient))
[1] TRUE
 > is.fullrank(model.matrix(~treat+batch+patient))
[1] FALSE

So I don't see how you will be able to fit a paired analysis by blocking 
on patient if you have a batch effect as well, regardless of the number 
of patients.

However, you can always do a conventional paired analysis, by first 
calculating the paired differences for each patient and then fitting the 
model. Note that in this case the coefficient you are estimating is the 
intra-pair difference, and you are testing that this difference is 
different from zero (or in other words, the contrast is implicit in the 
coefficient, so you don't need contrasts.fit()).

Best,

Jim

>
> Code:
> dim(tum.dat.red)
> 27555 66
>
> ## Overall Sample Descriptions
>> batch = as.factor(t.info.red$Batch)
> batch
>   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> 1 2 2
> [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> Levels: 1 2
>
>
>> treat = droplevels(t.info.red$Status_Treatment)
>> treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT"))
> treat
> [1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT
>   [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT
> [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT
> [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT
> [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT
> [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT
> [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT
> [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT
> [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT
> [64] Tumour_AT Tumour_AT Tumour_AT
> Levels: Tumour_BT Tumour_AT
>
>
>> patient = as.factor(t.info.red$Trial.number)
> patient
>   [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34
> 55 45
> [26] 55 54 54 37 57 37 57 35 52 35 52 17 7  12 10 14 11 16 15 14 16 15 17
> 7  12
> [51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48
> 33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36
> ... 57
>
>
> ## Deisgning model with BATCH effect
>> design.tum = model.matrix(~treat+batch+patient)
> ## Fit model&  make contrasts
>> fit.tum<- lmFit(tum.dat.red, design.tum)
> Coefficients not estimable: patient57
> Warning message:
> Partial NA coefficients for 27555 probe(s)
>
>
> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.12.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0
>
>
>
> Many Thanks in advance,
> Natasha
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099