[BioC] limma: coefficient not estimable when dataset reduced by a few samples

Natasha Sahgal nsahgal at well.ox.ac.uk
Mon Aug 6 11:50:43 CEST 2012


Dear Prof. Gordon,

Thank you for your response. I think I know now based on what you said below what went wrong.

In the first instance where I have 42 patient pairs, there are a couple of tumour subtypes which are all taken into consideration. Here, as you mentioned I have more than one patient appearing in both batches. Hence I think ~treat+patient+batch worked and did not throw any errors.

>head(tumour)
   Trial.number Tumour.type Status_Treatment Batch
1            50         SCC        Tumour_BT     1
2            13       adeno        Tumour_AT     1
5            50         SCC        Tumour_AT     1
6            25       adeno        Tumour_BT     1
9            13       adeno        Tumour_BT     1
10           25       adeno        Tumour_AT     1
 
> dim(tumour)
[1] 84  4

>table(tumour$Trial.number,tumour$Batch)
    
     1 2
  7  0 2
  10 0 2
  11 0 2
  12 0 2
  13 2 0
  14 0 2
  15 0 2
  16 0 2
  17 0 2
  18 2 0
  20 0 2
  21 1 1
  23 0 2
  24 0 2
  25 2 0
  26 0 2
  27 1 1
  28 2 0
  29 1 1
  30 2 0
  31 1 1
  32 2 0
  34 2 0
  35 2 0
  36 0 2
  37 2 0
  39 2 0
  40 0 2
  41 1 1
  42 2 0
  44 1 1
  45 2 0
  46 0 2
  48 0 2
  50 2 0
  51 2 0
  52 2 0
  53 2 0
  54 2 0
  55 2 0
  56 2 0
  57 2 0


Now however, when I reduce to 33 patient pairs, I look at only one tumor subtype where none of the patients belong to different batches. Thus I see now that in this case batch is an unnecessary term in the model. Thus all I need in this case is: ~treat+patient

>head(tumour2)
   Trial.number Tumour.type Status_Treatment Batch
2            13       adeno        Tumour_AT     1
6            25       adeno        Tumour_BT     1
9            13       adeno        Tumour_BT     1
10           25       adeno        Tumour_AT     1
14           18       adeno        Tumour_AT     1
18           28       adeno        Tumour_BT     1

> dim(tumour2)
[1] 66  4

> table(tumour2$Trial.number,tumour2$Batch)
    
     1 2
  7  0 2
  10 0 2
  11 0 2
  12 0 2
  13 2 0
  14 0 2
  15 0 2
  16 0 2
  17 0 2
  18 2 0
  20 0 2
  23 0 2
  24 0 2
  25 2 0
  26 0 2
  28 2 0
  30 2 0
  32 2 0
  34 2 0
  35 2 0
  36 0 2
  37 2 0
  39 2 0
  40 0 2
  42 2 0
  45 2 0
  48 0 2
  51 2 0
  52 2 0
  53 2 0
  54 2 0
  55 2 0
  57 2 0

Also thank you for explaining what the warning actually means and that limma output has no effect on it.


Best Wishes,
Natasha


-----Original Message-----
From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] 
Sent: 05 August 2012 03:11
To: Natasha Sahgal
Cc: Bioconductor mailing list
Subject: limma: coefficient not estimable when dataset reduced by a few samples

Dear Natasha,

The results limma has given you are completely correct.

As James MacDonald has explained, the reason for the warning is that you have put too many variables into the model.  There is no need to correct for the batch effect, because you are already correcting for baseline differences between the patients, and this includes the batch effect because the patients are different in the two batches.  You will have got exactly the same warning even with 42 patients, unless one or more of the patient appeared in both batches.

When you put the unnecessary batch term into model, limma automatically adjusts the model for you, removing the unnecessary terms, and gives you exactly the same paired t-test that you would have got had you not added the batch effect.  The warning is just to let you know that an adjustment was made.

In other words, limma gives you the correct test whether or not you put in extra unnecessary between-patient terms.  The fact that NA coefficients are produced has no effect at all on your tests for the treatment effect.

Best wishes
Gordon

----------- original message ------------- [BioC] limma: coefficient not estimable when dataset reduced by a few samples Natasha Sahgal nsahgal at well.ox.ac.uk Fri Aug 3 14:11:31 CEST 2012

Dear List,

I have array data for 42 tumorBT - tumorAT patients. Initial analysis (adding patient and batch covariates to the model) I managed to obtain a list of DE genes.

However now, I need to reduce this set of 42 to 33 patients pairs and reanalyse the data. This time though it seems that it can't estimate the coefficient of one patient, would this not affect my results as NA coefficients are produced?

What I do not understand is what has changed for this to happen now? How do I rectify it?

Code:
dim(tum.dat.red)
27555 66

## Overall Sample Descriptions
>batch = as.factor(t.info.red$Batch)
batch
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 2
[39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Levels: 1 2


>treat = droplevels(t.info.red$Status_Treatment)
>treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT"))
treat
[1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT
  [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT [64] Tumour_AT Tumour_AT Tumour_AT
Levels: Tumour_BT Tumour_AT


>patient = as.factor(t.info.red$Trial.number)
patient
  [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34
55 45
[26] 55 54 54 37 57 37 57 35 52 35 52 17 7  12 10 14 11 16 15 14 16 15 17
7  12
[51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48
33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36 ... 57


## Deisgning model with BATCH effect
>design.tum = model.matrix(~treat+batch+patient)

## Fit model  & make contrasts
> fit.tum <- lmFit(tum.dat.red, design.tum)
Coefficients not estimable: patient57
Warning message:
Partial NA coefficients for 27555 probe(s)


sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] limma_3.12.0

loaded via a namespace (and not attached):
[1] tools_2.15.0


Many Thanks in advance,
Natasha

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list