[BioC] ComBat_ Error in solve.default(t(design) %*% design): Lapack routine dgesv: system is exactly singular: U[4, 4] = 0

Guan Guan.Wang at glasgow.ac.uk
Fri Apr 25 18:03:55 CEST 2014


Johnson, William Evan <wej at ...> writes:

> 
> ComBat should be done after normalization, and only if there are clear
> signs of batch effects after normalization (either through significance
> testing, clustering, or principal component analysis).
> 
> On Aug 21, 2013, at 12:33 AM, amit kumar subudhi wrote:
> 
> Hello Dr. Evan,
> 
> One more doubt, which I hope you can answer. Is it recommended to
> normalize the data before running ComBat, or can the normalization step
> be done after ComBat? This question is confusing me. Please answer it if
> you can.
> 
> With best regards
> Amit
> 
> On Mon, Aug 19, 2013 at 7:12 PM, amit kumar subudhi <amit4help at ...> wrote:
> This reply solved my problem. Thanks again Dr. Evan for your kind and
> prompt reply and suggestions.
> 
> Regards
> Amit
> 
> On Mon, Aug 19, 2013 at 7:08 PM, Johnson, William Evan <wej at ...> wrote:
> Yes, it should be fine to remove batch effects on the larger dataset and
> then use a smaller subset to do your comparisons. In fact, this approach
> might be preferred even if it were possible to adjust for batch in the
> smaller subset.
> 
> On Aug 19, 2013, at 9:34 AM, amit kumar subudhi wrote:
> 
> Thanks again for the reply, Dr. Evan.
> 
> This set of samples is a subset of a larger set that contains many more
> samples in each batch. When I ran ComBat on the larger dataset, I was
> able to remove the batch effects to some extent. For your information,
> the known batch effect here is the date of hybridization: a simple
> hierarchical clustering analysis showed that most of the samples cluster
> by date of hybridization, so I tried ComBat to remove the batch effects.
> The third batch contains most of the uncomplicated malaria samples. The
> subset of samples that I have posted here has specific symptoms
> pertaining to severe malaria and was therefore selected for comparison
> with the uncomplicated malaria samples.
> 
> Question: As mentioned above, I have applied ComBat to remove the batch
> effects from the larger dataset. Can I take the smaller set of samples
> from the larger dataset to find differentially regulated genes? An
> answer to this question would be really helpful.
> 
> With best regards
> Amit
> 
> On Mon, Aug 19, 2013 at 6:31 PM, Johnson, William Evan <wej at ...> wrote:
> Okay, yes this is clear now. Your batch and covariate status are
> completely confounded. In other words, if you see a difference between
> "severe" and "uncomplicated" you won't know if this is really due to a
> covariate effect or if this is due to a batch (batch 3) effect. In short,
> this is really an experimental design issue and ComBat cannot help you.
> 
> If you were to remove the "malaria" covariate, then ComBat would work, but
> it would also take out all malaria covariate effects. How bad are the
> batch effects between batches 1 and 2? Do you expect batch 3 to have a
> similar level of batch differences? You could combine batches 1 and 2, and
> then look for differences with batch 3, but you wouldn't know whether the
> differential expression is due to the treatment or due to batch, hence the
> confounding.
> 
> Sorry I couldn't be of more help, but like I said, the issue here is due
> to experimental design.
> 
> Evan
> 
> On Aug 19, 2013, at 8:55 AM, amit kumar subudhi wrote:
> 
> Hello Dr. Evan,
> 
> Thanks for the prompt reply. Below is the whole pheno table. Looking at
> the whole table might give you an idea about the probable cause of the
> error. Batches 1 and 2 contain only severe malaria samples, whereas batch
> 3 contains the uncomplicated malaria samples.
>       sample batch malaria
> AL         1     1 Severe
> AO         2     1 Severe
> AQ         3     1 Severe
> AP         4     1 Severe
> CF         5     2 Severe
> CL         6     2 Severe
> CU         7     2 Severe
> CV         8     2 Severe
> GA_UC      9     3 uncomplicated
> GB_UC     10     3 uncomplicated
> GC_UC     11     3 uncomplicated
> GE_UC     12     3 uncomplicated
> GR_UC     13     3 uncomplicated
> 
> With best regards
> 
> On Mon, Aug 19, 2013 at 5:50 PM, Johnson, William Evan <wej at ...> wrote:
> Amit,
> 
> The "singularity" error you are getting occurs when your covariates are
> confounded with batch (or with each other). In the example you are trying,
> is there a batch that contains only one covariate level, and is that
> covariate level exclusive to the batch? If this does not make sense, post
> your 'pheno' variable in a reply and I will be happy to help you figure
> out the problem.
> 
> Evan
> 
> On Aug 19, 2013, at 6:00 AM, <bioconductor-request at ...> wrote:
> 
> > Date: Sun, 18 Aug 2013 19:58:35 +0530
> > From: amit kumar subudhi <amit4help at ...>
> > To: bioconductor at ...
> > Subject: [BioC] ComBat_ Error in solve.default(t(design) %*% design) :
> >       Lapack routine dgesv: system is exactly singular: U[4, 4] = 0
> > Message-ID:
> >       <CADxjrxWKyC3prOvL3RnmYc03qPyvh_VdVxvzymu-WkVmW+nKiw at ...>
> > Content-Type: text/plain
> >
> > Hello to all ComBat users,
> >
> > I am trying to remove the batch effects from some of my microarray data,
> > but in the end I get an error message that reads:
> >
> > Found 3 batches
> > Found 1  categorical covariate(s)
> > Standardizing Data across genes
> > Error in solve.default(t(design) %*% design) :
> >  Lapack routine dgesv: system is exactly singular: U[4,4] = 0
> >
> > The head(edata) looks like this
> >                                   AL        AO        AP        AQ        CF
> > GT_pfalci_specific_0000001 16.053898 16.080540 16.101114 16.046898 16.087206
> > GT_pfalci_specific_0000002 10.051407 10.477143  8.369233 10.657850 13.312936
> > GT_pfalci_specific_0000003  8.910620  8.683393  7.812817  8.496099 10.920685
> > GT_pfalci_specific_0000004  6.603195  8.993232  6.476777  6.792369  3.319346
> > GT_pfalci_specific_0000005  9.813562 11.084574  9.055613 11.568550 12.977261
> > GT_pfalci_specific_0000006 15.989252 15.993513 15.963054 16.000675 15.983985
> >                                   CL        CU        CV     GA_UC     GB_UC
> > GT_pfalci_specific_0000001 16.082037 16.071299 16.090370 15.971335 15.994304
> > GT_pfalci_specific_0000002 12.653076  9.703247  8.827624  5.697412  8.060719
> > GT_pfalci_specific_0000003 11.470758 10.548943 10.718349  6.132614  8.007271
> > GT_pfalci_specific_0000004  5.328515  8.398546  6.351136  3.045112  3.891578
> > GT_pfalci_specific_0000005  8.520699 11.791610 11.535907  6.791468  9.930246
> > GT_pfalci_specific_0000006 15.980660 15.984256 15.970124 13.353012 13.740395
> >                                GC_UC     GE_UC     GR_UC
> > GT_pfalci_specific_0000001 15.855644 16.090246 16.086956
> > GT_pfalci_specific_0000002  9.026398  8.015609  7.814614
> > GT_pfalci_specific_0000003  5.341252  8.658231  5.788790
> > GT_pfalci_specific_0000004  4.191565  3.040515  3.517175
> > GT_pfalci_specific_0000005  5.446910 11.982848  5.477334
> > GT_pfalci_specific_0000006 11.872469 13.675290 13.117105
> >
> > and the head(pheno) looks like this
> >    sample batch malaria
> > AL      1     1  severe
> > AO      2     1  severe
> > AP      3     1  severe
> > AQ      4     1  severe
> > CF      5     2  severe
> > CL      6     2  severe
> >
> >
> > The commands that I used for ComBat are:
> > mod = model.matrix(~as.factor(malaria), data=pheno)
> > combat_edata = ComBat(dat=edata, batch=batch, mod=mod, numCovs=NULL,
> > par.prior=TRUE, prior.plots=FALSE)
> >
> > head(mod) looks like this
> >   (Intercept) as.factor(malaria)uncomplicated
> > AL           1                               0
> > AO           1                               0
> > AP           1                               0
> > AQ           1                               0
> > CF           1                               0
> > CL           1                               0
> >
> > Why am I getting this error message? Please help me out. With the
> > larger sample size (n=33) I was able to remove the batch effects, but
> > a subset of those samples gives me the above problem.
> >
> >
> > --
> > Amit Kumar Subudhi
> > Research Scholar,
> > CSIR-Senior Research Fellow,
> > Molecular Parasitology and Systems Biology Lab,
> > Department of Biological Sciences ,
> > FD III, BITS, Pilani,
> > Rajasthan- 333031
> > e mail-
> > amit4help at ...
> > amit.subudhi at ...
> > Mob No- 919983525845
> 
Hi Evan and Amit, or others who may help,

I got the same ComBat error when running surrogate variable analysis
(SVA). I understand from the post that this error is caused by batch being
confounded with covariate status. I have several related questions and
hope you can take a look. Many thanks for any opinions or suggestions.
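
To make the confounding concrete, the following is a minimal sketch in base R, not ComBat's actual code. It rebuilds the 13-sample pheno table quoted above and assumes that the design matrix is formed from one indicator column per batch plus the covariate columns without their intercept. That layout and the names batch_mod, cov_mod and design_rank are assumptions made for illustration.

```r
# Pheno table as posted earlier in the thread (13 samples, 3 batches).
pheno <- data.frame(
  batch   = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)),
  malaria = factor(c(rep("severe", 8), rep("uncomplicated", 5)))
)

# Cross-tabulation: "uncomplicated" occurs only in batch 3, and batch 3
# holds nothing else, so the two variables cannot be separated.
print(table(pheno$batch, pheno$malaria))

# Assumed layout of the combined design: one indicator per batch plus the
# covariate columns without their intercept.
batch_mod <- model.matrix(~ -1 + batch, data = pheno)
cov_mod   <- model.matrix(~ malaria, data = pheno)[, -1, drop = FALSE]
design    <- cbind(batch_mod, cov_mod)

# A rank below the number of columns means t(design) %*% design is singular.
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")

# Inverting the cross-product then fails, as in the reported error.
fit <- try(solve(crossprod(design)), silent = TRUE)
cat("solve() failed:", inherits(fit, "try-error"), "\n")
```

Here the covariate column is identical to the batch 3 indicator, which is why the fourth column of the assumed design adds no information.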

Data set: 24 samples from 6 subjects (4 time points/subject: 2 baseline 
samples collected on different days, 1 during drug treatment, 1 after drug 
treatment). Experiments were done with Affymetrix GeneChip 3.0 for miRNA 
expression profiling. 

Initial data analysis: "oligo" is used to handle the Affy CEL files and
"rma()" is used for data normalization. After this, PC1 on the PCA plot
still seems to correlate with some batch effect (one whose source I am not
aware of, i.e. it does not come from different scan dates). The "sva"
package is then used to estimate the surrogate variables, followed by
"ComBat()".

Now to the ComBat error, which appeared when I specified the contrasts as
(Base2 - Base1, During - Base1, Post - Base1). The pheno input is attached
below:

	                        sample	batch	Status
GW2miRNA1_(miRNA-3_0).CEL	1	1	Base1
GW2miRNA2_(miRNA-3_0).CEL	1	1	Post7
GW2miRNA3_(miRNA-3_0).CEL	2	1	Base1
GW2miRNA4_(miRNA-3_0).CEL	2	1	Post7
GW2miRNA5_(miRNA-3_0).CEL	3	1	Base1
GW2miRNA6_(miRNA-3_0).CEL	3	1	Post7
GW2miRNA7_(miRNA-3_0).CEL	4	1	Base1
GW2miRNA8_(miRNA-3_0).CEL	4	1	Post7
GW2miRNA9_(miRNA-3_0).CEL	5	1	Base1
GW2miRNA10_(miRNA-3_0).CEL	5	1	Post7
GW2miRNA11_(miRNA-3_0).CEL	6	1	Base1
GW2miRNA12_(miRNA-3_0).CEL	6	1	Post7
GW1miRNA13_(miRNA-3_0).CEL	6	2	Base2
GW1miRNA14_(miRNA-3_0).CEL	6	2	During4
GW1miRNA15_(miRNA-3_0).CEL	4	2	Base2
GW1miRNA16_(miRNA-3_0).CEL	1	2	During4
GW1miRNA17_(miRNA-3_0).CEL	5	2	Base2
GW1miRNA18_(miRNA-3_0).CEL	5	2	During4
GW1miRNA19_(miRNA-3_0).CEL	4	2	During4
GW1miRNA20_(miRNA-3_0).CEL	3	2	Base2
GW1miRNA21_(miRNA-3_0).CEL	3	2	During4
GW1miRNA22_(miRNA-3_0).CEL	1	2	Base2
GW1miRNA23_(miRNA-3_0).CEL	2	3	During4
GW1miRNA24_(miRNA-3_0).CEL	2	3	Base2

I understand from the earlier post that the cause is that batch is
confounded with status, as you can see in the phenotype file. The two
baseline samples are from the same subjects but were collected on
different days before the drug was injected, so I am wondering whether it
makes sense to classify "Base1 + Base2" as "Base" and make contrasts for
"During - Base" and "Post - Base", keeping the other columns in the above
pheno file the same, and then re-run "sva". Or is it more appropriate to
do two separate "sva" analyses, i.e. "Post7 - Base1" for the first 12
samples, which were hybridized and scanned at the same time, and
"During4 - Base2" for the last 12 samples, which were treated as one batch
(but scanned at two times, which is why they are labelled as batches 2 and
3 in the "batch" column)?
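
A rank check can show whether either coding of Status is estimable together with batch. This is only a sketch in base R: it assumes the design is one indicator column per batch plus the Status columns without their intercept, and design_rank() is a hypothetical helper written for this example, not part of "sva". The batch and Status values are copied from the pheno table above.

```r
# Batch and Status columns from the 24-row pheno table, in row order.
batch  <- factor(c(rep(1, 12), rep(2, 10), rep(3, 2)))
status <- factor(c(
  rep(c("Base1", "Post7"), 6),
  "Base2", "During4", "Base2", "During4", "Base2", "During4",
  "During4", "Base2", "During4", "Base2", "During4", "Base2"
))

# Hypothetical helper: number of columns and rank of the assumed design
# (batch indicators plus covariate columns without the intercept).
design_rank <- function(batch, covariate) {
  batch_mod <- model.matrix(~ -1 + batch)
  cov_mod   <- model.matrix(~ covariate)[, -1, drop = FALSE]
  design    <- cbind(batch_mod, cov_mod)
  c(columns = ncol(design), rank = qr(design)$rank)
}

# Four-level coding: Base1 and Post7 occur only in batch 1.
print(table(batch, status))
full <- design_rank(batch, status)
print(full)

# Proposed coding: merge Base1 and Base2 into a single "Base" level.
collapsed_status <- factor(sub("^Base[12]$", "Base", as.character(status)))
print(table(batch, collapsed_status))
collapsed <- design_rank(batch, collapsed_status)
print(collapsed)
```

A full-rank design only means the model can be fitted. In this table Post7 still occurs only in batch 1 and During4 only in batches 2 and 3, and the repeated measurements per subject are not modelled in this sketch.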
 
I hope I have described this clearly. Any suggestions or opinions would be
much appreciated.

Regards
Guan


