[BioC] ComBat: Working with no replicates
pkpekka at gmail.com
Tue Oct 29 21:39:57 CET 2013
In my previous lab we also profiled cell lines in this manner. But
the same considerations apply to ComBat as to statistical testing in
general (e.g. for differential expression). ComBat estimates
gene-specific means and variances, but it uses an empirical Bayes
pooled-variance (shrinkage) method. That is why, in contrast to other
methods, ComBat works with fewer than 10 samples: doi:
10.1093/biostatistics/kxj037. As far as I understand, you cannot use
ComBat to correct lab-specific effects with only one sample per lab.
In any case, cell line should be a covariate so that cell line effects
are not "normalized away". And there should really be more than 2
samples per lab as well, preferably.
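
As a minimal sketch of the check implied above (plain Python, not part
of ComBat or sva), one could count samples per lab and list the cell
lines that were measured in a single lab only. The sample table, the
function name summarize_design and the min_per_lab threshold are all
hypothetical and only for illustration.

```python
from collections import Counter, defaultdict


def summarize_design(samples, min_per_lab=2):
    """Flag problematic labs and cell lines in a (lab, cell_line) sample list.

    samples: list of (lab, cell_line) tuples, one tuple per sample.
    min_per_lab: hypothetical threshold; labs with fewer samples are flagged.
    """
    # Number of samples contributed by each lab (batch).
    per_lab = Counter(lab for lab, _ in samples)

    # Set of labs in which each cell line was measured.
    labs_per_line = defaultdict(set)
    for lab, line in samples:
        labs_per_line[line].add(lab)

    # Labs with too few samples to estimate a lab-specific mean and variance.
    too_small = sorted(lab for lab, n in per_lab.items() if n < min_per_lab)

    # Cell lines seen in one lab only: their effect cannot be separated
    # from that lab's effect.
    single_lab = sorted(
        line for line, labs in labs_per_line.items() if len(labs) == 1
    )
    return too_small, single_lab


# Made-up example data, not taken from the thread.
samples = [
    ("lab1", "HeLa"), ("lab1", "MCF7"), ("lab1", "HeLa"),
    ("lab2", "HeLa"), ("lab2", "MCF7"),
    ("lab3", "K562"),
]

too_small, single_lab_lines = summarize_design(samples)
print("labs with too few samples:", too_small)
print("cell lines seen in one lab only:", single_lab_lines)
```

Setting min_per_lab=3 would correspond to the "more than 2 samples per
lab" preference. A cell line listed in the second output is confounded
with its lab, which is the situation with the single-sample labs.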
2013/10/29 Pedro Furió <pfurio at cipf.es>:
> Pekka Kohonen <pkpekka at ...> writes:
>> Dear Pedro,
>> If you have just one sample from the lab, how do you differentiate
>> between the cell line-specific effect and the lab-specific effect? I
>> don't see how what you are trying to do with these 3 samples makes any
>> sense. If you have the same cell lines measured in a different lab
>> (which has enough samples to run ComBat) why not just use those then?
>> Also, I wonder what is the minimum number of samples to estimate a
>> lab-specific distribution (which is what ComBat is doing) for each
>> gene? Probably 5-10 samples or so?
>> I think that statistics should not be treated as just a way to hack
>> your data so that it appears to be OK. This sounds a bit like doing
>> Best, Pekka
>> P.S. my name in Finnish means "Pedro"
>> 2013/10/28 Pedro Furió Tarí <pfurio at ...>:
>> > Dear all,
>> > We have a mix of cell-lines run by 12 different labs (more than 150 samples
>> > in total) and we have found a strong batch effect by laboratory that we
>> > would like to correct. From those 12, there are 3 labs that are bringing
>> > just one cell-line with no replicates at all (1 sample).
>> > If we remove the samples from those 3 labs, we are able to run ComBat, but
>> > we would like to keep them if possible. Is there any way? If we simulate a
>> > "false replicate" just by copying the same expression values it works.
>> > Could it be the way to go? Could these results be trustworthy?
>> > We also would like to use the different cell-line names as the covariates,
>> > but some of them don't have any replicates, so it doesn't work. Is there
>> > any way we could also use them as categorical covariates? Right now we are
>> > not giving any covariate information.
>> > Any help would be much appreciated :)
>> > Thanks in advance,
>> > Pedro
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at ...
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
> Dear Pekka,
> Maybe we did not explain the problem well. We do not want to perform any
> statistical test on the data after correcting the batch effect, so we do not
> need to have replicates in all the cell-lines. We would like to perform
> another kind of analysis for which we need to correct the batch effect. It
> happens that we have this strong "lab effect" we would like to remove but
> unfortunately some of the labs only produced 1 sample and it makes ComBat
> return an error. Perhaps it is not possible to apply ComBat in these
> situations but we wanted to be sure before using another strategy.
> Thanks so much for your kind response.
> Best regards,