[BioC] ComBat

Valerie Obenchain vobencha at fhcrc.org
Tue Jan 8 00:57:56 CET 2013


Hello,

On 01/07/2013 10:39 AM, Gayatri Iyer wrote:
> Hi,
> Thank you for the reply.
> My question was when I run the example data set
> ## Not run:
>
>    ## Load data
> library(sva)
>    library(bladderbatch)
>    data(bladderdata)
>
>    ## Obtain phenotypic data
>    pheno = pData(bladderEset)
>    edata = exprs(bladderEset)
>    batch = pheno$batch
>    mod = model.matrix(~as.factor(cancer), data=pheno)
> And when I type pheno.I see Cancer,Normal and Biopsy in the cancer
> column.So I considered the number of Covariates in (cancer) were 3 not 2.

I'm not an expert on the model.matrix but I believe this is the classic 
dimension reduction representation. It takes n-1 variables to represent 
n variables. You can see this by looking at the contrasts in the 
'cancer' column.

 > contrasts(pheno$cancer)
        Cancer Normal
Biopsy      0      0
Cancer      1      0
Normal      0      1

In the model.matrix the combination of 00 (i.e., Cancer=0 and Normal=0) 
represent the Biopsy samples.

 > mod <- model.matrix(~ cancer, data=pheno)
 > tail(mod)
              (Intercept) cancerCancer cancerNormal
GSM71072.CEL           1            0            0
GSM71073.CEL           1            0            0
GSM71074.CEL           1            0            0
GSM71075.CEL           1            0            0
GSM71076.CEL           1            0            0
GSM71077.CEL           1            0            0


Valerie




> And when I run
>   combat_edata = ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE,
> prior.plots=FALSE)
> I see
> Found 5 batches
> Found 2  categorical covariate(s)
> So why is this bias.
> I am sorry for not being precise in my previous email.
> Thank you,
> Gayatri
>
>
>
> On Mon, Jan 7, 2013 at 12:22 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     Hi Gayatri,
>
>     Following the example on the ComBat man page as you have done, I see
>     2 covariates, not 3.
>
>     The model matrix shows 'cancer' and 'normal' covariates:
>
>         head(mod)
>
>                   (Intercept) as.factor(cancer)Cancer
>     as.factor(cancer)Normal
>     GSM71019.CEL           1                       0                       1
>     GSM71020.CEL           1                       0                       1
>     ...
>     ...
>
>     The intercept is not considered a categorical covariate.
>
>     I've cc'd the package authors in case they have something else to add.
>
>     Valerie
>
>
>
>     On 01/03/2013 01:31 PM, Gayatri Iyer wrote:
>
>         Hi,
>         I am trying to run my microarray data through ComBat(in sva
>         packge).Below
>         is my sampleinfo file.
>         When I run it with these commands
>
>         batch = sampleinfo$Batch
>            mod = model.matrix(~as.factor(__Covariate), data=sampleinfo)
>         combat_edata = ComBat(dat=edata, batch=batch, mod=mod,
>         par.prior=TRUE,
>         prior.plots=FALSE)
>         I dont get any error and it runs with this message.
>         Found 10 batches
>         Found 3  categorical covariate(s)
>         Standardizing Data across genes
>         Fitting L/S model and finding priors
>         Finding parametric adjustments
>         Adjusting the Data
>
>         But the problem is I have 4 covariate not three.
>
>         I even ran the example dataset in SVA package
>            ## Load data
>             library(bladderbatch)
>             data(bladderdata)
>
>             ## Obtain phenotypic data
>             pheno = pData(bladderEset)
>             edata = exprs(bladderEset)
>             batch = pheno$batch
>             mod = model.matrix(~as.factor(__cancer), data=pheno)
>
>             ## Correct for batch using ComBat
>             combat_edata = ComBat(dat=edata, batch=batch, mod=mod,
>         par.prior=TRUE,
>         prior.plots=FALSE)
>         This also give
>         Found 2  categorical covariate(s) when there are three
>         covariates in this
>         dataset.
>
>
>         Please help,
>         Thank you,
>         Gayatri
>
>            Array name Sample name Batch Covariate  C3D1 1 1 1  C3D2 2 1
>         1  C3D3 3 1 1
>         C3D5 4 2 1  C3D14 5 6 1  C3D15 6 6 1  C3D16 7 7 1  C3D17 8 7 1
>           C3D18 9 7 1
>         S3D7 10 3 2  S3D8 11 4 2  S3D9 12 4 2  S3D10 13 4 2  S3D11 14 5
>         2  S3D12 15
>         5 2  S3D19 16 8 2  S3D20 17 8 2  S3D21 18 9 2  S3D22 19 9 2
>           S3D23 20 10 2
>         S3D24 21 10 2  C25D1 22 1 3  C25D2 23 1 3  C25D3 24 2 3  C25D5
>         25 3 3
>         C25D14 26 6 3  C25D15 27 6 3  C25D16 28 7 3  C25D17 29 7 3
>           C25D18 30 8 3
>         S25D7 31 3 4  S25D8 32 4 4  S25D9 33 4 4  S25D10 34 5 4  S25D11
>         35 5 4
>         S25D12 36 5 4  S25D19 37 8 4  S25D21 38 9 4
>
>         S25D23
>         39 10
>
>         4
>
>                  [[alternative HTML version deleted]]
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>         Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>



More information about the Bioconductor mailing list