[BioC] Error when applying ComBat in SCAN.UPC

James W. MacDonald jmacdon at uw.edu
Mon Jun 9 18:20:11 CEST 2014


Hi Peter,

You are right. Sorry for the noise.

Best,

Jim



On 6/9/2014 12:14 PM, Peter Langfelder wrote:
> Hi Jim,
>
> with all due respect, it also helps to read the documentation (in this
> case for ComBat) which states
>
>   mod: Model matrix for outcome of interest and other covariates
>            besides batch
>
> The key point is "besides batch". I also read the source code of the
> function and it adds the batch to mod, so if Joel has no covariates of
> interest, model.matrix(~1,...) is the correct model matrix to use to
> correct for batch without any covariates.
>
> So to not confuse Joel even more, the argument mod = model.matrix(~1,
> data = data.frame(batch)) is correct and I understand pretty well what
> it does and why it is the right argument to use.
>
> Best,
>
> Peter
>
>
> On Mon, Jun 9, 2014 at 7:59 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Joel,
>>
>> It always helps to look at things to make sure you are doing what you
>> expect. As an example, I think you are expecting that this does something
>> different than it actually does:
>>
>>
>> mod = model.matrix(~1, data = data.frame(batch))
>>
>> This appears to be your attempt to follow the sva vignette. But note that in
>> the sva vignette, the analogous code is used to create the NULL model, not
>> the full model. And also note that the data argument in this case is only
>> useful in that it tells model.matrix() how many 1s to put in the design
>> matrix. In other words
>>
>> model.matrix(~1, data = somedataframe)
>>
>> is telling R to give you a design matrix with just an intercept, and use
>> somedataframe to determine the number of rows for the design matrix. It
>> appears you might think that this is telling R to use the first column of
>> your data.frame, which is incorrect. Instead, you have to pass the column
>> name of the data.frame that you want to use.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 6/9/2014 8:06 AM, Joel Ma wrote:
>>>
>>> Hi Natasha
>>>
>>> Thanks for pointing it out. It worked but another error message appeared.
>>> It is the same one I encountered before.
>>>
>>> Found 2 batches
>>> Found 0  categorical covariate(s)
>>> Standardizing Data across genes
>>> Fitting L/S model and finding priors
>>> Finding parametric adjustments
>>> Error in while (change > conv) { : missing value where TRUE/FALSE needed
>>>
>>> I found 4 probesets with zero values for all samples. I am guessing those
>>> are what Peter mentioned as zero variances. I will try to remove them and
>>> see what I get.
>>>
>>> Thanks again.
>>>
>>> Cheers
>>>
>>> Joel
>>> ________________________________________
>>> From: Natasha Sahgal [n.sahgal at qmul.ac.uk]
>>> Sent: Monday, June 09, 2014 9:48 PM
>>> To: Joel Ma; Peter Langfelder
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: [BioC] Error when applying ComBat in SCAN.UPC
>>>
>>> Hi Joel,
>>>
>>> I think you have a typo in your code:
>>>
>>> batch = phenoData$batch should be batch = phenoData$Batch.
>>>
>>>
>>>
>>> HTH,
>>> Natasha
>>>
>>> On 09/06/2014 12:10, "Joel Ma" <jzma at unimelb.edu.au> wrote:
>>>
>>>> Hi Peter
>>>>
>>>> Thanks for the suggestions. Apologies for my coding. As you can already
>>>> tell, I have no expertise in this area of work. I have tried your
>>>> suggestions and would like some advice.
>>>>
>>>> This is what I did. Hopefully, my coding has improved abit since the last
>>>> email.
>>>>
>>>>> normData <- SCAN("*.CEL", convThreshold = 0.01, verbose = TRUE) ##
>>>>> expressionset
>>>>> edata <- exprs(normData)
>>>>> phenoData <- read.table("batchtargets1.txt", header = T) ## batch file
>>>>> batch = phenoData$batch
>>>>> mod = model.matrix(~1, data = data.frame(batch))
>>>>> combat_edata = ComBat(dat=edata, batch=batch, mod=mod, numCovs=NULL,
>>>>> par.prior=TRUE, prior.plots=TRUE)
>>>>
>>>>
>>>> Found 0 batches
>>>> Found 0  categorical covariate(s)
>>>> Standardizing Data across genes
>>>> Error in solve.default(t(design) %*% design) :
>>>>    Lapack routine dgesv: system is exactly singular: U[1,1] = 0
>>>>
>>>> I checked the forums for this error and found a reply: 'Capitalize the
>>>> "b" in the "Batch" header, and let me know if this works. Alternatively,
>>>> use the ComBat from the sva package of bioconductor. This will be less
>>>> finicky.'
>>>>
>>>> My Batch is capitalized and I used the sva package.
>>>>
>>>>
>>>> I tried to find out why it showed 0 batches when my >phenoData shows the
>>>> following table of 48 datasets (24 from each batch).
>>>>
>>>>          FileName Batch
>>>> 1  b1.CEL     1
>>>> 2  b2.CEL     1
>>>> 3  b3.CEL     1
>>>> 4     N1.CEL     1
>>>> 5     N2.CEL     1
>>>> 6     N3.CEL     1
>>>> 7    c1.CEL     1
>>>> 8    c2.CEL     1
>>>> 9    c3.CEL     1
>>>> 10   e1.CEL     1
>>>> 11   e2.CEL     1
>>>> 12   e3.CEL     1
>>>> 13 d1.CEL     1
>>>> 14 d2.CEL     1
>>>> 15 d3.CEL     1
>>>> 16    L1.CEL     2
>>>> 17    L2.CEL     2
>>>>
>>>> When I typed,
>>>>>
>>>>> batch
>>>>
>>>> NULL
>>>>
>>>> I am not sure why batch = phenoData$batch gave me a NULL. I feel I have
>>>> done something wrong here, but can't identify the issue.
>>>>
>>>> Cheers
>>>>
>>>> Joel Z Ma, PhD
>>>>
>>>> Dept. of Microbiology and Immunology
>>>> The Peter Doherty Institute for Infection and Immunity
>>>> University of Melbourne
>>>> 792 Elizabeth Street
>>>> Parkville
>>>> Victoria, 3000
>>>>
>>>> Ph: +61 3 83440775
>>>> E-mail: jzma at unimelb.edu.au
>>>>
>>>> ________________________________________
>>>> From: Peter Langfelder [peter.langfelder at gmail.com]
>>>> Sent: Monday, June 09, 2014 6:55 AM
>>>> To: Joel Ma
>>>> Cc: W. Evan Johnson; bioconductor at r-project.org
>>>> Subject: Re: [BioC] Error when applying ComBat in SCAN.UPC
>>>>
>>>> On Sun, Jun 8, 2014 at 10:24 AM, Joel Ma <jzma at unimelb.edu.au> wrote:
>>>>>
>>>>> Hi Peter
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>> What if I used ComBat separate after SCAN.UPC? I have tried that but I
>>>>> couldn't get ComBat right.
>>>>> This is what I have typed in after going through forums on ComBat.
>>>>>
>>>>>> Sdata <- SCAN("*.CEL", convThreshold = 0.01, verbose = TRUE)
>>>>>> Bdata = ComBat(dat=Sdata, batch=batch, mod=batchtargets.txt, numCovs =
>>>>>> 3, par.prior = TRUE, prior.plots = FALSE)
>>>>>
>>>>> Error in cbind(mod, batch) : object 'batchtargets.txt' not found
>>>>>>
>>>>>> Bdata = ComBat(dat=Sdata, batch=batchtargets.txt, numCovs = 3,
>>>>>> par.prior = TRUE, prior.plots = FALSE)
>>>>>
>>>>> Error in cbind(mod, batch) : argument "mod" is missing, with no default
>>>>>>
>>>>>> Bdata <- ComBat(Sdata, sample$Batch,
>>>>>>
>>>>>> c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
>>>>>> 2,2,2,1,1,1,1,1,1,1,1,1))
>>>>>
>>>>> Error in sample$Batch : object of type 'closure' is not subsettable
>>>>
>>>>
>>>> Don't take this the wrong way, but your code simply doesn't make
>>>> sense, so much so that I don't even know what is it you are trying to
>>>> achieve, so I cannot suggest corrections.
>>>>
>>>> Here are a few suggestions:
>>>>
>>>> 1. Read the help file for SCAN (type help("SCAN") in R). It returns an
>>>> object of type ExpressionSet; to get the actual expression data from
>>>> it, use the function (method) 'exprs' on it. Call the resulting object
>>>> say 'normData'.
>>>>
>>>> 2. The object 'normData' can be fed to ComBat; read the help file and
>>>> maybe try to find some examples of use for ComBat.
>>>>
>>>> 3. For ComBat, you need the batch variable (call it, for example,
>>>> 'batch'); this is a vector with one element for each sample and you
>>>> can code the batches say 1 and 2. Make sure the order of the vector
>>>> 'batch' corresponds to the order of the samples in 'normData'.
>>>>
>>>> 4. You also need a model matrix that specifies covariates you want
>>>> ComBat to take into account. If you have none, use the argument mod =
>>>> model.matrix(~1, data = data.frame(batch)).
>>>>
>>>> 5. Run ComBat and see if you get any errors; if you do, copy and paste
>>>> the relevant part of the error into your favorite search engine and
>>>> read the suggestions on how to solve it.
>>>>
>>>> 6. If you still can't get it to work, email the list again.
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Peter
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>
>>> Natasha Sahgal | Postdoctoral Research Assistant
>>> Centre for Molecular Oncology
>>> Barts Cancer Institute - a Cancer Research UK Centre of Excellence
>>> Queen Mary, University of London
>>> John Vane Science Centre, Charterhouse Square, London EC1M 6BQ
>>> Tel: +44 (0)20 7882 3560 | Fax: +44 (0)20 7882 3884 |
>>> www.bci.qmul.ac.uk/research/centre-profiles/molecular-oncology.html
>>>
>>>
>>>
>>>
>>> This email may contain information that is privileged, confidential or
>>> otherwise protected from disclosure.
>>> It must not be used by, or its contents copied or disclosed to, persons
>>> other than the addressee.
>>> If you have received this email in error please notify the sender
>>> immediately and delete the email. This message has been scanned for viruses.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list