[BioC] limma - batch dates incorrporated into t-test

James W. MacDonald jmacdon at uw.edu
Thu Sep 12 18:44:31 CEST 2013


Hi Helen,

On Thursday, September 12, 2013 11:50:44 AM, Helen Smith wrote:
> That seems to have all worked perfectly thank you so much!
> Just to double check: my batch dates aren't a contrast but have been incorporated into the t-test as a factor using the following code after you contrast line:

<pedantic>

Your batch dates haven't been used to compute a contrast, but have been 
incorporated in the linear model to account for any date-specific batch 
effects.

</pedantic>

If you look at the contrast matrix, there will be all zeros for rows 
5-11, indicating that the batches are never used in any comparison.

>
> ###Contrast matrix###
> contrast <- makeContrasts(oocyte - four_cell,oocyte - eight_cell,oocyte - blastomere,oocyte - blastocyst,four_cell - eight_cell,four_cell - blastomere,four_cell - blastocyst,eight_cell - blastomere,eight_cell - blastocyst,blastomere - blastocyst, levels =design)
>
> ###Fit linear model###
> fit <- lmFit(eset, design)
> names(fit)
> fit2 <- contrasts.fit(fit, contrast)
> fit2 <- eBayes(fit2)
> write.csv(fit2,"Limma_fit2.csv")

Wait, what? That shouldn't work. And it is almost always a bad idea to 
just write things out at this stage. You spent all this time getting 
your data into R and fitting a model, now why ruin things by dumping 
into Excel (I know what you are planning Helen, and that way will bring 
only tears).

You can now use limma to get out tables of the top hits for each 
contrast (topTable()), or you can create Venn diagrams showing the 
overlap or lack thereof between different contrasts (shameless plug - 
the affycoretools package may be useful here as well). You can do 
hypergeometric tests or GSEA with your top hits (topGO, GOstats, 
Category, GSEAbase, or the romer, roast, camera functions in limma). Or 
you can make sweet HTML tables of your top hits that you can share with 
collaborators (ReportingTools). Or you can get super cool and learn to 
use knitr to create pdf or HTML output and amaze your friends and more 
importantly, your boss.

But if you give up here and import your data into the ghetto that is 
known as Excel, you will lose all that coolness. And if you annotate 
your data first, you will almost surely convert gene names like Apr1 to 
dates like 4/1/2013 (or 2013/04/01? Does Excel know it is in England 
when you use it?), because that's how Excel rolls.

I know R and BioC have steep learning curves, but if you are going to 
be analyzing Affy data, it is worth it to get past the growing pains.

Anyway, I thought you smashed your computer.

Best,

Jim




>
>
> Thanks again,
> Helen
>
>
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: 12 September 2013 16:16
> To: Helen Smith
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] limma - batch dates incorrporated into t-test
>
> Ugh, tripped up by syntactically invalid R names! That's why I removed those nasty numbers from 4cell and 8cell, but still and yet got tripped up...
>
> So the dates are what we call 'nuisance variables', which are things you want to account for, but have no real interest for you. Because of this we can call them anything we want, so long as R will accept them.
> And what R wants is something that doesn't start with a number, and doesn't have math symbols in it (which is the problem with e.g., Date02/07/2013).
>
> And I think there is a problem with the gsub() line to get rid of the prepended cruft that R likes to add to model.matrix colnames. So how about
>
> colnames(design) <- gsub("Target", "", colnames(design))
>
> then you need to change the colnames that have Date in them as well.
> These should be columns 5-11. So you can just do
>
> colnames(design)[5:11] <- paste0("Date", 1:7) ## that's paste with a Zero, not capital O
>
> You should check the colnames just to make sure I didn't count wrong.
>
> And you should probably smash your computer against the wall anyway.
> They make such a pretty tinkly noise when they shatter. ;-D
>
> Best,
>
> Jim
>
>
>
> On Thursday, September 12, 2013 10:26:55 AM, Helen Smith wrote:
>> Thanks James, that all worked great apart from the last line:
>>
>> contrast <- makeContrasts(blastomere - oocyte, blastomere -
>> eight_cell, blastomere - four_cell, <added all my contrasts here>,
>> levels =design)
>>
>> Which produced the following error:
>>
>> /Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell,
>> oocyte -  : /
>>
>> /  The levels must by syntactically valid names in R, see
>> help(make.names).  Non-valid names:
>> Date02/07/2013,Date06/03/2013,Date10/01/2008,Date13/02/2007,Date16/02/
>> 2007,Date23/03/2007,Date24/10/2007/
>>
>> //
>>
>> Ideas?
>>
>> Thank you, you have turned a frustrating day into one in which I can
>> see light at the end of the tunnel…….
>>
>> -----Original Message-----
>> From: James W. MacDonald [mailto:jmacdon at uw.edu]
>> Sent: 12 September 2013 15:01
>> To: Helen Smith
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] limma - batch dates incorrporated into t-test
>>
>> Hi Helen,
>>
>> On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote:
>>
>>> Hi All,
>>
>>>
>>
>>> I am new to limma and R programming.
>>
>>>
>>
>>> I have the CEL.files uploaded, normalised as an expression set, and
>> have put together the target.txt file containing the following layout:
>>
>>> Name    FileName             Target   Date
>>
>>> 5              0207_DaB7_H_ea_Blast1.CEL     blastocyst
>>    13/02/2007
>>
>>> 6              0207_DaB8_H_ea_Blast2.CEL     blastocyst
>> 13/02/2007
>>
>>> 4              0207_DaB6_H_ea_4cell_r3.CEL  4cell       16/02/2007
>>
>>> 16           0307_DaB1_H_ea_oocyte_r1.CEL             oocyte  23/03/2007
>>
>>> 17           0307_DaB3_H_ea_oocyte_r3.CEL             oocyte  23/03/2007
>>
>>> 18           0307_DaB5_H_ea_4cell_r2.CEL  4cell       23/03/2007
>>
>>> 21           1007_DaB4_H_ea_4cell_r1.CEL  4cell       24/10/2007
>>
>>> 22           1007_DaB9_H_ea_Blast3.CEL     blastocyst
>> 24/10/2007
>>
>>> 23           1007_DaB21_H_ea_oocyte9856.CEL        oocyte  24/10/2007
>>
>>> 24           1007_DaB22_H_ea_oocyte9858.CEL        oocyte  24/10/2007
>>
>>> 1              0108_DaB18_H_ea_4cell_0.9_r4.CEL       4cell
>> 10/01/2008
>>
>>> 2              0108_DaB20_H_ea_4cell_0.9_r6.CEL       4cell
>> 10/01/2008
>>
>>> 3              0108_DaB23_H_ea_Blast_2.5_r4.CEL
>> blastocyst            10/01/2008
>>
>>> 10           0213_SueK(2)4_H_ea_005_B4.CEL
>> blastomere         01/03/2013
>>
>>> 14           0213_SueK(2)8_H_ea_005_B8.CEL
>> blastomere         01/03/2013
>>
>>> 15           0213_SueK(2)10_H_ea_002_8CE.CEL      8cell       01/03/2013
>>
>>> 7              0213_SueK(2)1_H_ea_005_B1.CEL
>>    blastomere         06/03/2013
>>
>>> 8              0213_SueK(2)2_H_ea_005_B2.CEL
>> blastomere         06/03/2013
>>
>>> 9              0213_SueK(2)3_H_ea_005_B3.CEL
>> blastomere         06/03/2013
>>
>>> 11           0213_SueK(2)5_H_ea_005_B5.CEL
>>       blastomere         06/03/2013
>>
>>> 12           0213_SueK(2)6_H_ea_005_B6.CEL
>> blastomere         06/03/2013
>>
>>> 13           0213_SueK(2)7_H_ea_005_B7.CEL
>> blastomere         06/03/2013
>>
>>> 19           0713_Suek(3)2_Ep_H_8C_006_11.CEL    8cell       02/07/2013
>>
>>> 20           0713_Suek(3)3_Ep_H_8C_004_11.CEL    8cell       02/07/2013
>>
>>>
>>
>>>
>>
>>> But am struggling with setting up the design matrix and contrasts.
>>> My
>>
>>> limited programming ability is probably playing the largest role,
>>> and
>>
>>> I feel like smashing the computer against the wall (probably a sign
>>> of
>>
>>> slight frustration )
>>
>> It's really quite simple ;-D. You just make a design matrix that
>> includes the dates, and then make a contrasts matrix that doesn't use
>> the dates to make comparisons.
>>
>> The simplest way to do this is to use model.matrix() and makeContrasts().
>>
>> Assume you read in your targets.txt file like so:
>>
>> targets <- read.table("targets.txt", header=TRUE, stringsAsFactors =
>>
>> FALSE)
>>
>> we have to do the stringsAsFactors business because your 4cell and
>> 8cell sample designations won't work with makeContrasts() (they cannot
>> start with a number).
>>
>> targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_",
>>
>> targets$Targets)))
>>
>> then
>>
>> design <- model.matrix(~0+Target+Date, targets)
>>
>> Now you probably have column names with an extra 'targets' in the
>> name, which nobody likes, so we remove
>>
>> colnames(design) <- gsub("targets", "", colnames(design))
>>
>> contrast <- makeContrasts(blastomere - oocyte, blastomere -
>> eight_cell, blastomere - four_cell, <add all the contrasts you want
>> here>, levels =
>>
>> design)
>>
>> and now you proceed on to the lmFit/contrasts.fit/eBayes steps.
>>
>> Best,
>>
>> Jim
>>
>>>
>>
>>> The batch dates are included in the Target file, as I would like to
>>
>>> add them as a factor when computing the T-test. But this isn’t a
>>
>>> contrast I would like to make on its own (does that make sense?)
>>
>>>
>>
>>> I would like to make all possible combinations of contrasts, listed
>> as targets.
>>
>>>
>>
>>> Any help would be fantastic!!!!!
>>
>>> Thank you in advance,
>>
>>> Helen
>>
>>>
>>
>>>
>>
>>>
>>
>>>
>>
>>>              [[alternative HTML version deleted]]
>>
>>>
>>
>>>
>>
>>>
>>
>>> _______________________________________________
>>
>>> Bioconductor mailing list
>>
>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>>> Search the archives:
>>
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>>
>> James W. MacDonald, M.S.
>>
>> Biostatistician
>>
>> University of Washington
>>
>> Environmental and Occupational Health Sciences
>>
>> 4225 Roosevelt Way NE, # 100
>>
>> Seattle WA 98105-6099
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list