[BioC] limma separate channel analysis of your data

Fri Sep 14 14:26:35 CEST 2012

Dear Gordon,

Again, thank you so much for your assistance. This will get me going.

Best regards,

Osee
________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: Thursday, September 13, 2012 5:57 PM
To: Sanogo, Yibayiri O
Cc: Osee Sanogo; Bioconductor mailing list
Subject: RE: limma separate channel analysis of your data

Dear Osee,

Yes, your code is correct.  I forgot to add the intraspotCorrelation
estimation.

Best wishes
Gordon

On Thu, 13 Sep 2012, Sanogo, Yibayiri O wrote:

> Dear Gordon,
>
> In the code below,
>
> design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
> fit <- lmscFit(MA,design)
> fit <- eBayes(fit, trend=TRUE)
>
> shouldn't we include the intracorrelation coefficient in the single channel model?
> Is the following code correct?
>
> corfit <- intraspotCorrelation(MA, design)
> fit <- lmscFit(MA,design, correlation=corfit$consensus)
>
> Please let me know if this is the right way of doing it.
>
> Thank you.
>
> Osee
>
>
> ________________________________________
> From: Gordon K Smyth [smyth at wehi.EDU.AU]
> Sent: Tuesday, September 11, 2012 8:13 PM
> To: Sanogo, Yibayiri O
> Cc: Osee Sanogo; Bioconductor mailing list
> Subject: limma separate channel analysis of your data
>
> Dear Osee,
>
> Thank you for attaching the revised design of your experiment. I found it
> somewhat tricky to analyse, so my recommendations are more involved than I
> usually give. I think that you will need to perform a separate channel
> analysis, otherwise it will not be possible to make baseline comparisons
> between the brain parts.
>
> I also think that you may need to re-normalize your data using the limma
> package. You say that the data has been "loess/scale normalized into an
> expression set", but I don't think that scale normalization should be
> necessary for Agilent data, and an expression set is not fully suitable
> for two colour data. The following code assumes that you have normalized
> the data into an MAList object using the limma package. I will call it
> 'MA'. I strongly recommend that you use normexp background correction and
> loess normalization as described in the limma User's Guide for GenePix
> data.
>
> The first step is to put your targets data into separate channel format:
>
> targets <- read.delim("Design_Brain.txt")
> Treat <- as.vector(t(targets[,c("Cy3","Cy5")]))
> Treat <- ifelse(Treat>0,"Exp","Ctrl")
> Treat <- factor(Treat)
> Brain <- rep(targets$Brain.Part,each=2,len=48)
> Fish <- factor(rep(targets$Fish.Number,each=2,len=48))
> Dye <- rep(0:1,len=48)
> data.frame(Fish,Brain,Treat,Dye)
>
> Now you can fit a linear model, correcting for probe-specific dye-bias,
> correcting for any differences between the fish, and computing Treatment
> vs Control effects for each brain part:
>
> design <- model.matrix(~Dye+Fish+Brain+Brain:Treat)
> fit <- lmscFit(MA,design)
> fit <- eBayes(fit, trend=TRUE)
>
> Now you can extract genes that have a treatment effect. To find genes
> that have a treatment vs control effect in telencephalon, you would use:
>
> topTable(fit, coef="BrainT:TreatExp")
>
> For genes DE for treatment vs control in brain stem, it would be
>
> topTable(fit, coef="BrainBS:TreatExp")
>
> And so on.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> http://www.statsci.org/smyth
>
>
> On Tue, 11 Sep 2012, Sanogo, Yibayiri O wrote:
>
> Dear Gordon,
>
> Attached please find the revised design of my experiment.
>
> Thank you for your willingness to help me analyze these data.
>
> Best regards,
>
> Osee
> ________________________________________
> From: Sanogo, Yibayiri O
> Sent: Monday, September 10, 2012 9:51 PM
> To: Gordon K Smyth; Osee Sanogo
> Cc: Bioconductor mailing list
> Subject: RE: Advice with RemoveBatchEffect and Rank Product package
>
> Dear Gordon,
>
> Please ignore the last two columns (weight and length) of the design for
> now: I think they are screwed up due to some kind of sorting. I will have
> to double check them (look in the notebook which is not here now). Only
> different brain regions (T, D, C, BS) belonging to the same fish should
> have same weight, length.
>
> The other columns are:
>
> Fish.Number= corresponds to fish ID, the individual fish (biological
> replicates) Slide= the Agilent 4X44k slides on each the arrays were
> located Brain.Part= the different brain regions that were dissected and
> hybridized (within region). They were T (telencephalon), D (diencephalon),
> C (cerebellum), BS (brain stem). in the cy3, c5 columns, "-1" is control
> and "1" is experimental.
>
> FileName Cy3 Cy5 Fish Number Slide Brain Part
> 1T.gpr 1 -1 1 2 T
> 2T.gpr -1 1 2 1 T
> 3T.gpr 1 -1 3 4 T
> 4T.gpr -1 1 4 3 T
> 5T.gpr 1 -1 5 6 T
> 6T.gpr -1 1 6 5 T
> 1D.gpr -1 1 1 5 D
> 2D.gpr 1 -1 2 4 D
> 3D.gpr -1 1 3 1 D
> 4D.gpr 1 -1 4 6 D
> 5D.gpr -1 1 5 3 D
> 6D.gpr 1 -1 6 2 D
> 1C.gpr 1 -1 1 4 C
> 2C.gpr -1 1 2 3 C
> 3C.gpr 1 -1 3 6 C
> 4C.gpr -1 1 4 5 C
> 5C.gpr 1 -1 5 2 C
> 6C.gpr -1 1 6 1 C
> 1BS.gpr -1 1 1 1 BS
> 2BS.gpr 1 -1 2 2 BS
> 3BS.gpr -1 1 3 3 BS
> 4BS.gpr 1 -1 4 4 BS
> 6BS.gpr 1 -1 6 6 BS
>
> Thank you for your willingness to help. I will stay up a bit so that I can
> respond faster to your questions given the time difference between us.
>
> Thank you so much.
>
> Osee
>
> ________________________________________
> From: Gordon K Smyth [smyth at wehi.EDU.AU]
> Sent: Monday, September 10, 2012 6:34 PM
> To: Osee Sanogo
> Cc: Sanogo, Yibayiri O; Bioconductor mailing list
> Subject: Re: Advice with RemoveBatchEffect and Rank Product package
>
> Dear Osee,
>
> Can you explain what the columns Fish.Number, Slide, Brain.Part, Weight
> and Length mean in your experiment? If the first six arrays are from
> biologically independent samples, why do they have the same weight and
> length?
>
> Best wishes
> Gordon
>
>
> On Mon, 10 Sep 2012, Osee Sanogo wrote:
>
>> Dear Gordon,
>>
>> From what you said, it seems that I am oversimplifying my experiment by
>> attempting to analyze it with RankProd, which doesn't offer the option for
>> complex modeling.
>>
>> Could you please explain to me how I could analyze the experiment using
>> Limma?
>> Please let me know if you'd like me to provide further details of the
>> experiment.
>>
>> Thank you so much.
>>
>> Osee
>>
>>
>> On 9/10/12 2:04 AM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:
>>
>>> Dear Osee,
>>>
>>> No, you can't use removeBatchEffect to control for dye bias.
>>>
>>> Can you ignore the dye effect? Not in general, but who knows?
>>>
>>> Your experiment seems too complex to be properly analysed using RankProd.
>>> For one thing, it seems clear that you have obtained multiple parts of the
>>> brain from the same biological replicates, meaning that your samples are
>>> paired by fish number.
>>>
>>> I could explain how to analyse this experiment using limma. However, if
>>> you are determined that you will use RankProd, it might be best to email
>>> the authors of that package for advice.
>>>
>>> Best wishes
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> http://www.statsci.org/smyth
>>>
>>> On Sun, 9 Sep 2012, Osee Sanogo wrote:
>>>
>>>> Dear Gordon,
>>>>
>>>> Thank you for getting back to me about my questions.
>>>>
>>>> My experiment is trying to identify differentially expressed genes in four
>>>> regions of the brain in response to a stressor. I have 6 biol. replicates in
>>>> each brain region for the control and experimental groups in each region,
>>>> and the comparison is being done within brain region (i.e., T control vs T
>>>> exp, D ctrl vs D exp, C ctrl vs C exp, BS ctrl vs BS exp). The sample were
>>>> run in two-color Agilent Array.
>>>>
>>>> You're right that the design I sent was from the separate channel analysis,
>>>> in which I attempted to account for array and dye effect, and then run the
>>>> data in RankProd. Now I know that this is not right. Ok, I will use the
>>>> single channel analysis in Limma.
>>>>
>>>> I still would like to run the two-channel data (ratios) in RankProd, as my
>>>> previous experience found this useful for my dara (low replicate numbers).
>>>>
>>>> My questions: 1) Could I use RemoveBatchEffect to control for dye bias
>>>> before running the two-channel data in RankProd? If yes, how should I do
>>>> this using the RemoveBatch Effect function?
>>>> 2) I found that about 3% of my probes have dye effect. Can I
>>>> omit controlling for dye effect without compromising the result of my
>>>> analysis?
>>>>
>>>> The data were loess/scale normalized into an expression set (Data_RP).
>>>>
>>>> Here is the design of the experiment
>>>>
>>>> FileName Cy3 Cy5 Fish.Number Slide Brain.Part Weight Length
>>>> 1 1T.gpr 1 -1 1 2 T 39 0.63
>>>> 2 2T.gpr -1 1 2 1 T 39 0.63
>>>> 3 3T.gpr 1 -1 3 4 T 39 0.63
>>>> 4 4T.gpr -1 1 4 3 T 39 0.63
>>>> 5 5T.gpr 1 -1 5 6 T 39 0.63
>>>> 6 6T.gpr -1 1 6 5 T NA NA
>>>> 7 1D.gpr -1 1 1 5 D 47 1.21
>>>> 8 2D.gpr 1 -1 2 4 D 47 1.21
>>>> 9 3D.gpr -1 1 3 1 D 47 1.21
>>>> 10 4D.gpr 1 -1 4 6 D 47 1.21
>>>> 11 5D.gpr -1 1 5 3 D 47 1.21
>>>> 12 6D.gpr 1 -1 6 2 D NA NA
>>>> 13 1C.gpr 1 -1 1 4 C 47 1.31
>>>> 14 2C.gpr -1 1 2 3 C 47 1.31
>>>> 15 3C.gpr 1 -1 3 6 C 47 1.31
>>>> 16 4C.gpr -1 1 4 5 C 47 1.31
>>>> 17 5C.gpr 1 -1 5 2 C 47 1.31
>>>> 18 6C.gpr -1 1 6 1 C NA NA
>>>> 19 1BS.gpr -1 1 1 1 BS 89 1.44
>>>> 20 2BS.gpr 1 -1 2 2 BS 89 1.44
>>>> 21 3BS.gpr -1 1 3 3 BS 89 1.44
>>>> 22 4BS.gpr 1 -1 4 4 BS NA NA
>>>> 23 5BS.gpr -1 1 5 5 BS NA NA
>>>> 24 6BS.gpr 1 -1 6 6 BS NA NA
>>>>
>>>> Thank you for your help and please let me know if you need further
>>>> explanation of the experiment.
>>>>
>>>> Best regards,
>>>>
>>>> Osee
>>>>
>>>>>
>>>>
>>>>
>>>> On 9/9/12 7:24 PM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:
>>>>
>>>>> Dear Osee,
>>>>>
>>>>> You are attempting to do a number of things that don't seem correct to me.
>>>>>
>>>>> First, you seem to attempting a separate channel analysis of two color
>>>>> microarray data, but ignoring the pairing of the red and green channels.
>>>>> It isn't correct to do this. I don't see any way to use RankProd, or any
>>>>> other package designed for independent samples, correctly in this context.
>>>>> If you must do a separate channel analysis, you would be better off using
>>>>> the separate channel analysis facilities of the limma package.
>>>>>
>>>>> Second, when you set batch=rep(1,24), you are defining a batch that
>>>>> consists of every array in your data set. Obviously it doesn't make sense
>>>>> to remove batch effects unless there are at least two batches.
>>>>>
>>>>> Third, I don't follow your design matrix.
>>>>>
>>>>> It would be better to go back to the start, and describe in more basic
>>>>> terms what is the nature of your data and what comparison you are trying
>>>>> to make.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>> Date: Sat, 8 Sep 2012 11:40:45 +0000
>>>>>> From: "Sanogo, Yibayiri O" <sanogo at illinois.edu>
>>>>>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>>>>>> Subject: [BioC] Advice with RemoveBatchEffect and Rank Product package
>>>>>>
>>>>>> Dear Members of the list,
>>>>>>
>>>>>> (I apologize for posting this again -I sent it earlier to the list but
>>>>>> from another account and I was listed me as non-Member -and Member I am
>>>>>> since 2008:-)).
>>>>>>
>>>>>> I have been using Rank Prod to analyze Agilent two-color data. However, I
>>>>>> would like to remove the dye effect prior to analysis. I read on the forum
>>>>>> that RemoveBatchEffect should be used in the Limma linear model, a type of
>>>>>> modeling that is not in Rank Product.
>>>>>>
>>>>>> I have two questions:
>>>>>>
>>>>>> 1) Would it be appropriate to use RemoveBatchEffect to correct for dye
>>>>>> effect prior to running the expression data using Rank Prod?
>>>>>>
>>>>>> 2) a) If no, what other function could I use to do this?
>>>>>> b) If yes, I would like a help with the correct design and how to
>>>>>> properly indicate the batch.
>>>>>>
>>>>>> Here is my design indicating the two dyes (cy3=-1, cy5=1; T, D, C, BS =are
>>>>>> different areas of the brain):
>>>>>>
>>>>>> design1
>>>>>> BS C D T
>>>>>> 1 0 0 0 1
>>>>>> 2 0 0 0 -1
>>>>>> 3 0 0 0 1
>>>>>> 4 0 0 0 -1
>>>>>> 5 0 0 0 1
>>>>>> 6 0 0 0 -1
>>>>>> 7 0 0 -1 0
>>>>>> 8 0 0 1 0
>>>>>> 9 0 0 -1 0
>>>>>> 10 0 0 1 0
>>>>>> 11 0 0 -1 0
>>>>>> 12 0 0 1 0
>>>>>> 13 0 1 0 0
>>>>>> 14 0 -1 0 0
>>>>>> 15 0 1 0 0
>>>>>> 16 0 -1 0 0
>>>>>> 17 0 1 0 0
>>>>>> 18 0 -1 0 0
>>>>>> 19 -1 0 0 0
>>>>>> 20 1 0 0 0
>>>>>> 21 -1 0 0 0
>>>>>> 22 1 0 0 0
>>>>>> 23 -1 0 0 0
>>>>>> 24 1 0 0 0
>>>>>>
>>>>>> attr(,"assign")
>>>>>> [1] 1 1 1 1
>>>>>>
>>>>>> I've tried this (Data_RP are my data, the M values of the expression set):
>>>>>>
>>>>>> DYE_RP<-removeBatchEffect(Data_RP, batch=rep(1,24), batch2=NULL,
>>>>>> design=design1)
>>>>>>
>>>>>> but it is returning an error message
>>>>>> " Error in contr.sum(levels(batch)) :
>>>>>> not enough degrees of freedom to define contrasts"
>>>>>>
>>>>>> Please help me correct this code.
>>>>>>
>>>>>> Thank you so much for your help.
>>>>>>
>>>>>> Osee
>>>>>>
>>>>>> -- -- --
>>>>>> Y. Osee Sanogo
>>>>>> Integrative Biology
>>>>>> Institute for Genomic Biology
>>>>>> University of Illinois at Urbana
>>>>>> 505 S. Goodwin Ave
>>>>>> Urbana, IL-61801
>>>>>>
>>>>>> Tel: 217-333 2308 (Office)
>>>>>> 217-417 9593 (Cell)
>
> ______________________________________________________________________
> The information in this email is confidential and intended solely for the addressee.
> You must not disclose, forward, print or use it without the permission of the sender.
> ______________________________________________________________________
>

______________________________________________________________________
The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________