[BioC] Difficulty with limma contrast matrix creation

James W. MacDonald jmacdon at uw.edu
Fri Apr 6 17:34:48 CEST 2012


Hi Brian,

First off, please keep this conversation on list. We would like the list 
archives to be a repository of information, and if questions and answers 
get taken off list, that goal is not met.

On 4/6/2012 10:15 AM, b1gorsuch at comcast.net wrote:
> Jim,
> Thank you very much.  I apologize for my ignorance on the subject. I 
> am a trained biologist/immunologist and medical provider now attempting 
> my dissertation work in bioinformatics with little background except 
> the PhD didactic work.
>
> I did not even get to designing the matrix, as I am not sure how to 
> construct it.  I have the below Agilent microarrays derived from the 
> Agilent Feature Extraction software in raw data form, single channel.  
> I have Two mouse strains (c2fb and WT) which both have been treated 
> with STZ and heart harvested at day 4 and day 14 post treatment.  The 
> WT have also been treated with vehicle (CBS) at day 4 and 14.  No c2fb 
> strain mice however were treated with baseline vehicle CBS (doing 
> presently).

That is a problem. If you are doing one set of samples separately, and 
will then process the chips for those samples separately, you have 
completely confounded technical and biological variability.

In other words, if you find a difference between e.g., c2fb treated vs 
c2fb control mice, you will not be able to say whether that difference 
is due to differential expression of the gene(s), or is simply due to 
some uncontrolled technical variability between processing of the chips, 
or treatment of the mice. So running those c2fb control chips will 
likely be a waste of money.
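
To make the confounding concrete, here is a minimal sketch in base R. 
The batch labels "A" and "B", the assumption that all existing arrays 
were processed together, and three replicates per new control group are 
all hypothetical; none of that is stated in the thread.

```r
# Existing arrays: six groups of three, ASSUMED to share one processing batch "A".
existing <- data.frame(
  group = rep(c("C2fb_STZ_d4", "C2fb_STZ_d14", "WT_CBS_d4",
                "WT_CBS_d14", "WT_STZ_d4", "WT_STZ_d14"), each = 3),
  batch = "A"
)

# HYPOTHETICAL later arrays: C2fb vehicle controls processed on their own (batch "B").
later <- data.frame(
  group = rep(c("C2fb_CBS_d4", "C2fb_CBS_d14"), each = 3),
  batch = "B"
)

samples <- rbind(existing, later)
samples$group <- factor(samples$group, levels = unique(samples$group))
samples$batch <- factor(samples$batch)

# Cross-tabulation: the C2fb controls are the only arrays in batch B.
print(table(samples$group, samples$batch))

# Try to model both group and batch. The batch column is an exact sum of the
# two C2fb control columns, so the design matrix is rank deficient and the
# batch effect cannot be separated from the control group means.
design <- model.matrix(~ 0 + group + batch, data = samples)
rank_deficient <- qr(design)$rank < ncol(design)
print(rank_deficient)
```

When the rank is lower than the number of columns, no amount of 
modelling can tell a batch difference from a real c2fb control effect.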


> So I have (2) strains, (2) treatments [2 applied to one strain and only 
> 1 applied to the other] and (2) day intervals [4 and 14].  So I think this 
> would be a 3x2x2 factorial analysis (except one strain was only treated 
> with one treatment)?
>
> I had tried:
> f<-factor(targets$Genotype, targets$Treatment, targets$Time.d, 
> levels=unique(targets$Genotype))
> *this did not work though.(was based on 
> http://matticklab.com/index/php?title=single_channel_analysis_of_agilent_microarray)
>
>
> I also tried to follow Dr. Gordon Smyth's tutorial on 2x2 factorial 
> analysis:
>
> f<-paste(targets$Genotype, targets$Treatment, targets$Time.d, sep="")
> *this also did not work
>
> I initially had my targets.txt file with a single condition column (i.e. 
> C2fb_STZ_4d) combining all descriptive data into one to try and make 
> it easier, but also had problems with this.

I think you are approaching this question from the wrong perspective, 
getting caught up in all this statistical blahblahblah, especially if 
you don't have statistical training.

It is much easier to start by stating what the original hypothesis of 
the experiment was, and deciding what comparisons are of interest to 
you. Once you know what samples/times/treatments or combinations thereof 
you want to compare, you can decide what model coefficients are 
necessary to make those comparisons. This will dictate your design 
matrix as well as the contrasts matrix.
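
As a rough sketch of what that can look like for the posted targets 
file, one option is a group-means design with one coefficient per 
observed group. The comparisons below are placeholders I made up for 
illustration, not a statement of what the experiment was meant to test, 
and `eset` stands for a normalised expression object that is not shown 
in this thread.

```r
# Targets reconstructed from the posted targets.txt (18 arrays, 3 per group).
targets <- data.frame(
  Genotype  = rep(c("C2fb", "WT", "WT"), each = 6),
  Treatment = rep(c("STZ", "CBS", "STZ"), each = 6),
  Time.d    = rep(rep(c(4, 14), each = 3), times = 3)
)

# One combined factor with a level for each observed group.
group <- paste(targets$Genotype, targets$Treatment,
               paste0("d", targets$Time.d), sep = "_")
group <- factor(group, levels = unique(group))

# Group-means parameterisation: no intercept, one column per group,
# so each coefficient is simply that group's mean log-expression.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
print(colSums(design))  # number of arrays contributing to each coefficient

# The contrast step needs limma; skipped if the package is not installed.
if (requireNamespace("limma", quietly = TRUE)) {
  # ILLUSTRATIVE comparisons only; choose these from your own hypothesis.
  contrast.matrix <- limma::makeContrasts(
    STZvsCBS_WT_d4   = WT_STZ_d4    - WT_CBS_d4,
    STZvsCBS_WT_d14  = WT_STZ_d14   - WT_CBS_d14,
    C2fbvsWT_STZ_d4  = C2fb_STZ_d4  - WT_STZ_d4,
    C2fbvsWT_STZ_d14 = C2fb_STZ_d14 - WT_STZ_d14,
    levels = design
  )
  print(contrast.matrix)
  # With a normalised expression object `eset` (hypothetical name):
  # fit  <- limma::lmFit(eset, design)
  # fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrast.matrix))
  # limma::topTable(fit2, coef = "STZvsCBS_WT_d4")
}
```

With this parameterisation every contrast is just a difference between 
two named group means, which is usually the easiest form to read.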

However, you might still have some complications depending on how many 
comparisons you want to make. You don't have much replication for an 
experiment with three factors, so you may not be able to calculate 
all the coefficients you are interested in, at least not in a form that 
will be simple to interpret. If you want to make a whole bunch of 
comparisons, you may need to estimate coefficients that are internally 
already a comparison. As an example, see the two tables on p49 of the 
limma User's Guide, specifically the Comparison columns. This makes 
figuring out the contrasts matrix that much harder.
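
A small base R sketch of what "coefficients that are already a 
comparison" means, using the six groups from the targets file. The 
choice of WT_CBS_d4 as reference level and the toy expression values 
are assumptions made purely for illustration.

```r
# Six observed groups, three arrays each; the first level is the reference.
lev <- c("WT_CBS_d4", "WT_CBS_d14", "WT_STZ_d4",
         "WT_STZ_d14", "C2fb_STZ_d4", "C2fb_STZ_d14")
group <- factor(rep(lev, each = 3), levels = lev)

# Default treatment-contrast parameterisation: an intercept (the reference
# group mean) plus one coefficient per other group, each of which is
# already "that group minus WT_CBS_d4".
design <- model.matrix(~ group)
print(colnames(design))

# TOY noise-free values for one gene, so the coefficients can be checked by eye.
toy_means <- c(WT_CBS_d4 = 5, WT_CBS_d14 = 6, WT_STZ_d4 = 8,
               WT_STZ_d14 = 9, C2fb_STZ_d4 = 7, C2fb_STZ_d14 = 11)
y <- unname(toy_means[as.character(group)])

coefs <- lm.fit(design, y)$coefficients

# STZ vs CBS in WT at day 4 is read straight off one coefficient.
stz_vs_cbs_d4 <- coefs[["groupWT_STZ_d4"]]

# STZ vs CBS in WT at day 14 is a difference of two coefficients, because
# both are expressed relative to the same reference group.
stz_vs_cbs_d14 <- coefs[["groupWT_STZ_d14"]] - coefs[["groupWT_CBS_d14"]]

print(c(day4 = stz_vs_cbs_d4, day14 = stz_vs_cbs_d14))
```

The same comparison needs a different contrast depending on how the 
design matrix was parameterised, which is where it gets easy to make a 
mistake.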

If you are going to need to do that sort of thing, then you will be much 
better off contacting a local statistician for help. There is no profit 
in struggling through this stuff yourself, especially if you are not 
sure at the end that you did things correctly.

Best,

Jim
>
> Thank you,
> Brian
>
> ------------------------------------------------------------------------
> *From: *"James W. MacDonald" <jmacdon at uw.edu>
> *To: *"Brian Gorsuch [guest]" <guest at bioconductor.org>
> *Cc: *bioconductor at r-project.org, gorsucwi at umdnj.edu
> *Sent: *Friday, April 6, 2012 6:05:05 AM
> *Subject: *Re: [BioC] Difficulty with limma contrast matrix creation
>
> Hi Brian,
>
> On 4/5/2012 10:54 PM, Brian Gorsuch [guest] wrote:
> > Dear members,
> > I would be very grateful for any assistance.  I am having
> > difficulties creating a contrast matrix for my data in limma, as
> > well as then applying the matrix to the modeled data to compute
> > statistics, and the output for such.
> >
> > I have included my targets.txt file.  I was attempting to follow the
> > tutorial "single channel analysis of agilent microarray data with
> > limma" by the mattick lab.  Unfortunately mine is not a simple 2x2
> > factorial matrix.
> >
> > Thank you very much for any suggestions, and your time in doing so.
>
> The contrast is dependent on the design matrix, which specifies what
> coefficients you are computing (and hence the interpretation of the
> coefficients). Without knowing your goals and the design matrix you are
> using, it is impossible to give any advice. Perhaps you could elaborate
> a bit?
>
> Best,
>
> Jim
>
>
> >
> > FileName                                             Genotype  Treatment  Time.d  Sample
> > US45102885_252665511314_S01_GE1-v5_95_Feb07_1_4.txt  C2fb      STZ        4       V237
> > US45102885_252665511314_S01_GE1-v5_95_Feb07_1_3.txt  C2fb      STZ        4       V236
> > US45102885_252665511333_S01_GE1-v5_95_Feb07_1_1.txt  C2fb      STZ        4       V238
> > US45102885_252665511333_S01_GE1-v5_95_Feb07_1_2.txt  C2fb      STZ        14      V242
> > US45102885_252665511333_S01_GE1-v5_95_Feb07_1_4.txt  C2fb      STZ        14      V244
> > US45102885_252665511333_S01_GE1-v5_95_Feb07_1_3.txt  C2fb      STZ        14      V243
> > US45102885_252665511310_S01_GE1-v5_95_Feb07_1_1.txt  WT        CBS        4       V218
> > US45102885_252665511310_S01_GE1-v5_95_Feb07_1_2.txt  WT        CBS        4       V219
> > US45102885_252665511310_S01_GE1-v5_95_Feb07_1_3.txt  WT        CBS        4       V220
> > US45102885_252665511310_S01_GE1-v5_95_Feb07_1_4.txt  WT        CBS        14      V227
> > US45102885_252665511311_S01_GE1-v5_95_Feb07_1_1.txt  WT        CBS        14      V228
> > US45102885_252665511311_S01_GE1-v5_95_Feb07_1_2.txt  WT        CBS        14      V229
> > US45102885_252665511311_S01_GE1-v5_95_Feb07_1_3.txt  WT        STZ        4       V224
> > US45102885_252665511311_S01_GE1-v5_95_Feb07_1_4.txt  WT        STZ        4       V225
> > US45102885_252665511312_S01_GE1-v5_95_Feb07_1_1.txt  WT        STZ        4       V226
> > US45102885_252665511312_S01_GE1-v5_95_Feb07_1_2.txt  WT        STZ        14      V231
> > US45102885_252665511312_S01_GE1-v5_95_Feb07_1_3.txt  WT        STZ        14      V239
> > US45102885_252665511312_S01_GE1-v5_95_Feb07_1_4.txt  WT        STZ        14      V240
> >
> >
> >   -- output of sessionInfo():
> >
> > we
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


