[BioC] edgeR Design Matrix

James W. MacDonald jmacdon at u.washington.edu
Tue Dec 31 00:59:18 CET 2013


Hi Reema,

I don't know what strain means in this context, but if the KO samples 
both have knocked out genes and are different strains from the control 
samples, then you have at least partially aliased strain and KO. In 
other words, if you compare strain1_KO vs strain3_Control, you can't say 
if the differences are due to the knocked out gene or are simply due to 
differences between strains.

If you fit the model below (with Design <- 
model.matrix(~Condition+Straing)), then you can determine genes that are 
different between KO and Control after adjusting for strain. But you 
cannot detect any strain specific differences.
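One quick way to see the aliasing described above is to build the design for the four-strain table in Reema's Dec 30 message and compare its rank with its number of columns. This is a minimal base-R sketch. The `targets` data frame is typed in by hand here, which is an assumption, since the original was read from a file. Only `stats::model.matrix()` and `qr()` are used, not edgeR.

```r
# Four-strain sample table from the Dec 30 message, entered by hand
# (assumed to match the file that was read with read.table()).
targets <- data.frame(
  Sample    = c("KO1-A", "KO1-B", "KO2-A", "KO2-B",
                "Cont1-A", "Cont1-B", "Cont2-A", "Cont2-B"),
  Strain    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Condition = c(rep("KO", 4), rep("Control", 4))
)

Straing   <- factor(targets$Strain)      # strain as a factor, levels "1".."4"
Condition <- factor(targets$Condition)   # levels "Control" (reference), "KO"
Design    <- model.matrix(~ Condition + Straing)

# Strains 1-2 are all KO and strains 3-4 are all Control,
# so the columns of the design are linearly dependent.
n_coef      <- ncol(Design)
design_rank <- qr(Design)$rank
cat("columns:", n_coef, " rank:", design_rank, "\n")

# The KO indicator equals Intercept - Straing3 - Straing4 exactly.
ko_from_strain <- Design[, "(Intercept)"] - Design[, "Straing3"] - Design[, "Straing4"]
print(all.equal(Design[, "ConditionKO"], ko_from_strain))
```

With this table the rank is 4 for 5 columns. This suggests that, in this parameterization, the ConditionKO coefficient cannot be estimated separately from all three strain coefficients. That is the complete form of the aliasing between strain and KO.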

Best,

Jim


On 12/30/2013 3:41 PM, Reema Singh wrote:
> HI Jim,
>
> I am really sorry, I made a typing mistake in my table. I actually
> have four different strains. This is the actual table:
>
> *1) Sample table for design matrix*
>
> Sample    Strain    Condition
> KO1-A     1         KO
> KO1-B     1         KO
> KO2-A     2         KO
> KO2-B     2         KO
> Cont1-A   3         Control
> Cont1-B   3         Control
> Cont2-A   4         Control
> Cont2-B   4         Control
>
>
> In this case, is it fine to make strain a factor like this:
> Straing <- factor(targets$Strain)?
>
> Regards
>
>
> On Mon, Dec 30, 2013 at 12:27 AM, James W. MacDonald <jmacdon at uw.edu>
> wrote:
>
>     Hi Reema,
>
>     You only have two strains, right? In your original email it seemed
>     like that was what you indicated. If so, then you want a factor
>     that equals one for strain 1 and two for the other strain. Then you
>     should have a single column of your design matrix that is a zero
>     for one strain and a one for the other.
>
>     So something like
>
>     strain <- factor(rep(1:2, each=2, times=2))
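The two-strain coding described here can be checked in base R. This sketch assumes the original Dec 23 table (strains 1 and 2 in both conditions); the `condition` vector is hypothetical, typed in to match that table.

```r
strain    <- factor(rep(1:2, each = 2, times = 2))       # 1 1 2 2 1 1 2 2
condition <- factor(rep(c("KO", "Control"), each = 4))   # KO samples first

design <- model.matrix(~ strain + condition)
print(design)

# Strain enters as a single 0/1 column ("strain2"), and the
# three columns are linearly independent (full rank).
print(design[, "strain2"])
print(qr(design)$rank)
```

Here strain and condition are crossed (each strain appears in both conditions), which is why this design has full rank.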
>
>     Best,
>
>     Jim
>
>     On Dec 29, 2013 10:09 AM, "Reema Singh" <reema28sep at gmail.com> wrote:
>
>         Hi Jim,
>
>         My apologies, I didn't mean to take this offline. I will keep
>         this in mind next time.
>
>         Thank you so much for your clarification on this design
>         matrix. It really helps me a lot. So, to find DEG while
>         taking strain variability into consideration, I should do
>         this:
>
>         Straing <- factor(targets$Strain)
>         Condition <- targets$Condition
>         Design <- model.matrix(~Condition+Straing)
>         Design
>
>           (Intercept) ConditionKO Straing2 Straing3 Straing4
>         1           1           1        0        0        0
>         2           1           1        0        0        0
>         3           1           1        1        0        0
>         4           1           1        1        0        0
>         5           1           0        0        1        0
>         6           1           0        0        1        0
>         7           1           0        0        0        1
>         8           1           0        0        0        1
>         attr(,"assign")
>         [1] 0 1 2 2 2
>         attr(,"contrasts")
>         attr(,"contrasts")$Condition
>         [1] "contr.treatment"
>
>         attr(,"contrasts")$Straing
>         [1] "contr.treatment"
>
>         diff <- glmLRT(fit,coef=2)
>
>
>         Regards
>
>
>
>         On Sat, Dec 28, 2013 at 10:31 PM, James W. MacDonald
>         <jmacdon at uw.edu> wrote:
>
>             Hi Reema,
>
>             Please don't take conversations off list.
>
>             In your second parameterization you are pretty close,
>             except the strain should be converted to a factor. In
>             other words, the strain columns of the design matrix should
>             contain just zeros and ones, instead of 1-4.
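To see the difference between a numeric strain code and a factor, compare the design columns each one produces. This is a small base-R sketch that assumes the strain codes 1-4 shown in the design printed further down the thread; `strain_codes`, `as_numeric` and `as_factor` are illustrative names only.

```r
strain_codes <- c(1, 1, 2, 2, 3, 3, 4, 4)
Condition    <- rep(c("KO", "Control"), each = 4)

# Numeric strain: one column holding 1-4, i.e. a linear trend in the strain ID.
as_numeric <- model.matrix(~ Condition + strain_codes)

# Factor strain: one 0/1 indicator column per non-reference strain.
as_factor  <- model.matrix(~ Condition + factor(strain_codes))

print(colnames(as_numeric))
print(colnames(as_factor))
```

With the numeric version, the model assumes expression changes linearly with the arbitrary strain label. The factor version instead lets each strain have its own level.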
>
>             Also note that you are not likely to get a lot of
>             differential expression with just two samples per group.
>             After you have reasonable depth per sample, sufficient
>             replication is needed in order to detect anything but the
>             largest differences.
>
>             Best,
>
>             Jim
>
>             On Dec 27, 2013 1:51 PM, "Reema Singh"
>             <reema28sep at gmail.com> wrote:
>
>                 Hello Jim,
>
>                 Thanks very much. I tried this design matrix and it
>                 works great. The contrast I am using first finds the
>                 DEG based on the strain difference and then between
>                 Control and KO. The resulting number of DEG is very low.
>                 Here's my contrast:
>
>                 my.ctst <- makeContrasts(
>                     deg = (trt_srtControl_3 - trt_srtControl_4) -
>                           (trt_srtKO_1 - trt_srtKO_2),
>                     levels = design)
>
>                 diff <- glmLRT(fit, contrast = my.ctst)
>                 # or, equivalently:
>                 diff <- glmLRT(fit, contrast = c(1, -1, -1, 1))
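Whether `c(1, -1, -1, 1)` matches the `makeContrasts()` version depends on the column order of the cell-means design. This base-R sketch assumes `trt_srt` is built from the four-strain table, as in the suggestion from Dec 25. The `beta` values are arbitrary placeholders used only to check the arithmetic; they are not real estimates.

```r
Condition <- rep(c("KO", "Control"), each = 4)
Strain    <- c(1, 1, 2, 2, 3, 3, 4, 4)
trt_srt   <- paste(Condition, Strain, sep = "_")
design    <- model.matrix(~ 0 + trt_srt)

# Columns follow the sorted factor levels of trt_srt.
print(colnames(design))

# Hand-written contrast vector, one entry per design column.
ctst <- setNames(c(1, -1, -1, 1), colnames(design))

# Placeholder coefficients, used only to check the contrast arithmetic.
beta <- setNames(c(5, 3, 4, 1), colnames(design))
lhs  <- sum(ctst * beta)
rhs  <- (beta[["trt_srtControl_3"]] - beta[["trt_srtControl_4"]]) -
        (beta[["trt_srtKO_1"]] - beta[["trt_srtKO_2"]])
print(c(lhs = lhs, rhs = rhs))
```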
>
>
>                 Is there any way to create a contrast matrix to find
>                 the DEG between Control and KO while adjusting for the
>                 strain difference, instead of first finding DEG between
>                 different strains?
>
>
>                 I have also tried different ways of creating the design
>                 matrix and I ended up totally confused. Here is another
>                 design matrix I tried:
>
>
>                 Straing <- targets$Strain
>                 Condition <- targets$Condition
>                 Design <- model.matrix(~Condition+Straing)
>
>                 Design
>
>                         (Intercept) ConditionKO Straing
>                 KO1-A             1           1       1
>                 KO1-B             1           1       1
>                 KO2-A             1           1       2
>                 KO2-B             1           1       2
>                 Cont1-A           1           0       3
>                 Cont1-B           1           0       3
>                 Cont2-A           1           0       4
>                 Cont2-B           1           0       4
>                 attr(,"assign")
>                 [1] 0 1 2
>                 attr(,"contrasts")
>                 attr(,"contrasts")$Condition
>                 [1] "contr.treatment"
>
>
>                 diff <- glmLRT(fit,coef=2)
>
>
>                 Now this gives me more DEG. As I am using coef=2, is it
>                 finding the DEG between Control and KO after adjusting
>                 for the strain effect? Or is it simply finding the DEG
>                 between Control and KO?
>
>
>                 Thank you
>
>                 Regards
>
>
>
>                 On Wed, Dec 25, 2013 at 5:59 PM, James W. MacDonald
>                 <jmacdon at uw.edu> wrote:
>
>                     Hi Reema,
>
>                     Instead of using that parameterization, you might
>                     consider creating a treatment-strain coefficient,
>                     which will tend to be more easily interpreted. So
>                     if you read in the sample table and then do:
>
>                     trt_srt <- paste(targets$Condition,
>                     targets$Strain, sep = "_")
>                     design <- model.matrix(~ 0 + trt_srt)
>
>                     Then you can do any contrast between any
>                     treatment/strain combination you like, including
>                     the interaction, which tests for any differences
>                     between treatment that depend on the strain.
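As one example of "any contrast" with this parameterization, a KO vs Control comparison that averages over the two strains within each condition can be written as a contrast vector on the cell-means columns. This is a hedged base-R sketch. `targets` is typed in by hand from the four-strain table, and `ko_vs_control` is an illustrative name. Passing the vector to edgeR through the `contrast` argument of `glmLRT()` is assumed here rather than shown.

```r
targets <- data.frame(
  Strain    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Condition = rep(c("KO", "Control"), each = 4)
)
trt_srt <- paste(targets$Condition, targets$Strain, sep = "_")
design  <- model.matrix(~ 0 + trt_srt)   # one 0/1 column per condition_strain group

# Mean of the two KO groups minus mean of the two Control groups.
ko_vs_control <- setNames(numeric(ncol(design)), colnames(design))
ko_vs_control[c("trt_srtKO_1", "trt_srtKO_2")]           <-  0.5
ko_vs_control[c("trt_srtControl_3", "trt_srtControl_4")] <- -0.5
print(ko_vs_control)
```

Because strains 1-2 occur only in KO and strains 3-4 only in Control, this contrast compares the conditions averaged over their own strains. It cannot separate a KO effect from strain-to-strain differences.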
>
>                     Best,
>
>                     Jim
>
>                     On Dec 23, 2013 5:56 PM, "Reema Singh"
>                     <reema28sep at gmail.com> wrote:
>
>                         Dear All,
>
>                         I have some queries regarding the design matrix
>                         for a two-group (Control vs. KO) differential
>                         expression analysis. I am using edgeR for this.
>                         My question is ordered as follows: 1) sample
>                         table for the design matrix, 2) design matrix,
>                         3) questions.
>
>                         *1) Sample table for design matrix*
>
>
>                         Sample   Strain  Condition
>                         KO1-A    1       KO
>                         KO1-B    1       KO
>                         KO2-A    2       KO
>                         KO2-B    2       KO
>                         Cont1-A  1       Control
>                         Cont1-B  1       Control
>                         Cont2-A  2       Control
>                         Cont2-B  2       Control
>
>
>                         *2) Design Matrix*
>
>
>
>                         targets <- read.table(file=
>                         "Samples",sep="\t",header=TRUE)
>
>                         design <- model.matrix(~Strain+Condition,targets)
>
>                         > design
>                                 (Intercept) Strain ConditionKO
>                         KO1-A             1      1           1
>                         KO1-B             1      1           1
>                         KO2-A             1      2           1
>                         KO2-B             1      2           1
>                         Cont1-A           1      3           0
>                         Cont1-B           1      3           0
>                         Cont2-A           1      4           0
>                         Cont2-B           1      4           0
>                         attr(,"assign")
>                         [1] 0 1 2
>                         attr(,"contrasts")
>                         attr(,"contrasts")$Condition
>                         [1] "contr.treatment"
>
>
>                         *3) Questions*
>
>                         *A)* During differential expression, I also want
>                         to consider the variation between strains along
>                         with the Control vs KO variation. As far as I
>                         understand, this design matrix only considers
>                         Control vs KO for DE. How can I make it consider
>                         strain variation as well?
>
>                         *B)* After DE using the same design matrix with
>                         glmFit, I inspected the read counts for the top
>                         down-regulated gene and found that the read count
>                         for this gene is very low in KO compared to
>                         Control. So does this mean the comparison is
>                         Control vs KO, and topTags gives the DEG list for
>                         KO compared to Control?
>                         I would appreciate your suggestions.
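On question B: with R's default treatment contrasts, the first factor level is the reference, and factor levels are sorted alphabetically by default, so "Control" comes before "KO". This base-R sketch shows how to check this. The `Condition` vector is typed in from the table above, and `relevel()` is shown only as one option for changing the reference.

```r
Condition <- factor(rep(c("KO", "Control"), each = 4))
print(levels(Condition))     # the first level is the reference

design <- model.matrix(~ Condition)
print(colnames(design))      # the non-reference level gets its own column

# The ConditionKO column is 1 for KO samples, so its coefficient is
# KO relative to Control: a negative log fold change means lower in KO.
print(design[, "ConditionKO"])

# To make KO the reference instead:
Condition_kref <- relevel(Condition, ref = "KO")
print(colnames(model.matrix(~ Condition_kref)))
```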
>
>                         Kind Regards
>
>
>                         --
>                         Reema Singh
>                         Postdoctoral Research Assistant
>                         College of Life Sciences
>                         University of Dundee,
>                         Dundee DD1 4HN, Scotland
>                         United Kingdom
>
>
>                         _______________________________________________
>                         Bioconductor mailing list
>                         Bioconductor at r-project.org
>                         https://stat.ethz.ch/mailman/listinfo/bioconductor
>                         Search the archives:
>                         http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>
>
>
>
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


