[BioC] EdgeR GLM

Wood, Shona Shona.Wood at liverpool.ac.uk
Tue Dec 13 10:34:56 CET 2011


Dear Gordon,
Thank you for getting back to me. I am not a statistician so forgive me if I am asking basic questions. 
I was just reading about quadratic equations. Does a significant hit in the quadratic term mean that the rate of change first increases and then decreases over time (or visa versa)? Therefore hits which require the quadratic term are not a linear up or down with age they are only "linear" after middle age?

If this is the case what does the logFC for age.Q represent, is it the fold change between middle age and old? And how is it calculated?
Does the gene expression at young age affect the quadratic calculation? For example if the expression at young age is 2 and then 4 at middle and then 1.99 at old would the fact that old age is basically back to the young expression level stop it being a significant quadratic result or would it be significant because of the change from middle age.

I have done the quadratic term separately as suggested but I also did the linear term separately (glmLRT(d, glmfit, coef=2)). When I compare the results of both these tests to glmLRT(d, glmfit, coef=2, 3) there are some significant genes in the glmLRT(d, glmfit, coef=2,3), that aren't in either of the separate quadratic or linear tests - why would this be?  

Finally if i want to report changes with increasing age would it be best to present the results from the quadratic and linear tests separately or would it be better to present the combined test (glmLRT(d, glmfit, coef=2, 3)) and point out which ones rely on the quadratic term?

Many thanks

Shona
________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: 10 December 2011 07:10
To: Wood, Shona
Cc: Bioconductor mailing list
Subject: EdgeR GLM

Dear Shona,

Yes, age.L is the log2-fold-change for increasing age.

However your topTags table is testing for the linear and quadratic terms
together, so it is testing the significance of any differences between the
three age groups.  I'd suggest you test whether the quadratic term is
required using

    glmLRT(d, glmfit, coef=3)

I'd also generally recommend you go on to estimate the trended and tagwise
dispersions, not just the common dispersion.

Best wishes
Gordon

> Date: Thu, 8 Dec 2011 17:41:07 +0000
> From: "Wood, Shona" <Shona.Wood at liverpool.ac.uk>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] EdgeR GLM
>
> Hi,
>
> I am trying to look at gene expression changes with increasing age. I
> have three time points young, middle age and old. I have been using the
> following, ordered factors and coefficient in order to get a fold change
> and FDR for a linear change in gene expression with increasing age. Can
> you look over my code and see if I have done this right? Is age.L the
> fold change in gene expression with increasing age?
>
>> library(edgeR)
>> library(limma)
>> targets<-read.delim (file="coding_targets.txt")
>> targets$age<-factor(ordered (targets$age))
>> targets$sample<-factor(targets$sample)
>> targets
>   X                files      age sample
> 1 p1 p1_coding_counts.txt  a-young      1
> 2 p2 p2_coding_counts.txt  a-young      2
> 3 p3 p3_coding_counts.txt  a-young      3
> 4 p7 p7_coding_counts.txt b-middle      7
> 5 p8 p8_coding_counts.txt b-middle      8
> 6 p9 p9_coding_counts.txt b-middle      9
> 7 p4 p4_coding_counts.txt    c-old      4
> 8 p5 p5_coding_counts.txt    c-old      5
> 9 p6 p6_coding_counts.txt    c-old      6
>
>> d<-readDGE(targets, skip=1, comment.char='#')
>> colnames(d)<-row.names(targets)
>> d<- calcNormFactors(d)
>> d
> An object of class "DGEList"
> $samples
>   X                files      age sample lib.size norm.factors
> 1 p1 p1_coding_counts.txt  a-young      1  3445622    0.9655724
> 2 p2 p2_coding_counts.txt  a-young      2  2696547    0.9902573
> 3 p3 p3_coding_counts.txt  a-young      3  3308099    0.9787044
> 4 p7 p7_coding_counts.txt b-middle      7  2503479    1.0584660
> 5 p8 p8_coding_counts.txt b-middle      8  2639127    1.0477893
> 6 p9 p9_coding_counts.txt b-middle      9  2696547    0.9902573
> 7 p4 p4_coding_counts.txt    c-old      4  3037440    0.9778553
> 8 p5 p5_coding_counts.txt    c-old      5  2647915    1.0144109
> 9 p6 p6_coding_counts.txt    c-old      6  2475370    0.9809077
>
> $counts
>                      1    2    3   4   5    6    7    8   9
> ENSRNOG00000000007 1287 1285 1041 788 968 1285 1092 1009 960
> ENSRNOG00000000008    0    0    0   0   1    0    0    1   0
> ENSRNOG00000000009    0    0    0   3   0    0    1    0   0
> ENSRNOG00000000010   38  405   50  18 105  405  372   42 282
> ENSRNOG00000000012    0    0    0   0   0    0    0    0   0
> 22932 more rows ...
>
>> d<-d[rowSums(d$counts)>9,]
>> design<- model.matrix(~ age, data = targets)
>> design
>  (Intercept)         age.L      age.Q
> 1           1 -7.071068e-01  0.4082483
> 2           1 -7.071068e-01  0.4082483
> 3           1 -7.071068e-01  0.4082483
> 4           1 -7.850462e-17 -0.8164966
> 5           1 -7.850462e-17 -0.8164966
> 6           1 -7.850462e-17 -0.8164966
> 7           1  7.071068e-01  0.4082483
> 8           1  7.071068e-01  0.4082483
> 9           1  7.071068e-01  0.4082483
> attr(,"assign")
> [1] 0 1 1
> attr(,"contrasts")
> attr(,"contrasts")$age
> [1] "contr.poly"
>
>> d<-estimateGLMCommonDisp(d, design)
>> names(d)
> [1] "samples"           "counts"            "common.dispersion"
>> glmfit<- glmFit(d, design, dispersion=d$common.dispersion)
>> results<- glmLRT(d, glmfit, coef=c(2,3))
>> topTags(results)
> Coefficient:  age.L age.Q
>                     logConc      age.L      age.Q        LR      P.Value
> ENSRNOG00000011672 -11.27122 -3.2445886 -1.9642177 109.53641 1.638591e-24
>
> Many Thanks
>
> Shona

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list