[BioC] Paired comparisons in EdgeR

Thu May 8 16:23:28 CEST 2014

Hi Jussi,

On 5/8/2014 7:24 AM, Jussi Salmi wrote:
> Hello!
>
>
> Thank you for the nice software and clear user guide. I am using EdgeR to analyse RNA-seq data. In the experiment there are several cell cultures with different knock-downs. The cell cultures are stimulated in two different ways. In the end, I want to compare the treated cultures to the untreated ones. The same untreated knock-down is compared to the same treated knock-down culture. I think that the study design produces paired comparisons because the same cultures are first used as the untreated ones and then they are treated and compared against the untreated.
>
> I have come up with the following code, based on the user guide:
>
> x<-read.delim("htseqout.edger", sep=" ",row.names="Symbol")
> group<-factor(c(21,21,19,19,6,6,8,8,12,12,18,18,20,20,11,11,9,9,23...
> y<-DGEList(counts=x,group=group)
> design<-model.matrix(~0+group, data=y$samples)

You want to include a factor for treatment as well. Something like

trt <- factor(rep(1:2, length(group)/2))

design <- model.matrix(~group+trt)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)

Will give you the genes that change between treatments, after 
controlling for the paired nature of your experiment.

Best,

Jim

> colnames(design)<-levels(y$samples$group)
> y<-estimateGLMTrendedDisp(y,design)
> y<-estimateGLMTagwiseDisp(y,design)
> fit<-glmFit(y,design)
> ?lrt0203<-glmLRT(fit,contrast=c(0,1,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
> tt0203<-topTags(lrt0203, n=13000)
> write.table(tt0203,file="tt0203")
>
> [several other comparisons)
>
> I tried to understand whether this is a good way to analyse paired samples or is there a better way?
>
> Thanks,
>
> Jussi Salmi
> PhD computer Science
> Centre for Biotechnology, Turku, Finland
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=fi_FI.UTF-8       LC_NUMERIC=C               LC_TIME=fi_FI.UTF-8
>   [4] LC_COLLATE=fi_FI.UTF-8     LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=fi_FI.UTF-8
>   [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                  LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.4.2   limma_3.18.13
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099