[BioC] Paired comparisons in EdgeR

Mon May 12 13:33:07 CEST 2014

Thank you for the advice! There is a further problem, though.

I got the error

"Error in glmFit.default(y, design = design, dispersion = dispersion, 
offset = offset,  :
   Design matrix not of full rank.  The following coefficients not 
estimable:
  treatment2 treatment3"

My code is now:
x<-read.delim("..
group<-factor(c(21,19,6,8,12,18,20,11,9,...
treatment<-factor(c(3,3,3,1,1,2,3,1,1,...
y<-DGEList(counts=x,group=group)
design<-model.matrix(~group+treatment, data=y$samples)
y<-estimateGLMTrendedDisp(y,design)
y<-estimateGLMTagwiseDisp(y,design)
fit<-glmFit(y,design)

lrt <- glmLRT(fit)
tt<-topTags(lrt)
write.table(tt,file="tt")

I have been looking at this all day, but it seems I am too stupid to 
understand this. I have 72 samples in 24 groups (3 replicates in each 
group). There are 3 treatments, and factor "treatment"'s length is 72. 
For each sample I have given the group in factor "group". There are 
three treatments: no treatment (1), treatment A (2) and treatment B (3).

In the design matrix there is an intercept column, then the 24 groups 
and finally two treatment columns.

Jussi

PhD, Computer Science
University of Turku, Finland

08.05.2014 17:23, James W. MacDonald kirjoitti:
> Hi Jussi,
>
> On 5/8/2014 7:24 AM, Jussi Salmi wrote:
>> Hello!
>>
>>
>> Thank you for the nice software and clear user guide. I am using EdgeR to analyse RNA-seq data. In the experiment there are several cell cultures with different knock-downs. The cell cultures are stimulated in two different ways. In the end, I want to compare the treated cultures to the untreated ones. The same untreated knock-down is compared to the same treated knock-down culture. I think that the study design produces paired comparisons because the same cultures are first used as the untreated ones and then they are treated and compared against the untreated.
>>
>> I have come up with the following code, based on the user guide:
>>
>> x<-read.delim("htseqout.edger", sep=" ",row.names="Symbol")
>> group<-factor(c(21,21,19,19,6,6,8,8,12,12,18,18,20,20,11,11,9,9,23...
>> y<-DGEList(counts=x,group=group)
>> design<-model.matrix(~0+group, data=y$samples)
>
> You want to include a factor for treatment as well. Something like
>
> trt <- factor(rep(1:2, length(group)/2))
>
> design <- model.matrix(~group+trt)
> fit <- glmFit(y, design)
> lrt <- glmLRT(fit)
> topTags(lrt)
>
> Will give you the genes that change between treatments, after
> controlling for the paired nature of your experiment.
>
> Best,
>
> Jim
>
>
>> colnames(design)<-levels(y$samples$group)
>> y<-estimateGLMTrendedDisp(y,design)
>> y<-estimateGLMTagwiseDisp(y,design)
>> fit<-glmFit(y,design)
>> ?lrt0203<-glmLRT(fit,contrast=c(0,1,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
>> tt0203<-topTags(lrt0203, n=13000)
>> write.table(tt0203,file="tt0203")
>>
>> [several other comparisons)
>>
>> I tried to understand whether this is a good way to analyse paired samples or is there a better way?
>>
>> Thanks,
>>
>> Jussi Salmi
>> PhD computer Science
>> Centre for Biotechnology, Turku, Finland
>>
>>> sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>    [1] LC_CTYPE=fi_FI.UTF-8       LC_NUMERIC=C               LC_TIME=fi_FI.UTF-8
>>    [4] LC_COLLATE=fi_FI.UTF-8     LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=fi_FI.UTF-8
>>    [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                  LC_ADDRESS=C
>> [10] LC_TELEPHONE=C             LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] edgeR_3.4.2   limma_3.18.13
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.1.0
>>
>

-- 
Jussi Salmi, PhD
http://www.btk.fi/index.php?id=12&sort=&pid=282