[BioC] edgeR with two different groups (unpaired) at two time points (paired): Need some input

Wed Oct 23 09:32:49 CEST 2013

The problem is that the columns of your design matrix are linearly 
dependent: the statusDiabetic column can be written as a sum of all the 
columns corresponding to diabetic patients. In other words, the patient 
and status variables are confounded. I think your only option with 
edgeR is to drop the patients term from your model and just use "~ 
status * timepoints", but someone else might have another idea.

-Ryan

On Tue Oct 22 13:56:46 2013, Sindre Lee wrote:
> On 2013-10-22 20:03, Sindre Lee wrote:
>> Hi!
>> The edgeR manual is quite nice, but I'm not quite sure if I'm on the
>> right track..
>>
>> The question to answer is: "Is there any significantly changed genes
>> in diabetics at time point 2, compared to healthy?"
>>
>> Heres my codes:
>>
>> #Making the design matrix
>>
>> status <- factor(c(rep("Healthy",26), rep("Diabetic",22)),
>> levels=c("Healthy","Diabetic"))
>> patients <-
>> factor(c(88,87,86,83,82,81,75,72,71,70,13,08,01,88,87,86,83,82,81,75,72,71,70,13,08,01,79,77,76,73,67,62,61,55,53,21,04,79,77,76,73,67,62,61,55,53,21,04))
>>
>> timepoints =
>> as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2))
>>
>> design <- model.matrix(~patients + timepoints*status)
>>> colnames(design)
>>  [1] "(Intercept)"                "patients4"
>>  [3] "patients8"                  "patients13"
>>  [5] "patients21"                 "patients53"
>>  [7] "patients55"                 "patients61"
>>  [9] "patients62"                 "patients67"
>> [11] "patients70"                 "patients71"
>> [13] "patients72"                 "patients73"
>> [15] "patients75"                 "patients76"
>> [17] "patients77"                 "patients79"
>> [19] "patients81"                 "patients82"
>> [21] "patients83"                 "patients86"
>> [23] "patients87"                 "patients88"
>> [25] "timepoints2"                "statusDiabetic"
>> [27] "timepoints2:statusDiabetic"
>>
>> #Reading in HTSeq data
>>
>> description <- c("test");
>> fileNames <-
>> list.files(path="/Volumes/timemachine/HTseq_DEseq2",pattern="*.txt");
>> files <- sort(fileNames,decreasing=TRUE);
>> targets <- data.frame(files=files, group=status,
>> description=description);
>> library(edgeR);
>> d <-
>> readDGE(targets,path="/Volumes/timemachine/HTseq_DEseq2",skip=5,comment.char="!");
>>
>> colnames(d)
>> <-(c("N288","N287","N286","N283","N282","N281","N275","N272","N271","N270","N213","N208","N201","N188","187","N186","N183","N182","N181","N175","N172","N171","N170","N113","N108","N101","D279","D277","D276","D273","D267","D262","D261","D255","D253","D221","D204","D179","D177","D176","D173","D167","D162","D161","D155","D153","D121","D104"));
>>
>> d$samples;
>> dim(d)
>>
>> #Normalization
>>
>> d <- (calcNormFactors(d,method="RLE"));
>>
>> #Estimating the dispersion:
>>
>> d <- estimateCommonDisp(d,verbose=TRUE)
>> Disp = 0.05823 , BCV = 0.2413
>> d <- estimateTagwiseDisp(d,trend="none")
>>
>> #diffeksp
>>
>>> fit <- glmFit(d,design)
>> Error in glmFit.default(y = y$counts, design = design, dispersion =
>> dispersion,  :
>>   Design matrix not of full rank.  The following coefficients not
>> estimable:
>>  statusDiabetic
>>
>> Can someone tell me why I get this error?
>>
>> Thanks alot!
>
> MARK!
> In group 1: 13 samples
> In group 2: 11 samples
>
> I guess this is the problem? How to handle this?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor