[BioC] EdgeR: paired samples together with independent samples

Wed Nov 7 16:57:39 CET 2012

Dear Gordon,

Thanks very much for the helpful advice. I'm treating them as biological 
replicates -- they are cell cultures and it's just that I have multiple 
separately treated/untreated pairs of cultures from some patients and 
only one treated/untreated pair for others. So although some cultures 
came from the same patient, they were all treated separately and then 
RNA was extracted from each culture. Would you say that's the right 
thing to do?

Thanks and best wishes,
Maria

On 07/11/2012 00:01, Gordon K Smyth wrote:
> Dear Maria,
>
> Thanks for the specific reference to the documentation that you've 
> followed.
>
> Yes, you are correct, the error is arising because there is no 4th 
> patient in the healthy group.  If you have a look at your design 
> matrix, you will see that there is a column called 
> DiseaseHealthy:Patient4 that consists entirely of zeros.  It should be 
> column 8, but check:
>
>    design[,8]
>
> The easiest way to proceed is simply to remove that column manually 
> from the design matrix:
>
>    design2 <- design[,-8]
>
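A minimal sketch of the check and fix described above, using only base R
and the relabelled annotations quoted further down in this thread. The
model formula is assumed to be the one from the user's guide section that
was followed. The level names are taken as written in the annotation
table, so the empty column is named Diseasehealthy:Patient4 here rather
than DiseaseHealthy:Patient4. Dropping all-zero columns by detection
instead of by position is an illustrative choice, not part of the
original advice.

```r
# Sample annotations as listed in the thread (relabelled Patient column).
Disease   <- factor(c(rep("disease1", 14), rep("healthy", 12)))
Patient   <- factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                      1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3))
Treatment <- factor(c(rep("control", 7), rep("treat", 7),
                      rep("control", 6), rep("treat", 6)))

# Assumed formula: patient effects and treatment effects nested in Disease.
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)

# A column that is zero for every sample belongs to a disease/patient
# combination that does not exist in the data (no 4th healthy patient).
all_zero <- colSums(design != 0) == 0
print(colnames(design)[all_zero])

# Drop the empty column(s) and confirm the reduced matrix is of full rank.
design2 <- design[, !all_zero, drop = FALSE]
print(qr(design2)$rank == ncol(design2))
```

design2 can then be used wherever design was used before, for example
in the call to estimateGLMCommonDisp().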
> Your experiment has another issue, in that you have repeat samples on 
> several of the patients.  Are these biological replicates?  If not, if 
> they are just technical replicates, then they should be collapsed into 
> one library before analysis.
>
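For the technical-replicate case mentioned above, collapsing means
summing the counts of the libraries that come from the same sample. The
following is only a sketch in base R with a made-up count matrix; the
names counts, culture and collapsed are hypothetical. Recent edgeR
versions also provide sumTechReps() for this purpose.

```r
# Hypothetical counts: 3 genes, 3 libraries; the first two libraries
# are technical replicates of the same culture "A".
counts <- matrix(c(10, 0, 5,
                   12, 1, 7,
                    3, 4, 8),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"),
                                 c("A_rep1", "A_rep2", "B")))
culture <- c("A", "A", "B")

# rowsum() adds up rows by group, so transpose, sum, and transpose back
# to get one column of summed counts per culture.
collapsed <- t(rowsum(t(counts), group = culture))
print(collapsed)
```

Biological replicates, such as separately treated cultures, should be
left as separate libraries.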
> Best wishes
> Gordon
>
>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>> From: Maria Keays <mkeays at ebi.ac.uk>
>> To: bioconductor at r-project.org
>> Subject: Re: [BioC] EdgeR: paired samples together with independent
>>     samples
>>
>> Hello,
>>
>> I read this thread and the related user guide material with interest
>> because I am working with a very similar data set with paired samples.
>> However, I'm having trouble, which I think stems from my data being
>> unbalanced. I have four patients with a disease and three without, and
>> for some patients I have replicates but for others I do not. I've created a
>> design matrix as described on p32 of the 27 October 2012 edgeR user's
>> guide, but when I try to estimate the common dispersion using
>> estimateGLMCommonDisp() it tells me:
>>
>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>> offset = offset) :
>>   Design matrix not of full rank.  The following coefficients not
>> estimable:
>>  DiseaseHealthy:Patient4"
>>
>> I guess this is because I have 4 patients in the diseased set and only 3
>> in the healthy set? If I remove Patient4 and try again, I'm able to continue
>> the analysis successfully, but I'd obviously like to be able to include
>> all the data -- is that possible? If so, could you explain how to do it?
>>
>> The original annotations for my data are below:
>>
>> Disease    Patient    Treatment
>> disease1    1    control
>> disease1    1    control
>> disease1    1    control
>> disease1    2    control
>> disease1    3    control
>> disease1    3    control
>> disease1    4    control
>> disease1    1    treat
>> disease1    1    treat
>> disease1    1    treat
>> disease1    2    treat
>> disease1    3    treat
>> disease1    3    treat
>> disease1    4    treat
>> healthy    5    control
>> healthy    6    control
>> healthy    6    control
>> healthy    6    control
>> healthy    7    control
>> healthy    7    control
>> healthy    5    treat
>> healthy    6    treat
>> healthy    6    treat
>> healthy    6    treat
>> healthy    7    treat
>> healthy    7    treat
>>
>> As I was following the user's guide I amended the "Patient" labels so it
>> looked like this when I created the design matrix:
>>
>> Disease    Patient    Treatment
>> disease1    1    control
>> disease1    1    control
>> disease1    1    control
>> disease1    2    control
>> disease1    3    control
>> disease1    3    control
>> disease1    4    control
>> disease1    1    treat
>> disease1    1    treat
>> disease1    1    treat
>> disease1    2    treat
>> disease1    3    treat
>> disease1    3    treat
>> disease1    4    treat
>> healthy    1    control
>> healthy    2    control
>> healthy    2    control
>> healthy    2    control
>> healthy    3    control
>> healthy    3    control
>> healthy    1    treat
>> healthy    2    treat
>> healthy    2    treat
>> healthy    2    treat
>> healthy    3    treat
>> healthy    3    treat
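The relabelling shown in the two tables above (patients 5, 6, 7 in the
healthy group become 1, 2, 3) can also be done programmatically. This
is a sketch in base R, assuming the original labels are held in a
numeric vector; PatientID is a hypothetical name.

```r
# Original annotations from the first table in the thread.
Disease   <- c(rep("disease1", 14), rep("healthy", 12))
PatientID <- c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
               5, 6, 6, 6, 7, 7,     5, 6, 6, 6, 7, 7)

# Renumber patients 1, 2, 3, ... separately within each disease group.
Patient <- ave(PatientID, Disease,
               FUN = function(id) as.integer(factor(id)))
Patient <- factor(Patient)

# Cross-tabulate to see which disease/patient combinations are empty.
print(table(Disease, Patient))
```

The zero cell for healthy patient 4 in the cross-tabulation is the
combination that produces the non-estimable coefficient.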
>>
>> Thanks!
>> Maria
>>
>>
>> On 25/10/2012 06:18, Gordon K Smyth wrote:
>>> Dear Anna,
>>>
>>> You are right to recognise that the analysis of this sort of design is
>>> more complex than many other experiments, because it includes
>>> comparisons both within and between patients.  I have included a new
>>> section in the edgeR User's Guide based on your experiment that
>>> describes the analysis. This will appear in the official release of
>>> edgeR in a couple of days. In the meantime, see pages 31-33 of:
>>>
>>>   http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>>> From: "anna [guest]" <guest at bioconductor.org>
>>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>>> Subject: [BioC] EdgeR: paired samples together with independent
>>>>     samples
>>>>
>>>>
>>>> Hello,
>>>> I am using edgeR to analyse my RNA-seq data.
>>>>
>>>> I have:
>>>>
>>>> cells from 3 healthy patients, either treated or not with a hormone.
>>>>
>>>> cells from 3 patients with disease D1, either treated or not with the
>>>> hormone.
>>>>
>>>> cells from 3 patients with disease D2, either treated or not with the
>>>> hormone.
>>>>
>>>> I would like to know what is wrong in the response to the hormone in
>>>> patients with disease D1 and D2.
>>>>
>>>> I don't know how to combine paired comparisons with pairwise
>>>> comparisons in a single GLM analysis.
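One way to set up such a design, sketched here in base R under stated
assumptions: each group has exactly 3 patients, each patient contributes
one untreated and one treated library, and the labels Healthy, Disease1,
Disease2, None and Hormone are illustrative. Patients are numbered 1 to 3
within each group, so the Disease:Patient terms absorb the pairing while
the Disease:Treatment terms give one hormone effect per group.

```r
# Hypothetical layout: 3 groups x 3 patients x 2 treatments = 18 libraries.
Disease   <- factor(rep(c("Healthy", "Disease1", "Disease2"), each = 6),
                    levels = c("Healthy", "Disease1", "Disease2"))
Patient   <- factor(rep(rep(1:3, each = 2), times = 3))
Treatment <- factor(rep(c("None", "Hormone"), times = 9),
                    levels = c("None", "Hormone"))

design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment)
print(colnames(design))

# Contrast: hormone response in Disease1 minus hormone response in Healthy.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["DiseaseDisease1:TreatmentHormone"] <- 1
contrast["DiseaseHealthy:TreatmentHormone"]  <- -1
print(contrast[contrast != 0])
```

A vector like contrast could then be supplied to the contrast argument
of edgeR's glmLRT() after fitting the model with this design.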
>>>>
>>>> thank you very much,
>>>> anna
>>>>
>>>> -- output of sessionInfo():
>>>>
>>>> R version 2.15.1 (2012-06-22)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
>>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=French_France.1252
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets methods base
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.15.1
>>>>