[BioC] edgeR: paired samples together with independent samples

Gordon K Smyth smyth at wehi.EDU.AU
Wed Nov 7 23:55:12 CET 2012


Dear Maria,

From what you say, it sounds OK not to collapse the libraries.  However, if the 
three treated cultures and three untreated cultures for one patient are 
truly three pairs, then this pairing should be reflected in the analysis. 
You can handle this by numbering the samples by paired culture (1 to 7 in 
the disease1 group, 1 to 6 in the healthy group) instead of numbering by 
patient.
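A minimal base-R sketch of that renumbering, using the annotation Maria gives further down the thread. It assumes (hypothetically) that the k-th untreated culture of a patient pairs with the k-th treated culture, i.e. that the row order of the annotation reflects the pairing, and that the design keeps the nested form ~ Disease + Disease:Pair + Disease:Treatment. The names targets, Rep, culture and Pair are illustrative only.

```r
# Maria's annotation with the original patient IDs, one row per library
targets <- data.frame(
  Disease   = factor(rep(c("disease1", "healthy"), times = c(14, 12))),
  Patient   = c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                5, 6, 6, 6, 7, 7,     5, 6, 6, 6, 7, 7),
  Treatment = factor(c(rep(c("control", "treat"), each = 7),
                       rep(c("control", "treat"), each = 6)))
)

# ASSUMPTION: the k-th control library of a patient is paired with the k-th
# treated library, so the order of rows defines the pairs.
targets$Rep <- ave(seq_len(nrow(targets)), targets$Disease, targets$Patient,
                   targets$Treatment, FUN = seq_along)

# Number the paired cultures consecutively within each disease group
# (1-7 for disease1, 1-6 for healthy).
culture <- paste(targets$Patient, targets$Rep, sep = ".")
targets$Pair <- NA_integer_
for (d in levels(targets$Disease)) {
  i <- targets$Disease == d
  targets$Pair[i] <- match(culture[i], unique(culture[i]))
}
targets$Pair <- factor(targets$Pair)

# Nested design: pair effects within disease, treatment effect per disease
design <- model.matrix(~ Disease + Disease:Pair + Disease:Treatment,
                       data = targets)

# The healthy group has only six pairs, so Diseasehealthy:Pair7 is all zeros;
# drop every all-zero column, then confirm the design has full column rank.
design2 <- design[, colSums(design != 0) > 0, drop = FALSE]
stopifnot(qr(design2)$rank == ncol(design2))
colnames(design2)
```

The resulting design2 would then take the place of the patient-based design in estimateGLMCommonDisp() and the later steps.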

An MDS plot could guide you in judging whether there are baseline 
differences between the different pairs for one patient, and hence whether 
your pairing should be by culture instead of by patient.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Wed, 7 Nov 2012, Maria Keays wrote:

> Dear Gordon,
>
> Thanks very much for the helpful advice. I'm treating them as biological 
> replicates: they are cell cultures. It's just that I have multiple 
> separately treated/untreated pairs of cultures from some patients and only 
> one treated/untreated pair from others. So although some cultures came from 
> the same patient, they were all treated separately, and RNA was then extracted 
> from each culture. Would you say that's the right thing to do?
>
> Thanks and best wishes,
> Maria
>
>
> On 07/11/2012 00:01, Gordon K Smyth wrote:
>> Dear Maria,
>> 
>> Thanks for the specific reference to the documentation that you've 
>> followed.
>> 
>> Yes, you are correct: the error arises because there is no 4th patient 
>> in the healthy group.  If you look at your design matrix, you will 
>> see that there is a column called DiseaseHealthy:Patient4 that consists 
>> entirely of zeros.  It should be column 8, but check:
>>
>>    design[,8]
>> 
>> The easiest way to proceed is simply to remove that column manually from 
>> the design matrix:
>>
>>    design2 <- design[,-8]
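A minimal base-R sketch of the check above, built from the relabelled annotation Maria posted (quoted further down). It assumes the design was created with the nested formula ~ Disease + Disease:Patient + Disease:Treatment; that is an assumption, though it is consistent with the column name in the error message and with column 8. Because the table uses the lowercase level "healthy", the column here is named Diseasehealthy:Patient4 rather than DiseaseHealthy:Patient4. Instead of a fixed index, the sketch finds all-zero columns by testing every column.

```r
# Maria's relabelled annotation: patients numbered within each disease group
targets <- data.frame(
  Disease   = factor(rep(c("disease1", "healthy"), times = c(14, 12))),
  Patient   = factor(c(1, 1, 1, 2, 3, 3, 4,  1, 1, 1, 2, 3, 3, 4,
                       1, 2, 2, 2, 3, 3,     1, 2, 2, 2, 3, 3)),
  Treatment = factor(c(rep(c("control", "treat"), each = 7),
                       rep(c("control", "treat"), each = 6)))
)
design <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment,
                       data = targets)

# Locate the inestimable coefficient(s): columns that are zero in every library
all_zero <- colSums(design != 0) == 0
colnames(design)[all_zero]   # "Diseasehealthy:Patient4"
which(all_zero)              # column 8

# Remove them and confirm the reduced design has full column rank
design2 <- design[, !all_zero, drop = FALSE]
qr(design2)$rank == ncol(design2)
```

Dropping columns by this test rather than by position keeps working if the factor levels or the formula change.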
>> 
>> Your experiment has another issue, in that you have repeat samples on 
>> several of the patients.  Are these biological replicates?  If they 
>> are just technical replicates, then they should be collapsed into one 
>> library before analysis.
>> 
>> Best wishes
>> Gordon
>> 
>>> Date: Tue, 06 Nov 2012 09:19:08 +0000
>>> From: Maria Keays <mkeays at ebi.ac.uk>
>>> To: bioconductor at r-project.org
>>> Subject: Re: [BioC] EdgeR: paired samples together with independent
>>>     samples
>>> 
>>> Hello,
>>> 
>>> I read this thread and the related user's guide material with interest
>>> because I am working with a very similar data set with paired samples.
>>> However, I'm having trouble, which I think stems from my data being
>>> unbalanced. I have four patients with a disease and three without, and
>>> within that, for some patients I have replicates but for others I do not.
>>> I've created a design matrix as described on p. 32 of the 27 October 2012
>>> edgeR user's guide, but when I try to estimate the common dispersion using
>>> estimateGLMCommonDisp(), it tells me:
>>> 
>>> "Error in glmFit.default(y, design = design, dispersion = dispersion,
>>> offset = offset) :
>>>   Design matrix not of full rank.  The following coefficients not
>>> estimable:
>>>  DiseaseHealthy:Patient4"
>>> 
>>> I guess this is because I have 4 patients in the diseased set and only 3
>>> in the healthy set? If I remove Patient4 and try again, I'm able to
>>> continue the analysis successfully, but I'd obviously like to include all
>>> the data. Is that possible? If so, could you explain how to do it?
>>> 
>>> The original annotations for my data are below:
>>> 
>>> Disease   Patient  Treatment
>>> disease1  1        control
>>> disease1  1        control
>>> disease1  1        control
>>> disease1  2        control
>>> disease1  3        control
>>> disease1  3        control
>>> disease1  4        control
>>> disease1  1        treat
>>> disease1  1        treat
>>> disease1  1        treat
>>> disease1  2        treat
>>> disease1  3        treat
>>> disease1  3        treat
>>> disease1  4        treat
>>> healthy   5        control
>>> healthy   6        control
>>> healthy   6        control
>>> healthy   6        control
>>> healthy   7        control
>>> healthy   7        control
>>> healthy   5        treat
>>> healthy   6        treat
>>> healthy   6        treat
>>> healthy   6        treat
>>> healthy   7        treat
>>> healthy   7        treat
>>> 
>>> Following the user's guide, I renumbered the "Patient" labels within each
>>> disease group, so the annotation looked like this when I created the
>>> design matrix:
>>> 
>>> Disease   Patient  Treatment
>>> disease1  1        control
>>> disease1  1        control
>>> disease1  1        control
>>> disease1  2        control
>>> disease1  3        control
>>> disease1  3        control
>>> disease1  4        control
>>> disease1  1        treat
>>> disease1  1        treat
>>> disease1  1        treat
>>> disease1  2        treat
>>> disease1  3        treat
>>> disease1  3        treat
>>> disease1  4        treat
>>> healthy   1        control
>>> healthy   2        control
>>> healthy   2        control
>>> healthy   2        control
>>> healthy   3        control
>>> healthy   3        control
>>> healthy   1        treat
>>> healthy   2        treat
>>> healthy   2        treat
>>> healthy   2        treat
>>> healthy   3        treat
>>> healthy   3        treat
>>> 
>>> Thanks!
>>> Maria
>>> 
>>> 
>>> On 25/10/2012 06:18, Gordon K Smyth wrote:
>>>> Dear Anna,
>>>> 
>>>> You are right to recognise that the analysis of this sort of design is
>>>> more complex than many other experiments, because it includes
>>>> comparisons both within and between patients.  I have included a new
>>>> section in the edgeR User's Guide based on your experiment that
>>>> describes the analysis. This will appear in the official release of
>>>> edgeR in a couple of days. In the meantime, see pages 31-33 of:
>>>>
>>>>   http://bioinf.wehi.edu.au/software/edgeR/edgeRUsersGuide.pdf
>>>> 
>>>> Best wishes
>>>> Gordon
>>>> 
>>>>> Date: Tue, 23 Oct 2012 06:37:44 -0700 (PDT)
>>>>> From: "anna [guest]" <guest at bioconductor.org>
>>>>> To: bioconductor at r-project.org, m.nadira at yahoo.fr
>>>>> Subject: [BioC] EdgeR: paired samples together with independent
>>>>>     samples
>>>>> 
>>>>> 
>>>>> Hello,
>>>>> I am using edgeR to analyse my RNA-seq data.
>>>>> 
>>>>> I have:
>>>>> 
>>>>> cells from 3 healthy patients, either treated or not with a hormone
>>>>> 
>>>>> cells from 3 patients with disease D1, either treated or not with the
>>>>> hormone
>>>>> 
>>>>> cells from 3 patients with disease D2, either treated or not with the
>>>>> hormone.
>>>>> 
>>>>> I would like to know what is abnormal in the response to the hormone in
>>>>> patients with disease D1 and D2.
>>>>> 
>>>>> I don't know how to combine paired (within-patient) comparisons with
>>>>> between-patient comparisons in a single GLM analysis.
>>>>> 
>>>>> Thank you very much,
>>>>> Anna
>>>>> 
>>>>> -- output of sessionInfo():
>>>>> 
>>>>> R version 2.15.1 (2012-06-22)
>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>> 
>>>>> locale:
>>>>> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
>>>>> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=French_France.1252
>>>>> 
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets methods base
>>>>> 
>>>>> loaded via a namespace (and not attached):
>>>>> [1] tools_2.15.1
>>>>>
