[BioC] edgeR GLM using factor that varies for each gene

Fri May 9 11:37:55 CEST 2014

Dear Gordon,

Thank you so much for your prompt and helpful answer.

You're right, I was overcomplicating it :-)

Best,
Daniel

On 09.05.2014 06:12, Gordon K Smyth wrote:
> Dear Daniel,
> 
> I don't see any need for a gene-specific factor.
> 
> Simply give all the count rows (for all genes and all splicing events)
> to edgeR.  The design matrix is:
> 
>   genotype <- factor(c("mutant1","mutant1","mutant2","mutant2","wt","wt"))
>   genotype <- relevel(genotype,ref="wt")
>   design <- model.matrix(~genotype)
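The reason no gene-specific factor is needed can be sketched in base R. edgeR fits a separate GLM to each row of the count matrix, and each of those fits has its own intercept. A covariate that is constant within a row, such as the AS.type of an event, is therefore aliased with that row's intercept. This is a hypothetical illustration, not part of the reply: the `alt_donor` column is an assumed stand-in for a row-level AS.type indicator.

```r
# Design from the reply: wt is the reference level.
genotype <- factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt"))
genotype <- relevel(genotype, ref = "wt")
design <- model.matrix(~genotype)

# Hypothetical: within one row (one event), AS.type has the same value in
# all six libraries, so as a covariate it is just a column of ones.
design_row <- cbind(design, alt_donor = rep(1, nrow(design)))

# The extra column duplicates the intercept: 4 columns, but only rank 3.
ncol(design_row)     # 4
qr(design_row)$rank  # 3
```

Per-type differences in overall abundance are thus absorbed by each row's own intercept, while differences in noise level are partly accommodated by the gene-wise (tagwise) dispersions, which edgeR shrinks toward a common or trended value.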
> 
> If you want to find differentially abundant events between the mutants
> and wt, you can run glmLRT() with coef=2 to examine mutant1, coef=3 to
> examine mutant2, and contrast=c(0,0.5,0.5) to average the two mutant lines.
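The meaning of coef=2, coef=3 and contrast=c(0,0.5,0.5) can be checked on a single row with a base-R Poisson GLM. This is a deliberate simplification, not what edgeR does: edgeR fits a negative binomial GLM with normalized library sizes as offsets and reports log2 fold changes, whereas this sketch assumes no offsets and works on the natural-log scale. The counts are those of event S103-R_2.904 from the table in the question.

```r
genotype <- factor(c("mutant1", "mutant1", "mutant2", "mutant2", "wt", "wt"))
genotype <- relevel(genotype, ref = "wt")

# Counts for event S103-R_2.904 (mutant1 x2, mutant2 x2, wt x2).
y <- c(2, 4, 2, 4, 33, 28)

# Poisson GLM with the same ~genotype design (no library-size offset).
fit <- glm(y ~ genotype, family = poisson())
b <- coef(fit)

# coef=2 and coef=3: natural-log fold change of each mutant line vs wt,
# i.e. log(mean mutant count / mean wt count) = log(3 / 30.5) here.
b[["genotypemutant1"]]
b[["genotypemutant2"]]

# contrast=c(0,0.5,0.5): the average of the two mutant log fold changes.
contrast <- c(0, 0.5, 0.5)
avg_lfc <- sum(contrast * b)
avg_lfc / log(2)  # the same quantity on the log2 scale that edgeR reports
```

Because both mutant lines happen to have the same mean count for this event, the averaged contrast equals each individual log fold change; in general it is their midpoint.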
> 
> Best wishes
> Gordon
> 
> 
>> Date: Thu,  8 May 2014 00:33:05 -0700 (PDT)
>> From: "Daniel Lang [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, daniel.lang at biologie.uni-freiburg.de
>> Subject: [BioC] edgeR GLM using factor that varies for each gene
>>
>> Hi,
>>
>> After going over the User's Guide and searching this mailing list, I'm
>> still not clear on how best to address my specific situation:
>>
>> I'd like to test differential "expression" of specific splicing events
>> between a mutant and the wild type in a replicated design. To do so,
>> I've counted the reads that are specific to each splicing event in
>> each gene.
>>
>> e.g.
>> event          AS.type       mutant.line1.rep1  mutant.line1.rep2  mutant.line2.rep1  mutant.line2.rep2  wt.rep1  wt.rep2
>> S102-F_10.883  alt_donor     4                  7                  4                  7                  0        1
>> S102-F_12.884  alt_donor     0                  1                  0                  1                  0        2
>> S102-F_10.887  alt_donor     0                  0                  0                  0                  30       33
>> S102-F_10.886  alt_acceptor  0                  0                  0                  0                  22       21
>> S102-F_11.890  alt_donor     0                  0                  0                  0                  0        0
>> S102-F_11.889  alt_acceptor  0                  0                  0                  0                  0        0
>> S102-F_10.891  alt_acceptor  0                  0                  0                  0                  0        0
>> S103-R_3.901   alt_acceptor  4                  5                  4                  5                  10       11
>> S103-R_2.904   skipped_exon  2                  4                  2                  4                  33       28
>> S103-R_2.902   alt_acceptor  4                  5                  4                  5                  0        0
>> S103-R_1.906   alt_acceptor  0                  1                  0                  1                  1        0
>>
>> It's not clear from this example, but overall the abundance and noise
>> level differ between types of alternative splicing. I'd like to correct
>> for this, but also assess it in the GLM. Ideally, I'd like to find
>> splicing events that are differentially abundant between the mutant and
>> the wild type, irrespective of line and biological replicate.
>>
>> As far as I understand the User's Guide and the Reference Manual, the
>> design always refers to factors describing the libraries/experiments
>> from which the counts are derived.
>>
>> If I were using a "normal" GLM, what I want to do would look like
>> glm(count ~ AS.type + genotype + line + biological.replicate).
>>
>> Can I accomplish this with edgeR without splitting up the events into
>> different data sets per splice type?
>>
>> Any advice on this would be greatly appreciated.
>>
>> Best,
>> Daniel
>>
>> -- output of sessionInfo():
>>
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>> [3] LC_TIME=de_DE.utf8        LC_COLLATE=en_US.utf8
>> [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=C                LC_NAME=C
>> [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base