[BioC] edgeR GLM using factor that varies for each gene

Thu May 8 09:33:05 CEST 2014

Hi,

After reading the User's Guide and searching this mailing list, I'm still not sure how best to address my specific situation:

I'd like to test for differential "expression" of specific splicing events between a mutant and the wild type in a replicated design. To do so, I've counted, for each gene, the reads that are specific to each splicing event.

For example:
event	AS.type	mutant.line1.rep1	mutant.line1.rep2	mutant.line2.rep1	mutant.line2.rep2	wt.rep1	wt.rep2
S102-F_10.883	alt_donor	4	7	4	7	0	1
S102-F_12.884	alt_donor	0	1	0	1	0	2
S102-F_10.887	alt_donor	0	0	0	0	30	33
S102-F_10.886	alt_acceptor	0	0	0	0	22	21
S102-F_11.890	alt_donor	0	0	0	0	0	0
S102-F_11.889	alt_acceptor	0	0	0	0	0	0
S102-F_10.891	alt_acceptor	0	0	0	0	0	0
S103-R_3.901	alt_acceptor	4	5	4	5	10	11
S103-R_2.904	skipped_exon	2	4	2	4	33	28
S103-R_2.902	alt_acceptor	4	5	4	5	0	0
S103-R_1.906	alt_acceptor	0	1	0	1	1	0
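For anyone who wants to reproduce this, here is a minimal base-R sketch of loading the excerpt above, with the event IDs as row names. The object names (`counts_txt`, `tab`, `counts`, `as_type`) are hypothetical, chosen only for illustration; the data are exactly the rows shown.

```r
# Example counts copied from the table above. read.table() splits on any
# whitespace by default, so tab- or space-separated text both work.
counts_txt <- "event AS.type mutant.line1.rep1 mutant.line1.rep2 mutant.line2.rep1 mutant.line2.rep2 wt.rep1 wt.rep2
S102-F_10.883 alt_donor 4 7 4 7 0 1
S102-F_12.884 alt_donor 0 1 0 1 0 2
S102-F_10.887 alt_donor 0 0 0 0 30 33
S102-F_10.886 alt_acceptor 0 0 0 0 22 21
S102-F_11.890 alt_donor 0 0 0 0 0 0
S102-F_11.889 alt_acceptor 0 0 0 0 0 0
S102-F_10.891 alt_acceptor 0 0 0 0 0 0
S103-R_3.901 alt_acceptor 4 5 4 5 10 11
S103-R_2.904 skipped_exon 2 4 2 4 33 28
S103-R_2.902 alt_acceptor 4 5 4 5 0 0
S103-R_1.906 alt_acceptor 0 1 0 1 1 0
"
# First column (event ID) becomes the row names.
tab <- read.table(text = counts_txt, header = TRUE, row.names = 1,
                  stringsAsFactors = FALSE)

# Count matrix: one row per splicing event, one column per library.
counts <- as.matrix(tab[, -1])
# Splice type belongs to each event (row), not to each library (column).
as_type <- factor(tab$AS.type)

print(dim(counts))
print(table(as_type))
print(colSums(counts))  # counted reads per library, in this excerpt only
```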

It's not apparent from this small example, but overall the abundance and noise level of events differ between types of alternative splicing (AS.type). I'd like to correct for these differences, but also assess them within the GLM. Ideally, then, I'd like to find splicing events that are differentially abundant between the mutant and the wild type, irrespective of line and biological replicate.
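One quick, purely descriptive way to eyeball per-type abundance differences is to total the counts by AS.type within each library. This is only a sketch on the excerpt (the names `type_totals` etc. are hypothetical); it says nothing about noise, which would need replicate-aware dispersion estimates, and with only one skipped_exon event here it should not be over-read.

```r
counts_txt <- "event AS.type mutant.line1.rep1 mutant.line1.rep2 mutant.line2.rep1 mutant.line2.rep2 wt.rep1 wt.rep2
S102-F_10.883 alt_donor 4 7 4 7 0 1
S102-F_12.884 alt_donor 0 1 0 1 0 2
S102-F_10.887 alt_donor 0 0 0 0 30 33
S102-F_10.886 alt_acceptor 0 0 0 0 22 21
S102-F_11.890 alt_donor 0 0 0 0 0 0
S102-F_11.889 alt_acceptor 0 0 0 0 0 0
S102-F_10.891 alt_acceptor 0 0 0 0 0 0
S103-R_3.901 alt_acceptor 4 5 4 5 10 11
S103-R_2.904 skipped_exon 2 4 2 4 33 28
S103-R_2.902 alt_acceptor 4 5 4 5 0 0
S103-R_1.906 alt_acceptor 0 1 0 1 1 0
"
tab <- read.table(text = counts_txt, header = TRUE, row.names = 1,
                  stringsAsFactors = FALSE)
counts <- as.matrix(tab[, -1])

# Sum the event counts of each splice type, separately for every library.
type_totals <- rowsum(counts, group = as.character(tab$AS.type))
print(type_totals)

# Share of each library's counted reads that falls into each splice type.
print(round(prop.table(type_totals, 2), 2))
```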

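As far as I understand the User's Guide and the Reference Manual, the design always refers to factors describing the libraries (samples) the counts are derived from, whereas AS.type is a property of each event, i.e. it varies between rows rather than between libraries.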
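To make the point concrete, here is a base-R sketch of a library-level design for this experiment: one row per library, which is the kind of matrix edgeR functions such as `glmFit()` take as `design`. The sample annotation (`genotype`, `line`) is my own assumption, derived from the column names. It also shows that, coded this way, line is nested within genotype, so adding both gives a rank-deficient matrix.

```r
libs <- c("mutant.line1.rep1", "mutant.line1.rep2", "mutant.line2.rep1",
          "mutant.line2.rep2", "wt.rep1", "wt.rep2")
is_wt <- grepl("^wt", libs)

# Hypothetical sample annotation: one row per library (column of the counts).
samples <- data.frame(
  genotype = factor(ifelse(is_wt, "wt", "mutant"), levels = c("wt", "mutant")),
  line = factor(ifelse(is_wt, "wt",
                       sub("^mutant\\.(line[0-9]+)\\..*$", "\\1", libs))),
  row.names = libs
)

# Design with wild type as baseline; "genotypemutant" is the mutant effect.
design <- model.matrix(~ genotype, data = samples)
print(design)

# Line is nested in genotype: the "wt" line level duplicates genotype = wt,
# so this 4-column matrix only has rank 3.
design_full <- model.matrix(~ genotype + line, data = samples)
print(qr(design_full)$rank)
```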

With an ordinary GLM, what I want would look like glm(count ~ AS.type + genotype + line + biological.replicate).
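For comparison, a base-R sketch of that formula fitted to the excerpt reshaped to long format (one row per event and library). Assumptions: a Poisson family as a simple stand-in (edgeR itself uses negative binomial models), no library-size offset because the library sizes are not given here, and `genotype`, `line` and `replicate` derived from the column names. Because line is nested in genotype, R reports one aliased (NA) coefficient; and since there is no event term, this pooled model estimates one overall genotype effect rather than one per event.

```r
counts_txt <- "event AS.type mutant.line1.rep1 mutant.line1.rep2 mutant.line2.rep1 mutant.line2.rep2 wt.rep1 wt.rep2
S102-F_10.883 alt_donor 4 7 4 7 0 1
S102-F_12.884 alt_donor 0 1 0 1 0 2
S102-F_10.887 alt_donor 0 0 0 0 30 33
S102-F_10.886 alt_acceptor 0 0 0 0 22 21
S102-F_11.890 alt_donor 0 0 0 0 0 0
S102-F_11.889 alt_acceptor 0 0 0 0 0 0
S102-F_10.891 alt_acceptor 0 0 0 0 0 0
S103-R_3.901 alt_acceptor 4 5 4 5 10 11
S103-R_2.904 skipped_exon 2 4 2 4 33 28
S103-R_2.902 alt_acceptor 4 5 4 5 0 0
S103-R_1.906 alt_acceptor 0 1 0 1 1 0
"
tab <- read.table(text = counts_txt, header = TRUE, row.names = 1,
                  stringsAsFactors = FALSE)
counts <- as.matrix(tab[, -1])

# Long format: as.vector() reads the matrix column by column, so events
# cycle fastest and each library name is repeated once per event.
lib <- rep(colnames(counts), each = nrow(counts))
long <- data.frame(
  event     = rep(rownames(counts), times = ncol(counts)),
  AS.type   = factor(rep(tab$AS.type, times = ncol(counts))),
  genotype  = factor(ifelse(grepl("^wt", lib), "wt", "mutant"),
                     levels = c("wt", "mutant")),
  line      = factor(ifelse(grepl("^wt", lib), "wt",
                            sub("^mutant\\.(line[0-9]+)\\..*$", "\\1", lib))),
  replicate = factor(sub(".*\\.(rep[0-9]+)$", "\\1", lib)),
  count     = as.vector(counts)
)

# The formula from the question, with a Poisson family as a stand-in.
fit <- glm(count ~ AS.type + genotype + line + replicate,
           family = poisson, data = long)
print(coef(fit))  # "linewt" is NA: aliased with genotype
```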

Can I do this in edgeR without splitting the events into separate data sets, one per splice type?

Any advice on this would be greatly appreciated.

Best,
Daniel 

 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=de_DE.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=C                LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

--
Sent via the guest posting facility at bioconductor.org.