[BioC] use of voom function with attract package

Tue Apr 15 18:44:42 CEST 2014

Dear Bioconductor,

I am trying to use the attract package to find the processes that are differentially activated between cell types of the same lineage, using RNA-Seq data. Since the attract package is designed to work with microarray data, I decided to use the voom function to transform my data and change the findAttractors() function accordingly, to accommodate this type of data. Since this is not trivial, I want to make sure that I am using the output from the voom function correctly. 

The main part of the findAttractors() uses lm to model the expression in relation to the cell type (group) and then an anova to get the F statistic for each gene:
    fstat <- apply(dat.detect.wkegg, 1, function(y, x) {
        anova(lm(y ~ x))[[4]][1]}, x = group)
where dat.detect.wkegg is the matrix of the normalized expression values with the genes per row and the samples per column.
(To give some more context, the function then uses the log2 values of the fstat and does a t test between the gene values of a specific pathway vs all the gene values to identify the significant pathways. ) 

What I want to do is change the above to:

counts_data <- DGEList(counts=rnaseq,group=celltype)
counts_data_norm <- calcNormFactors(counts_data) 

design <- model.matrix( ~ celltype)
anal_voom <- voom(counts_data_norm, design)
dat.detect.wkegg <- as.list(as.data.frame(t(anal_voom$E)))
voom_weights <- as.list(as.data.frame(t(anal_voom$weights)))

fstat <- mapply(function(y, w, group) {anova(lm(y ~ group, weights=w))[[4]][1]},
	dat.detect.wkegg, voom_weights, MoreArgs = list(group=celltype))

Is this the way to go with using the weights from voom, or am I getting this very wrong?

Many thanks in advance for your reply!

Best wishes,
Emmanouela

 -- output of sessionInfo(): 

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.ISO-8859-1       LC_NUMERIC=C                    LC_TIME=en_GB.ISO-8859-1        LC_COLLATE=en_GB.ISO-8859-1     LC_MONETARY=en_GB.ISO-8859-1    LC_MESSAGES=en_GB.ISO-8859-1   
 [7] LC_PAPER=C                      LC_NAME=C                       LC_ADDRESS=C                    LC_TELEPHONE=C                  LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C            

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] attract_1.14.0       GOstats_2.28.0       graph_1.40.1         Category_2.28.0      GO.db_2.10.1         Matrix_1.1-3         cluster_1.15.2       annotate_1.40.1      org.Mm.eg.db_2.10.1 
[10] KEGG.db_2.10.1       RSQLite_0.11.4       DBI_0.2-7            AnnotationDbi_1.24.0 Biobase_2.22.0       BiocGenerics_0.8.0   edgeR_3.4.2          limma_3.18.13       

loaded via a namespace (and not attached):
 [1] AnnotationForge_1.4.4 genefilter_1.44.0     grid_3.0.1            GSEABase_1.24.0       IRanges_1.20.7        lattice_0.20-29       RBGL_1.38.0           splines_3.0.1        
 [9] stats4_3.0.1          survival_2.37-7       tcltk_3.0.1           tools_3.0.1           XML_3.98-1.1          xtable_1.7-3         

--
Sent via the guest posting facility at bioconductor.org.