[BioC] ReportingTools and gene annotation

James W. MacDonald jmacdon at uw.edu
Mon Feb 10 16:28:08 CET 2014


Hi Ugo,

On 2/10/2014 6:02 AM, Ugo Borello wrote:
> Good morning,
> I am using ReportingTools with DESeq2 and I am not able to add the gene
> annotation to my final report.
> I have Ensembl gene IDs as identifiers, not Entrez IDs!
>
> I followed Jason's suggestions as described here:
>
> http://article.gmane.org/gmane.science.biology.informatics.conductor/51995/match=
>
> But the add.anns() function doesn't work in my hands.
>
>> mart <- useMart("ensembl",dataset="mmusculus_gene_ensembl")
>> add.anns <- function(df, mart, ...)
> + {
> +   nm <- rownames(df)
> +   anns <- getBM(
> +     attributes = c("ensembl_gene_id", "external_gene_id", "description"),
> +     filters = "ensembl_gene_id", values = nm, mart = mart)
> +   anns <- anns[match(nm, anns[, 1]), ]
> +   colnames(anns) <- c("ID", "Gene Symbol", "Gene Description")
> +   df <- cbind(anns, df[, 2:nrow(df)])

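Note that in the line above you are subsetting the columns of 'df' using 
the number of rows (2:nrow(df)). That fails as soon as 'df' has more rows 
than columns, which is where the 'undefined columns selected' error comes 
from. You presumably meant 2:ncol(df). I am not sure whether you want to 
drop the first column here (you are annotating from the rownames, so I 
don't know what the first column contains). If you do, it is simpler to 
drop the first column with a negative index than to keep columns 
2:ncol(df):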

df <- cbind(anns, df[,-1])
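
As a minimal sketch of the difference, using made-up data: the toy data 
frame and the small local annotation table below are assumptions that 
stand in for the real report data frame and the getBM() result, and the 
column names are only illustrative.

```r
# Toy stand-in for the data frame handed to add.anns(): 6 genes, 3 columns.
df <- data.frame(
  ID     = paste0("gene", 1:6),
  logFC  = c(1.2, -0.5, 0.3, 2.1, -1.7, 0.0),
  pvalue = c(0.01, 0.20, 0.55, 0.001, 0.03, 0.90),
  row.names = c("ENSMUSG00000000001", "ENSMUSG00000000028",
                "ENSMUSG00000000031", "ENSMUSG00000000037",
                "ENSMUSG00000000049", "ENSMUSG00000000056"),
  stringsAsFactors = FALSE
)

# 2:nrow(df) asks for columns 2 to 6, but only 3 columns exist, so this fails.
res <- try(df[, 2:nrow(df)], silent = TRUE)
failed <- inherits(res, "try-error")

# Both of these keep every column except the first.
by_ncol <- df[, 2:ncol(df), drop = FALSE]
by_neg  <- df[, -1, drop = FALSE]

# Hypothetical annotation table (in the real function this comes from getBM()).
anns <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000028", "ENSMUSG00000000001"),
  symbol          = c("Cdc45", "Gnai"),
  stringsAsFactors = FALSE
)

# Align the annotation to the row order of df; unmatched genes become NA rows.
nm   <- rownames(df)
anns <- anns[match(nm, anns$ensembl_gene_id), ]
out  <- cbind(anns, df[, -1, drop = FALSE])
rownames(out) <- nm
```

The drop = FALSE is there so the result stays a data frame even if only 
one column is left after removing the first.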

Best,

Jim
> +   rownames(df) <- nm
> +   df
> + }
>> publish(dds, des2Report, factor = colData(dds)$condition,
> +         .modifyDF = list(add.anns, modifyReportDF),
> +         mart = mart)    ## dds is the DESeqDataSet object
>
>   Error in `[.data.frame`(df, , 2:nrow(df)) : undefined columns selected
>
> I also tried this:
>> publish(dds, des2Report, factor = colData(dds)$condition,
> +         .modifyDF = list(add.anns, modifyReportDF),
> +         mart = mart, df = counts(dds))    # dds is the DESeqDataSet object
>
>   Error in df[, 2:nrow(df)] : subscript out of bounds
>
>
> What am I doing wrong?
>
> Is there a simple way of adding my annotation to the HTML report?
>
>   ENSEMBL            ENTREZID SYMBOL GENENAME
> 1 ENSMUSG00000000001    14679 Gnai   guanine nucleotide binding protein (G protein), alpha inhibiting 3
> 2 ENSMUSG00000000028    12544 Cdc45  cell division cycle 45
> 3 ENSMUSG00000000031       NA NA     NA
> 4 ENSMUSG00000000037   107815 Scml2  sex comb on midleg-like 2 (Drosophila)
> 5 ENSMUSG00000000049    11818 Apoh   apolipoprotein H
> 6 ENSMUSG00000000056    67608 Narf   nuclear prelamin A recognition factor
>
>
> Thank you
>
> Ugo
>
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
>   [1] ReportingTools_2.2.0    knitr_1.5               org.Mm.eg.db_2.10.1
> RSQLite_0.11.4          DBI_0.2-7
>   [6] AnnotationDbi_1.24.0    Biobase_2.22.0          DESeq2_1.2.9
> RcppArmadillo_0.4.000.2 Rcpp_0.10.6
> [11] GenomicRanges_1.14.4    XVector_0.2.0           IRanges_1.20.6
> BiocGenerics_0.8.0      biomaRt_2.18.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.40.0          AnnotationForge_1.4.4    Biostrings_2.30.1
> biovizBase_1.10.7        bitops_1.0-6
>   [6] BSgenome_1.30.0          Category_2.28.0          cluster_1.14.4
> colorspace_1.2-4         dichromat_2.0-0
> [11] digest_0.6.4             edgeR_3.4.2              evaluate_0.5.1
> formatR_0.10             Formula_1.1-1
> [16] genefilter_1.44.0        GenomicFeatures_1.14.2   ggbio_1.10.10
> ggplot2_0.9.3.1          GO.db_2.10.1
> [21] GOstats_2.28.0           graph_1.40.1             grid_3.0.2
> gridExtra_0.9.1          GSEABase_1.24.0
> [26] gtable_0.1.2             Hmisc_3.14-0             hwriter_1.3
> labeling_0.2             lattice_0.20-24
> [31] latticeExtra_0.6-26      limma_3.18.10            locfit_1.5-9.1
> MASS_7.3-29              Matrix_1.1-2
> [36] munsell_0.4.2            PFAM.db_2.10.1           plyr_1.8
> proto_0.3-10             R.methodsS3_1.6.1
> [41] R.oo_1.17.0              R.utils_1.28.4           RBGL_1.38.0
> RColorBrewer_1.0-5       RCurl_1.95-4.1
> [46] reshape2_1.2.2           Rsamtools_1.14.2         rtracklayer_1.22.3
> scales_0.2.3             splines_3.0.2
> [51] stats4_3.0.2             stringr_0.6.2            survival_2.37-7
> tools_3.0.2              VariantAnnotation_1.8.10
> [56] XML_3.95-0.2             xtable_1.7-1             zlibbioc_1.8.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


