[BioC] ReportingTools gene IDs

James W. MacDonald jmacdon at uw.edu
Thu Apr 24 15:45:03 CEST 2014


Hi Assa,

There may well be a way to work with Ensembl IDs, and you will likely 
get an answer to your direct question from one of the maintainers.

However, you should note that ReportingTools simply takes the input 
object and coerces the data to a data.frame, which is then used to 
create the HTML table. You can always build the data.frame to your own 
liking up front and then pass that to publish(). While this is more 
work than just passing in the DESeqDataSet, it gives you complete 
control over the output.
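
As a rough sketch of that approach (not taken from the ReportingTools 
documentation): the helper res_to_table() and the toy values below are 
made up for illustration. The commented lines at the end assume the 
`fit` object from your script and use results() from DESeq2 plus 
HTMLReport(), publish() and finish() from ReportingTools.

```r
# Hypothetical helper (not part of ReportingTools): turn a results table
# into a plain data.frame that keeps the Ensembl IDs as an ordinary column.
res_to_table <- function(res, padj_cutoff = 0.05) {
  df <- as.data.frame(res)
  out <- data.frame(EnsemblID = rownames(df), df, stringsAsFactors = FALSE)
  # Keep genes with a defined adjusted p-value below the cutoff.
  keep <- !is.na(out$padj) & out$padj < padj_cutoff
  out <- out[keep, , drop = FALSE]
  # Most significant genes first.
  out <- out[order(out$padj), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy stand-in for as.data.frame(results(fit)); the numbers are invented.
toy <- data.frame(
  log2FoldChange = c(1.8, 0.0, -0.1, -2.3),
  padj           = c(0.01, 0.80, NA, 0.001),
  row.names      = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                     "ENSMUSG00000000028", "ENSMUSG00000000031")
)
tab <- res_to_table(toy)
print(tab)

# With the real objects (needs DESeq2 and ReportingTools loaded):
# tab <- res_to_table(results(fit))
# des2Report <- HTMLReport(shortName = "RNAseq_analysis",
#     title = "RNA-seq analysis of differential expression using DESeq2",
#     reportDirectory = "./reports")
# publish(tab, des2Report)
# finish(des2Report)
```

Since the table is an ordinary data.frame at that point, you can add 
or drop columns (gene symbols, links, counts) before calling publish().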

Best,

Jim


On 4/24/2014 8:50 AM, Assa Yeroslaviz wrote:
> Hi,
>
> Is it necessary to have Entrez gene IDs to work with this package?
>
> I am working on a dataset with Ensembl IDs. Do I need to convert them to
> Entrez?
>
> When trying to create a report for a DESeqDataSet or DESeqResults object, I
> get the error message:
>
> Error: Ids do not appear to be Entrez Ids for the specified species.
>
> Is there a way to work directly with the Ensembl IDs?
>
> Thanks
>
> Assa
>
> my script:
>
> head(Counts_set)
>                     A_pKO_aV_FCS G_pKO_aV_FCS M_pKO_aV_FCS D_pKO_aV J_pKO_aV
> ENSMUSG00000000001         4744         4632         4535     4748     3736
> ENSMUSG00000000003            0            0            0        0        0
> ENSMUSG00000000028         1246         1420         1429     2304     1261
> ENSMUSG00000000031            3           25           65        0       50
> ENSMUSG00000000037            0            0            0        0        0
> ENSMUSG00000000049            0            0            3        1        3
>
> cds <- DESeqDataSetFromMatrix (
>      countData =     Counts_set,
>      colData   =     colData,
>      design    = ~    condition
>      )
>
> fit = DESeq(cds)
> des2Report <- HTMLReport(
>      shortName = paste('RNAseq_analysis_', group1, "_", group2, sep=""),
>      title = 'RNA-seq analysis of differential expression using DESeq2',
>      reportDirectory = "./reports")
> publish(fit, des2Report, pvalueCutoff=0.05, annotation.db="org.Mm.eg.db",
>      factor = colData(fit)$condition, reportDir="./reports")
> Error: Ids do not appear to be Entrez Ids for the specified species.
> finish(des2Report)
>
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] org.Mm.eg.db_2.14.0     ReportingTools_2.4.0    AnnotationDbi_1.26.0
>   [4] Biobase_2.24.0          RSQLite_0.11.4          DBI_0.2-7
>   [7] knitr_1.5               DESeq2_1.4.0            RcppArmadillo_0.4.200.0
> [10] Rcpp_0.11.1             GenomicRanges_1.16.2    GenomeInfoDb_1.0.2
> [13] IRanges_1.22.3          BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.42.0          AnnotationForge_1.6.0    BatchJobs_1.2
>   [4] BBmisc_1.5               BiocParallel_0.6.0       biomaRt_2.20.0
>   [7] Biostrings_2.32.0        biovizBase_1.12.0        bitops_1.0-6
> [10] brew_1.0-6               BSgenome_1.32.0          Category_2.30.0
> [13] cluster_1.14.4           codetools_0.2-8          colorspace_1.2-4
> [16] dichromat_2.0-0          digest_0.6.4             edgeR_3.6.0
> [19] evaluate_0.5.3           fail_1.2                 foreach_1.4.2
> [22] formatR_0.10             Formula_1.1-1            genefilter_1.46.0
> [25] geneplotter_1.42.0       GenomicAlignments_1.0.0  GenomicFeatures_1.16.0
> [28] ggbio_1.12.0             ggplot2_0.9.3.1          GO.db_2.14.0
> [31] GOstats_2.30.0           graph_1.42.0             grid_3.1.0
> [34] gridExtra_0.9.1          GSEABase_1.26.0          gtable_0.1.2
> [37] Hmisc_3.14-4             hwriter_1.3              iterators_1.0.7
> [40] lattice_0.20-24          latticeExtra_0.6-26      limma_3.20.1
> [43] locfit_1.5-9.1           MASS_7.3-29              Matrix_1.1-2
> [46] munsell_0.4.2            PFAM.db_2.14.0           plyr_1.8.1
> [49] proto_0.3-10             RBGL_1.40.0              RColorBrewer_1.0-5
> [52] RCurl_1.95-4.1           reshape2_1.2.2           R.methodsS3_1.6.1
> [55] R.oo_1.18.0              Rsamtools_1.16.0         rtracklayer_1.24.0
> [58] R.utils_1.29.8           scales_0.2.4             sendmailR_1.1-2
> [61] splines_3.1.0            stats4_3.1.0             stringr_0.6.2
> [64] survival_2.37-7          tools_3.1.0              VariantAnnotation_1.10.0
> [67] XML_3.98-1.1             xtable_1.7-3             XVector_0.4.0
> [70] zlibbioc_1.10.0

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


