[BioC] ReportingTools gene IDs
Michael Love
michaelisaiahlove at gmail.com
Tue Apr 29 15:55:10 CEST 2014
Hi Assa,
If you look up the help for ?"publish-methods", there is support for
DESeqResults (the 4th data type listed). DESeqResults is the
DataFrame produced by DESeq2::results(). The point of creating this
class was to help simplify the hand-off to ReportingTools. Maybe this
will help?
Mike
On Tue, Apr 29, 2014 at 9:27 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
> Hi Jim,
>
> Thanks for the tip.
> Unfortunately I am not sure I understand the idea behind it.
>
> You say it is possible to work directly with the DESeqDataSet object, but
> then the function expects a data.frame to work with. If I understand the
> mechanism by which the publish function works, it takes the
> DESeqDataSet object and, using the results function, coerces it into a
> data.frame.
>
> This is the function I ended up using:
>
> fun <- function(df, object, ...){
>   # the coerced data.frame has no ENSEMBL column, so take the IDs from the row names
>   df$ENSEMBL <- rownames(df)
>   annot <- select(org.Mm.eg.db, df$ENSEMBL, c("SYMBOL","GENENAME"), "ENSEMBL")
>   # keep only the first annotation row for IDs that map to several symbols
>   if(nrow(annot) > nrow(df)) annot <- annot[!duplicated(annot[,1]),]
>   df <- data.frame(annot, df)
>   # drop the duplicated ID column created by data.frame()
>   df <- df[ , -which(names(df) %in% c("ENSEMBL.1"))]
>   df$ENSEMBL <- hwrite(as.character(df$ENSEMBL),
>     link = paste0("http://www.ensembl.org/Mus_musculus/Gene/Summary?g=",
>                   as.character(df$ENSEMBL)), table = FALSE)
>   df
> }
>
>
> As you can see, I set the column df$ENSEMBL from the rownames of the
> coerced df. This is because the fit object doesn't have a column named
> ENSEMBL.
>
> Q. Is there a way to add columns to the object?
>
> Am I doing it in the most efficient way?
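On the question of adding columns: a minimal base-R sketch of the row-names-to-column step, using a toy data.frame in place of the one that publish() passes to a .modifyDF function. The helper name ids_to_column and the toy values are made up for illustration; nothing here comes from ReportingTools itself.

```r
# Toy stand-in for the coerced data.frame; the Ensembl IDs sit in the row names.
df <- data.frame(
  logFC  = c(1.2, -0.8, 2.5),
  pvalue = c(0.01, 0.20, 0.003),
  row.names = c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000031")
)

# Hypothetical helper: copy the row names into a leading ENSEMBL column.
ids_to_column <- function(df) {
  data.frame(ENSEMBL = rownames(df), df, stringsAsFactors = FALSE)
}

df2 <- ids_to_column(df)
print(names(df2))
```

Putting the ID column first means later steps can look it up by name rather than by position.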
>
> thanks for the help and the tip about the Ensembl links (mouse genome -
> Mm).
>
>
> Assa
>
>
>
> On Fri, Apr 25, 2014 at 3:43 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Assa,
>>
>> Gabriel actually already gave you the answer, and it is yes. You just have
>> to add things to the .modifyDF argument. There are several examples in
>>
>> http://www.bioconductor.org/packages/release/bioc/vignettes/ReportingTools/inst/doc/basicReportingTools.pdf
>>
>> and here is one (untested) that should apply to your situation:
>>
>> fun <- function(df, object, ...){
>>   # the column name must be quoted, otherwise R looks for an object called ENSEMBL
>>   if(!"ENSEMBL" %in% names(df))
>>     stop("The column name for ensembl ids has to be 'ENSEMBL'!")
>>   ensids <- df$ENSEMBL
>>   whichcol <- which(names(df) == "ENSEMBL")
>>   annot <- select(org.Mm.eg.db, ensids, c("SYMBOL","GENENAME"), "ENSEMBL")
>>   # naive handling of one-to-many mappings: keep the first row per ID
>>   if(nrow(annot) > nrow(df)) annot <- annot[!duplicated(annot[,1]),]
>>   df <- data.frame(annot, df[,-whichcol])
>>   df$ENSEMBL <- hwrite(as.character(df$ENSEMBL),
>>     link = paste0("http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=",
>>                   as.character(df$ENSEMBL)), table = FALSE)
>>   df
>> }
>>
>>
>> This function implicitly assumes (and checks) that there is an ENSEMBL
>> column in your data.frame that it can use to extract the Ensembl IDs. It
>> also assumes that your species is human, and that you have the org.Mm.eg.db
>> package already loaded. It then gets the symbol and genename for those IDs,
>> and does a really naive subsetting of the data if there are duplicates.
>> Other more sophisticated things are possible, but I leave it to you to make
>> any such modifications.
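One possible refinement of the duplicate handling, sketched in base R only. The annot table below is a made-up stand-in for the kind of result select() returns (one row per match, NA where nothing maps), and align_annotation is a hypothetical helper, not part of AnnotationDbi or ReportingTools. The idea is to line the annotation up with df by ID via match() rather than relying on the two tables being in the same row order.

```r
# Mock annotation table: ENSG02 maps to two symbols, ENSG03 has no annotation.
annot <- data.frame(
  ENSEMBL  = c("ENSG01", "ENSG02", "ENSG02", "ENSG03"),
  SYMBOL   = c("GeneA", "GeneB", "GeneB2", NA),
  GENENAME = c("gene A", "gene B", "gene B alt", NA),
  stringsAsFactors = FALSE
)

# Toy results table, deliberately in a different order than annot.
df <- data.frame(
  ENSEMBL = c("ENSG03", "ENSG01", "ENSG02"),
  logFC   = c(0.5, 1.2, -0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first annotation row per ID, then align it
# with df by ID instead of by row position.
align_annotation <- function(df, annot, key = "ENSEMBL") {
  first <- annot[!duplicated(annot[[key]]), , drop = FALSE]
  idx   <- match(df[[key]], first[[key]])
  out   <- cbind(first[idx, setdiff(names(first), key), drop = FALSE], df)
  rownames(out) <- NULL
  out
}

out <- align_annotation(df, annot)
print(out)
```

Because the join is keyed on the ID, the result stays correct even if the annotation comes back in a different order or with a different number of rows than df.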
>>
>> You would use this (as Gabriel already said) as part of an argument
>> passed in via .modifyDF. You need modifyReportDF as well. So your
>> publish call would now look like
>>
>> publish(fit, des2Report, pvalueCutoff=0.05, annotation.db="org.Mm.eg.db",
>>   factor = colData(fit)$condition, reportDir="./reports",
>>   .modifyDF = list(modifyReportDF, fun))
>>
>> That at least is the basic idea, and you might need to play around to make
>> things work correctly.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 4/25/2014 4:21 AM, Assa Yeroslaviz wrote:
>>
>>> Hi Gabriel,
>>>
>>> Thanks for the quick answer. I will look into that as soon as I have the
>>> time.
>>> Another question was whether it is possible to work directly with the
>>> Ensembl IDs.
>>>
>>> I have a table of ~37K Ensembl IDs, for which almost 50% have no Entrez
>>> IDs, so I can't convert them. Is there a way to work directly with the
>>> Ensembl IDs and still benefit from the annotation.db possibilities?
>>>
>>> Thanks
>>>
>>> Assa
>>>
>>>
>>>
>>> On Thu, Apr 24, 2014 at 4:48 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
>>>
>>> I wrote my previous message too quickly. Apologies.
>>>
>>> Your functions must have the signature
>>>
>>> function(df, object, ...)
>>>
>>> df is the current data.frame representation of the object,
>>> object is the *original* object (so that the class can be identified),
>>> ... are passed in from the call to publish
>>>
>>> And you can just place the generic modifyReportDF function at the
>>> beginning of the list, rather than using getMethod. The getMethod
>>> thing I said is for when you want to apply the default handling
>>> for a *different* class to your object. It is a rare use-case, but
>>> came up recently so it was on my mind.
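To make the signature and the ordering concrete, here is a small base-R sketch. It is an illustration only: apply_modifiers is a hypothetical stand-in and not how ReportingTools is implemented, and the two modifier functions and the toy data are invented. It shows functions of the form function(df, object, ...) being applied in list order, each receiving the data.frame returned by the previous one.

```r
# Two hypothetical modifiers with the signature function(df, object, ...).
add_id_column <- function(df, object, ...) {
  df$ID <- rownames(df)
  df
}
round_numeric <- function(df, object, digits = 2, ...) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], round, digits = digits)
  df
}

# Illustration only (not ReportingTools internals): call each modifier in
# order, feeding every function the output of the previous one.
apply_modifiers <- function(df, object, modifiers, ...) {
  for (f in modifiers) df <- f(df, object, ...)
  df
}

toy <- data.frame(logFC = c(1.2345, -0.8765), row.names = c("geneA", "geneB"))
out <- apply_modifiers(toy, object = NULL,
                       modifiers = list(add_id_column, round_numeric))
print(out)
```

In a real report the list would be handed to publish() through .modifyDF, with modifyReportDF as its first element if the default handling should run before the custom steps.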
>>>
>>> That will teach me to respond quickly to emails early in the morning.
>>>
>>> Sorry about that.
>>>
>>> ~G
>>>
>>>
>>> On Thu, Apr 24, 2014 at 7:18 AM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
>>>
>>> Assa,
>>>
>>> In general yes, if you want to add to the table you will be
>>> working with the data.frame.
>>>
>>> You can do so after the initial conversion, though, so you
>>> don't have to reinvent the wheel to get from your object to an
>>> initial data.frame.
>>>
>>> To modify the default table (data.frame) generated for an
>>> object, you can pass publish()'s .modifyDF parameter a
>>> function or list of functions, each of which should accept
>>> object (the data.frame) and "..." and return a data.frame.
>>>
>>> These will be called in order, each accepting the output from
>>> the last. The output of the final function is what will be
>>> transformed into HTML and inserted into the report.
>>>
>>> You'll probably want to add the default handling of your
>>> object type, which you can do by putting
>>> getMethod("modifyReportDF", "<your object's class>") at the
>>> beginning of the list.
>>>
>>> See section 4 of the ReportingTools basics vignette for
>>> example code.
>>>
>>> HTH,
>>> ~G
>>>
>>>
>>> On Thu, Apr 24, 2014 at 6:54 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
>>>
>>> Thanks Jim,
>>>
>>> I have found in one of the forums a response from Jason (thanks again)
>>> about the option to set annotation.db=NULL and thus force the publish
>>> command to work with the IDs I provide in the DESeqDataSet object.
>>>
>>> So this is now working, but I would also like to have the option to add
>>> some annotations to the table.
>>>
>>> Is this only possible when working directly with a data.frame?
>>>
>>> Thanks again
>>> Assa
>>>
>>> On Thu, Apr 24, 2014 at 3:45 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>> > Hi Assa,
>>> >
>>> > There may well be a way to work with Ensembl IDs, and you will likely get
>>> > an answer to your direct question from one of the maintainers.
>>> >
>>> > However you should note that ReportingTools simply takes the input object
>>> > and then coerces the data to a data.frame, which is then used to create the
>>> > HTML table. You can always create the data.frame to your own liking up
>>> > front, and then pass that to publish(). While this is more work than just
>>> > passing in the DESeqDataSet, you do have complete control over the output.
>>> >
>>> > Best,
>>> >
>>> > Jim
>>> >
>>> >
>>> >
>>> > On 4/24/2014 8:50 AM, Assa Yeroslaviz wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> Is it necessary to have Entrez gene IDs to work with this package?
>>> >>
>>> >> I am working on a dataset with Ensembl IDs. Do I need to convert them to
>>> >> Entrez?
>>> >>
>>> >> When trying to create a report for a DESeqDataSet or DESeqResults object
>>> >> I am getting the error message:
>>> >>
>>> >> Error: Ids do not appear to be Entrez Ids for the specified species.
>>> >>
>>> >> Is there a way to work directly with the Ensembl IDs?
>>> >>
>>> >> Thanks
>>> >>
>>> >> Assa
>>> >>
>>> >> my script:
>>> >>
>>> >> head(Counts_set)
>>> >>                    A_pKO_aV_FCS G_pKO_aV_FCS M_pKO_aV_FCS D_pKO_aV J_pKO_aV
>>> >> ENSMUSG00000000001         4744         4632         4535     4748     3736
>>> >> ENSMUSG00000000003            0            0            0        0        0
>>> >> ENSMUSG00000000028         1246         1420         1429     2304     1261
>>> >> ENSMUSG00000000031            3           25           65        0       50
>>> >> ENSMUSG00000000037            0            0            0        0        0
>>> >> ENSMUSG00000000049            0            0            3        1        3
>>> >>
>>> >> cds <- DESeqDataSetFromMatrix (
>>> >> countData = Counts_set,
>>> >> colData = colData,
>>> >> design = ~ condition
>>> >> )
>>> >>
>>> >> fit = DESeq(cds)
>>> >> des2Report <- HTMLReport(
>>> >>   shortName = paste('RNAseq_analysis_', group1, "_", group2, sep=""),
>>> >>   title = 'RNA-seq analysis of differential expression using DESeq2',
>>> >>   reportDirectory = "./reports")
>>> >> publish(fit, des2Report, pvalueCutoff=0.05, annotation.db="org.Mm.eg.db",
>>> >>   factor = colData(fit)$condition, reportDir="./reports")
>>> >> Error: Ids do not appear to be Entrez Ids for the specified species.
>>> >> finish(des2Report)
>>> >>
>>> >>
>>> >> sessionInfo()
>>> >> R version 3.1.0 (2014-04-10)
>>> >> Platform: x86_64-pc-linux-gnu (64-bit)
>>> >>
>>> >> locale:
>>> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> >> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> >>
>>> >> attached base packages:
>>> >> [1] parallel stats graphics grDevices utils datasets methods
>>> >> [8] base
>>> >>
>>> >> other attached packages:
>>> >> [1] org.Mm.eg.db_2.14.0 ReportingTools_2.4.0 AnnotationDbi_1.26.0
>>> >> [4] Biobase_2.24.0 RSQLite_0.11.4 DBI_0.2-7
>>> >> [7] knitr_1.5 DESeq2_1.4.0 RcppArmadillo_0.4.200.0
>>> >> [10] Rcpp_0.11.1 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2
>>> >> [13] IRanges_1.22.3 BiocGenerics_0.10.0
>>> >>
>>> >> loaded via a namespace (and not attached):
>>> >> [1] annotate_1.42.0 AnnotationForge_1.6.0 BatchJobs_1.2
>>> >> [4] BBmisc_1.5 BiocParallel_0.6.0 biomaRt_2.20.0
>>> >> [7] Biostrings_2.32.0 biovizBase_1.12.0 bitops_1.0-6
>>> >> [10] brew_1.0-6 BSgenome_1.32.0 Category_2.30.0
>>> >> [13] cluster_1.14.4 codetools_0.2-8 colorspace_1.2-4
>>> >> [16] dichromat_2.0-0 digest_0.6.4 edgeR_3.6.0
>>> >> [19] evaluate_0.5.3 fail_1.2 foreach_1.4.2
>>> >> [22] formatR_0.10 Formula_1.1-1 genefilter_1.46.0
>>> >> [25] geneplotter_1.42.0 GenomicAlignments_1.0.0 GenomicFeatures_1.16.0
>>> >> [28] ggbio_1.12.0 ggplot2_0.9.3.1 GO.db_2.14.0
>>> >> [31] GOstats_2.30.0 graph_1.42.0 grid_3.1.0
>>> >> [34] gridExtra_0.9.1 GSEABase_1.26.0 gtable_0.1.2
>>> >> [37] Hmisc_3.14-4 hwriter_1.3 iterators_1.0.7
>>> >> [40] lattice_0.20-24 latticeExtra_0.6-26 limma_3.20.1
>>> >> [43] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-2
>>> >> [46] munsell_0.4.2 PFAM.db_2.14.0 plyr_1.8.1
>>> >> [49] proto_0.3-10 RBGL_1.40.0 RColorBrewer_1.0-5
>>> >> [52] RCurl_1.95-4.1 reshape2_1.2.2 R.methodsS3_1.6.1
>>> >> [55] R.oo_1.18.0 Rsamtools_1.16.0 rtracklayer_1.24.0
>>> >> [58] R.utils_1.29.8 scales_0.2.4 sendmailR_1.1-2
>>> >> [61] splines_3.1.0 stats4_3.1.0 stringr_0.6.2
>>> >> [64] survival_2.37-7 tools_3.1.0 VariantAnnotation_1.10.0
>>> >> [67] XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0
>>> >> [70] zlibbioc_1.10.0
>>> >>
>>> >> [[alternative HTML version deleted]]
>>> >>
>>> >> _______________________________________________
>>> >> Bioconductor mailing list
>>> >> Bioconductor at r-project.org
>>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >>
>>> >
>>> > --
>>> > James W. MacDonald, M.S.
>>> > Biostatistician
>>> > University of Washington
>>> > Environmental and Occupational Health Sciences
>>> > 4225 Roosevelt Way NE, # 100
>>> > Seattle WA 98105-6099
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>> -- Gabriel Becker
>>> Graduate Student
>>> Statistics Department
>>> University of California, Davis
>>>
>>>
>>>
>>>
>>> -- Gabriel Becker
>>> Graduate Student
>>> Statistics Department
>>> University of California, Davis
>>>
>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>