[BioC] ReportingTools - trouble incorporating annotations

Gabriel Becker gmbecker at ucdavis.edu
Wed Jun 19 21:22:56 CEST 2013


James,

Thanks for responding to Sam's question.

One small point, you can control the coercion of the object to a data.frame
within the publish call via the .toDF argument, which is applied directly
before .modifyDF.

The defaults of these steps (coercion and modification of data.frames) can
also be extended on an S4 class basis, by creating new methods for
toReportDF and modifyReportDF if you find yourself always using the same
customization across multiple scripts (this approaches creating a package
which depends on ReportingTools, however, and is likely overkill in most
situations).

There is a flowchart for the full publish mechanism in
inst/examples/publishFlowchart.svg, which I am attaching here as well for
convenience.

~G


On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Sam,
>
> Sorry, my bad. One of the arguments for DGEList is 'genes', which is
>
>    genes: data frame containing annotation information for the
>           tags/transcripts/genes.
>
> So yeah, when you are creating your DGELists, add in the annotation data
> for each gene/transcript/whatever, and then go from there.
>
> Best,
>
> Jim
>
>
>
> On 6/19/2013 1:37 PM, Sam McInturf wrote:
>
>> Jim,
>>    When I look at my DGELRT objects I don't have a $genes field: (to be
>> verbose, DGELists is a list of DGELRT objects (list(glmLRT(), glmLRT(),...))
>>
>> > DGELists$Roots$genes
>> NULL
>>
>> Looking at what fields I do have: coefficients, fitted.values, deviance,
>> df.residual, design, offset, dispersion, method, samples, AveLogCPM, table,
>> comparison, df.test
>>
>> Looking at the DGEGLM which my DGELists were made, I have no gene slot
>> either.  I added the rownames of the counts table in the DGEGLM to a new
>> $genes:
>>
>> >fit$genes <- rownames(fit$counts)
>>
>> but the the $genes slot did not get passed onto the DGELRTs.  Is it best
>> just to add the $genes field to each DGELRT and then follow what you stated
>> before?
>>
>> Thanks,
>>
>> > sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>  [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4
>>  [4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.0
>>  [7] GenomicRanges_1.12.4 IRanges_1.18.1       BiocGenerics_0.6.0
>> [10] edgeR_3.2.3          limma_3.16.5         BiocInstaller_1.10.2
>>
>> loaded via a namespace (and not attached):
>>  [1] annotate_1.38.0         AnnotationForge_1.2.1   biomaRt_2.16.0
>>  [4] Biostrings_2.28.0       biovizBase_1.8.1        bitops_1.0-5
>>  [7] BSgenome_1.28.0         Category_2.26.0         cluster_1.14.4
>> [10] colorspace_1.2-2        dichromat_2.0-0         digest_0.6.3
>> [13] genefilter_1.42.0       GenomicFeatures_1.12.2  ggbio_1.8.3
>> [16] ggplot2_0.9.3.1         GO.db_2.9.0             GOstats_2.26.0
>> [19] graph_1.38.2            grid_3.0.1              gridExtra_0.9.1
>> [22] GSEABase_1.22.0         gtable_0.1.2            Hmisc_3.10-1.1
>> [25] hwriter_1.3             labeling_0.1            lattice_0.20-15
>> [28] MASS_7.3-26             munsell_0.4             PFAM.db_2.9.0
>> [31] plyr_1.8                proto_0.3-10            RBGL_1.36.2
>> [34] RColorBrewer_1.0-5      RCurl_1.95-4.1          reshape2_1.2.2
>> [37] R.methodsS3_1.4.2       R.oo_1.13.0             Rsamtools_1.12.3
>> [40] rtracklayer_1.20.2      R.utils_1.23.2          scales_0.2.3
>> [43] splines_3.0.1           stats4_3.0.1            stringr_0.6.2
>> [46] survival_2.37-4         tools_3.0.1
>> VariantAnnotation_1.6.6
>> [49] XML_3.96-1.1            xtable_1.7-1            zlibbioc_1.6.0
>>
>>
>>
>>
>> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at uw.edu<mailto:
>> jmacdon at uw.edu>> wrote:
>>
>>     Hi Sam,
>>
>>     First, please always give us the results of sessionInfo(). This is
>>     especially critical in the case of ReportingTools, which has been
>>     fundamentally altered between the previous and current versions of
>>     BioC.
>>
>>
>>     On 6/19/2013 11:12 AM, Sam McInturf wrote:
>>
>>         Bioconductors,
>>              I am working on a RNA seq analysis project and am having
>>         trouble
>>         publishing an HTML report for it.  I am unsure of how to make
>>         my DE genes
>>         have the same ID as what publish() will accept when passing an
>>         argument to
>>         'annotation'.
>>              I mapped the reads using tophat and passed the TAIR 10
>>         gtf file to the
>>         -G option.  When i counted my reads I used the
>>         summarizeOverlaps function
>>         from GenomicRanges and again used this same file.  I called
>>         differential
>>         expression in edgeR using the GLM methods.  So the rownames
>>          of my DE table
>>         are the AGI identifiers (AT#G#####).  I loaded the org.At.tair.db
>>         annotations and passed it to HTMLReport in:
>>
>>         publish(DGELists[["Roots"]], myHTML, countTable = cpmMat,
>>         conditions =
>>         group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc
>>         =2, n = 1000,
>>         name = "RootsLRT")
>>         Error: More than half of your IDs could not be mapped.
>>         In addition: Warning message:
>>         In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion
>>
>>         which makes sense, because publish() is looking for Entrez IDs
>>         (right?)
>>
>>         How do I proceed?
>>
>>
>>     Here I assume you are running R-3.0.x and the current release of BioC.
>>
>>     When you run publish() on anything but a data.frame, the first
>>     step is to coerce to a data.frame using a set of assumptions that
>>     might not hold in your case (or there may be defaults that you
>>     don't like). Because of this, I tend to just coerce to a
>>     data.frame myself and then publish() that directly. This also
>>     allows you to pass in arguments to .modifyDF which is kind of sweet.
>>
>>     In the case of a DGELRT or DEGExact object, there is a 'genes'
>>     slot that will be used to annotate the output of topTags().
>>     Ideally you would just add the annotation you want to that slot
>>     first. So you could do something like
>>
>>     annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<**Tair
>>     column goes here>], c("SYMBOL","GENENAME","**OTHERSTUFF"))
>>
>>
>>     and then put that in your DGEobjects. Now you can do something like
>>
>>     outlst <- lapply(DGELists, topTags, otherargsgohere)
>>
>>     htmlst <- lapply(seq_len(length(**DGELists)) function(x)
>>
>>     HTMLReport(namevector[x], titlevector[x], otherargs))
>>
>>     and you can define a function similar to this function I use for
>>     Entrez Gene IDs:
>>
>>     entrezLinks <- function (df, ...){
>>         df$ENTREZID <- hwriter::hwrite(as.character(**df$ENTREZID),
>>             link = paste0("http://www.ncbi.nlm.**nih.gov/gene/<http://www.ncbi.nlm.nih.gov/gene/>
>> ",
>>
>>     as.character(df$ENTREZID)),
>>             table = FALSE)
>>         return(df)
>>     }
>>
>>     but modified for the Tair equivalent and then
>>
>>     lapply(seq_len(length(htmlst))**, function(x) publish(outlst[[x]],
>>
>>     htmlst[[x]], .modifyDF = samsTairLinkFun)))
>>     lapply(htmlst, finish)
>>
>>     et voila!
>>
>>     You can also then use htmlst to make a bunch of links in an
>>     index.html page.
>>
>>     indx <- HTMLReport("index", "A bunch of links for this expt",
>>     reportDirectory=".", baseUrl = "")
>>     publish(hwriter::hwrite("Here are links", page(indx), header=2,
>>     br=TRUE), indx)
>>     publish(Link(htmlst, report=indx), indx)
>>     finish(indx)
>>
>>     Best,
>>
>>     Jim
>>
>>
>>
>>         Thanks in advance!
>>
>>
>>     --     James W. MacDonald, M.S.
>>     Biostatistician
>>     University of Washington
>>     Environmental and Occupational Health Sciences
>>     4225 Roosevelt Way NE, # 100
>>     Seattle WA 98105-6099
>>
>>
>>
>>
>> --
>> Sam McInturf
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
> ______________________________**_________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives: http://news.gmane.org/gmane.**
> science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


More information about the Bioconductor mailing list