[BioC] ReportingTools - trouble incorporating annotations

James W. MacDonald jmacdon at uw.edu
Wed Jun 19 17:55:23 CEST 2013


Hi Sam,

First, please always give us the results of sessionInfo(). This is 
especially critical in the case of ReportingTools, which has been 
fundamentally altered between the previous and current versions of BioC.

On 6/19/2013 11:12 AM, Sam McInturf wrote:
> Bioconductors,
>      I am working on a RNA seq analysis project and am having trouble
> publishing an HTML report for it.  I am unsure of how to make my DE genes
> have the same ID as what publish() will accept when passing an argument to
> 'annotation'.
>      I mapped the reads using tophat and passed the TAIR 10 gtf file to the
> -G option.  When i counted my reads I used the summarizeOverlaps function
> from GenomicRanges and again used this same file.  I called differential
> expression in edgeR using the GLM methods.  So the rownames  of my DE table
> are the AGI identifiers (AT#G#####).  I loaded the org.At.tair.db
> annotations and passed it to HTMLReport in:
>
> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, conditions =
> group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc =2, n = 1000,
> name = "RootsLRT")
> Error: More than half of your IDs could not be mapped.
> In addition: Warning message:
> In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion
>
> which makes sense, because publish() is looking for Entrez IDs (right?)
>
> How do I proceed?

Here I assume you are running R-3.0.x and the current release of BioC.

When you run publish() on anything but a data.frame, the first step is 
to coerce to a data.frame using a set of assumptions that might not hold 
in your case (or there may be defaults that you don't like). Because of 
this, I tend to just coerce to a data.frame myself and then publish() 
that directly. This also allows you to pass in arguments to .modifyDF 
which is kind of sweet.

In the case of a DGELRT or DGEExact object, there is a 'genes' slot that 
will be used to annotate the output of topTags(). Ideally you would just 
add the annotation you want to that slot first. So you could do 
something like

annot <- select(org.At.tair.db,
                keys = as.character(DGELists[["Roots"]]$genes[, <Tair column goes here>]),
                columns = c("SYMBOL", "GENENAME", "OTHERSTUFF"))

and then put that in your DGEobjects. Now you can do something like

outlst <- lapply(DGELists, topTags, otherargsgohere)

htmlst <- lapply(seq_along(DGELists), function(x)
    HTMLReport(namevector[x], titlevector[x], otherargs))
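
The "put that in your DGE objects" step can trip people up, because select() may return more than one row per key. Here is a minimal base-R sketch of one way to line the annotation up with the genes table. The two data frames are toy stand-ins, the column name "TAIR" is an assumption, and the symbols are illustrative values only, so adapt it to whatever your objects actually contain:

```r
# Toy stand-in for DGELists[["Roots"]]$genes (column name "TAIR" is assumed)
genes <- data.frame(TAIR = c("AT1G01010", "AT1G01020", "AT1G01030"),
                    stringsAsFactors = FALSE)

# Toy stand-in for what select() might return: unordered, with a
# one-to-many mapping for the first gene (illustrative values only)
annot <- data.frame(TAIR   = c("AT1G01020", "AT1G01010", "AT1G01010"),
                    SYMBOL = c("ARV1", "NAC001", "ANAC001"),
                    stringsAsFactors = FALSE)

# Keep the first annotation row per ID, then align to the gene order
annot1 <- annot[!duplicated(annot$TAIR), ]
idx <- match(genes$TAIR, annot1$TAIR)
genes$SYMBOL <- annot1$SYMBOL[idx]   # NA where no annotation was found

# With a real object you would then assign the table back, e.g.
# DGELists[["Roots"]]$genes <- genes
```

Keeping only the first match is a simplification; you could instead paste the multiple symbols together if you want all of them in the report.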

and you can define a function similar to this function I use for Entrez 
Gene IDs:

entrezLinks <- function (df, ...){
     df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID),
         link = paste0("http://www.ncbi.nlm.nih.gov/gene/",
                       as.character(df$ENTREZID)),
         table = FALSE)
     return(df)
}

but modified for the Tair equivalent and then

lapply(seq_along(htmlst), function(x)
    publish(outlst[[x]], htmlst[[x]], .modifyDF = samsTairLinkFun))
lapply(htmlst, finish)

et voila!
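
For what it's worth, here is a rough sketch of what a TAIR version of the link function might look like. It uses plain paste0() rather than hwriter so it runs on its own. The column name "TAIR" and the TAIR locus URL pattern are assumptions on my part, so check both against your data and the TAIR site before relying on it:

```r
# Hypothetical TAIR analogue of entrezLinks(); same (df, ...) signature.
# Assumes the data.frame has a column called "TAIR" holding AGI IDs.
samsTairLinkFun <- function(df, ...) {
    ids <- as.character(df$TAIR)
    # Assumed URL pattern for a TAIR locus page
    url <- paste0("http://www.arabidopsis.org/servlets/TairObject?type=locus&name=",
                  ids)
    # Wrap each ID in an HTML anchor pointing at its locus page
    df$TAIR <- paste0('<a href="', url, '">', ids, '</a>')
    df
}

# Quick check on a toy table
toy <- data.frame(TAIR = c("AT1G01010", "AT1G01020"),
                  logFC = c(2.5, -3.1),
                  stringsAsFactors = FALSE)
linked <- samsTairLinkFun(toy)
```

You could just as well build the anchor with hwriter::hwrite(ids, link = url, table = FALSE), as in entrezLinks above.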

You can also then use htmlst to make a bunch of links in an index.html page.

indx <- HTMLReport("index", "A bunch of links for this expt", 
reportDirectory=".", baseUrl = "")
publish(hwriter::hwrite("Here are links", page(indx), heading=2,
br=TRUE), indx)
publish(Link(htmlst, report=indx), indx)
finish(indx)

Best,

Jim


>
> Thanks in advance!

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


