[BioC] ReportingTools - trouble incorporating annotations

James W. MacDonald jmacdon at uw.edu
Wed Jun 19 19:55:40 CEST 2013


Hi Sam,

Sorry, my bad. One of the arguments for DGEList is 'genes', which is

    genes: data frame containing annotation information for the
           tags/transcripts/genes.

So yeah, when you are creating your DGELists, add in the annotation data 
for each gene/transcript/whatever, and then go from there.

Best,

Jim


On 6/19/2013 1:37 PM, Sam McInturf wrote:
> Jim,
>    When I look at my DGELRT objects I don't have a $genes field: (to 
> be verbose, DGELists is a list of DGELRT objects (list(glmLRT(), 
> glmLRT(),...))
>
> > DGELists$Roots$genes
> NULL
>
> Looking at what fields I do have: coefficients, fitted.values, 
> deviance, df.residual, design, offset, dispersion, method, samples, 
> AveLogCPM, table, comparison, df.test
>
> Looking at the DGEGLM which my DGELists were made, I have no gene slot 
> either.  I added the rownames of the counts table in the DGEGLM to a 
> new $genes:
>
> >fit$genes <- rownames(fit$counts)
>
> but the the $genes slot did not get passed onto the DGELRTs.  Is it 
> best just to add the $genes field to each DGELRT and then follow what 
> you stated before?
>
> Thanks,
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4
>  [4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.0
>  [7] GenomicRanges_1.12.4 IRanges_1.18.1       BiocGenerics_0.6.0
> [10] edgeR_3.2.3          limma_3.16.5         BiocInstaller_1.10.2
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.38.0         AnnotationForge_1.2.1   biomaRt_2.16.0
>  [4] Biostrings_2.28.0       biovizBase_1.8.1        bitops_1.0-5
>  [7] BSgenome_1.28.0         Category_2.26.0         cluster_1.14.4
> [10] colorspace_1.2-2        dichromat_2.0-0         digest_0.6.3
> [13] genefilter_1.42.0       GenomicFeatures_1.12.2  ggbio_1.8.3
> [16] ggplot2_0.9.3.1         GO.db_2.9.0             GOstats_2.26.0
> [19] graph_1.38.2            grid_3.0.1              gridExtra_0.9.1
> [22] GSEABase_1.22.0         gtable_0.1.2            Hmisc_3.10-1.1
> [25] hwriter_1.3             labeling_0.1            lattice_0.20-15
> [28] MASS_7.3-26             munsell_0.4             PFAM.db_2.9.0
> [31] plyr_1.8                proto_0.3-10            RBGL_1.36.2
> [34] RColorBrewer_1.0-5      RCurl_1.95-4.1          reshape2_1.2.2
> [37] R.methodsS3_1.4.2       R.oo_1.13.0             Rsamtools_1.12.3
> [40] rtracklayer_1.20.2      R.utils_1.23.2          scales_0.2.3
> [43] splines_3.0.1           stats4_3.0.1            stringr_0.6.2
> [46] survival_2.37-4         tools_3.0.1             
> VariantAnnotation_1.6.6
> [49] XML_3.96-1.1            xtable_1.7-1            zlibbioc_1.6.0
>
>
>
>
> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at uw.edu 
> <mailto:jmacdon at uw.edu>> wrote:
>
>     Hi Sam,
>
>     First, please always give us the results of sessionInfo(). This is
>     especially critical in the case of ReportingTools, which has been
>     fundamentally altered between the previous and current versions of
>     BioC.
>
>
>     On 6/19/2013 11:12 AM, Sam McInturf wrote:
>
>         Bioconductors,
>              I am working on a RNA seq analysis project and am having
>         trouble
>         publishing an HTML report for it.  I am unsure of how to make
>         my DE genes
>         have the same ID as what publish() will accept when passing an
>         argument to
>         'annotation'.
>              I mapped the reads using tophat and passed the TAIR 10
>         gtf file to the
>         -G option.  When i counted my reads I used the
>         summarizeOverlaps function
>         from GenomicRanges and again used this same file.  I called
>         differential
>         expression in edgeR using the GLM methods.  So the rownames
>          of my DE table
>         are the AGI identifiers (AT#G#####).  I loaded the org.At.tair.db
>         annotations and passed it to HTMLReport in:
>
>         publish(DGELists[["Roots"]], myHTML, countTable = cpmMat,
>         conditions =
>         group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc
>         =2, n = 1000,
>         name = "RootsLRT")
>         Error: More than half of your IDs could not be mapped.
>         In addition: Warning message:
>         In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion
>
>         which makes sense, because publish() is looking for Entrez IDs
>         (right?)
>
>         How do I proceed?
>
>
>     Here I assume you are running R-3.0.x and the current release of BioC.
>
>     When you run publish() on anything but a data.frame, the first
>     step is to coerce to a data.frame using a set of assumptions that
>     might not hold in your case (or there may be defaults that you
>     don't like). Because of this, I tend to just coerce to a
>     data.frame myself and then publish() that directly. This also
>     allows you to pass in arguments to .modifyDF which is kind of sweet.
>
>     In the case of a DGELRT or DEGExact object, there is a 'genes'
>     slot that will be used to annotate the output of topTags().
>     Ideally you would just add the annotation you want to that slot
>     first. So you could do something like
>
>     annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<Tair
>     column goes here>], c("SYMBOL","GENENAME","OTHERSTUFF"))
>
>     and then put that in your DGEobjects. Now you can do something like
>
>     outlst <- lapply(DGELists, topTags, otherargsgohere)
>
>     htmlst <- lapply(seq_len(length(DGELists)) function(x)
>     HTMLReport(namevector[x], titlevector[x], otherargs))
>
>     and you can define a function similar to this function I use for
>     Entrez Gene IDs:
>
>     entrezLinks <- function (df, ...){
>         df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID),
>             link = paste0("http://www.ncbi.nlm.nih.gov/gene/",
>     as.character(df$ENTREZID)),
>             table = FALSE)
>         return(df)
>     }
>
>     but modified for the Tair equivalent and then
>
>     lapply(seq_len(length(htmlst)), function(x) publish(outlst[[x]],
>     htmlst[[x]], .modifyDF = samsTairLinkFun)))
>     lapply(htmlst, finish)
>
>     et voila!
>
>     You can also then use htmlst to make a bunch of links in an
>     index.html page.
>
>     indx <- HTMLReport("index", "A bunch of links for this expt",
>     reportDirectory=".", baseUrl = "")
>     publish(hwriter::hwrite("Here are links", page(indx), header=2,
>     br=TRUE), indx)
>     publish(Link(htmlst, report=indx), indx)
>     finish(indx)
>
>     Best,
>
>     Jim
>
>
>
>         Thanks in advance!
>
>
>     -- 
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>
>
>
> -- 
> Sam McInturf

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list