[BioC] Getting Introns Expression at a Per Gene Level

Valerie Obenchain vobencha at fhcrc.org
Mon Feb 4 21:46:50 CET 2013


The error is saying the 'reads' argument cannot be a GRanges object. The 
reason for this is because GRanges does not keep track of read gaps. 
When counting with summarizeOverlaps we want to consider the read gaps 
so we use the GappedAlignments and GappedAlignmentPairs objects which 
are 'gap-aware' and store the CIGARs. The summarizeOverlaps man page 
indicates what are valid objects for the 'reads' argument,

?summarizeOverlaps

The reads can also be in a Bam file. Read more about that method at

library(Rsamtools)
library(GenomicRanges)
?'summarizeOverlaps,GRanges,BamFileList-method'


The countOverlaps (and findOverlaps) functions will double count reads, 
summarizeOverlaps does not. See the example on the

?summarizeOverlaps

man page for a comparison between findOverlaps and summarizeOverlaps. 
The man page also describes the different 'modes' in summarizeOverlaps 
that resolve reads that do hit multiple annotation features.

Valerie


On 02/04/2013 10:35 AM, Fong Chun Chan wrote:
> Hi Valarie,
>
> Thanks for the reply. I was testing this and it doesn't seem to work
> with the summarizeOverlaps() function.  When I try it, I get this error:
>
> countsForIntrons <- summarizeOverlaps(intronsByGene, reads);
> Error in function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘summarizeOverlaps’
> for signature ‘"GRangesList", "GRanges"’
>
> Is it supposed to be countOverlaps() instead of summarizeOverlap()?
>
> Thanks,
>
> Fong
>
>
> On Wed, Jan 30, 2013 at 12:23 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     Hi Fong,
>
>     We don't have a single function call to get introns by gene but you
>     can put one together in a few steps.
>
>     library(TxDb.Hsapiens.UCSC.__hg19.knownGene)
>     txdb <- TxDb.Hsapiens.UCSC.hg19.__knownGene
>
>     Specifying use.names = TRUE puts the transcript names on the
>     GRangesList.
>
>          introns <- intronsByTranscript(txdb, use.names=TRUE)
>
>     Unlist and remove duplicate introns.
>
>          ulst <- unlist(introns)
>          intronsNoDups <- ulst[!duplicated(ulst)]
>
>     The select() function can map txid to geneid (and others). See ?select.
>
>          txnames <- names(intronsNoDups)
>          map <- select(txdb, keys=txnames, keytype='TXNAME', cols='GENEID')
>
>     Not all transcripts are associated with a gene id and some are
>     associated with more than one.
>
>      > head(map, 10)
>             TXNAME    GENEID
>     1  uc001aaa.3 <NA>
>     2  uc001aaa.3 <NA>
>     3  uc010nxq.1 <NA>
>     4  uc010nxq.1 <NA>
>     5  uc010nxr.1 <NA>
>     6  uc010nxr.1 <NA>
>     7  uc009vjk.2 100133331
>     8  uc009vjk.2 100133331
>     9  uc001aau.3 100132287
>     10 uc021oeh.1 100133331
>
>
>     To get a GRangesList of introns by gene, split the introns on the
>     associated gene id.
>
>          idx <- map$GENEID[!is.na <http://is.na>(map$GENEID)]
>          intronsByGene <- split(intronsNoDups[!is.na
>     <http://is.na>(__map$GENEID)], idx)
>          names(intronsByGene) <- unique(idx)
>
>
>     The list names are geneid and the element rownames are txid.
>
>      > intronsByGene
>     GRangesList of length 19784:
>     $100133331
>     GRanges with 13 ranges and 0 metadata columns:
>                   seqnames               ranges strand
>     <Rle> <IRanges> <Rle>
>        uc002qsd.4    chr19 [58858396, 58858718]      -
>        uc002qsd.4    chr19 [58859007, 58861735]      -
>        uc002qsd.4    chr19 [58862018, 58862756]      -
>     ...
>
>     $100132287
>     GRanges with 1 range and 0 metadata columns:
>                   seqnames               ranges strand
>        uc003wyw.1     chr8 [18248856, 18257507]      +
>
>
>     To count reads against this annotation you can use
>     summarizeOverlaps(). The 'reads' can be Bam files, GappedAlignments
>     or GappedAlignmentPairs. The 'features' argument will be the
>     GRangesList of introns by gene. The result will be counts per gene.
>
>     To count just introns with no grouping you can still use
>     summarizeOverlaps() but use the 'intronsNoDups' GRanges object as
>     'features'.
>
>
>     Valerie
>
>
>
>     On 01/29/2013 11:38 AM, Fong Chun Chan wrote:
>
>         Hi,
>
>         I am using the R package, Genomic Features, to get Intron
>         Expression and
>         the function that I am using is the intronsByTranscript(). While
>         this
>         function is useful, it returns the number of raw reads that
>         align to each
>         intron grouped at the transcript level. Is there an easy way to
>         get it to
>         group it by a gene such that I am grabbing all the introns that
>         fall in all
>         the transcripts belong to a gene (and without double counting
>         the intronic
>         space).
>
>         Similar, is there a way to easily get the raw read count at an
>         individual
>         intron level rather than having it grouped? The introns()
>         function seems to
>         be defunct now as it recommends that we use the
>         intronsByTranscript()
>         function.
>
>         Thanks,
>
>         Fong
>
>                  [[alternative HTML version deleted]]
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>         Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>



More information about the Bioconductor mailing list