[BioC] Getting Introns Expression at a Per Gene Level

Tue Feb 5 14:57:02 CET 2013

On 02/04/13 13:57, Fong Chun Chan wrote:
> HI Valerie,
>
> Thanks for the information. I've corrected this now. If I have 
> paired-end read libraries and I am strictly interested in just getting 
> the raw reads that summarize over a exon, will there be a significant 
> difference between using readBamGappedAlignments() 
> and readBamGappedAlignmentPairs() to grab my reads?

This depends on the reads and annotation.

>
> Based on my understanding of readBamGappedAlignmentPairs() it will 
> simply treat the paired-end reads as a single entity and so if both 
> reads align then we would be counting them as one unit. However if I 
> use readBamGappedAlignments() then all the reads are treated a 
> single-end reads. And so even if one read does not align, the other 
> read is counted correct? I would suspect the results to be quite 
> similar at the end or am I mistaken and missing something crucial.

You are correct in that counting/overlap routines treat the reads 
(ranges) in GappedAlignmentPairs as one entity. If your data are 
paired-end and you read into GappedAlignments the number of reads are 
effectively doubled (perhaps more with possibility of unmated reads). 
Now when you count, each read fragment is considered an independent 
read. Whether or not the results would be similar with GappedAlignments 
and GappedAlignmentPairs depends on the data. If there are many reads 
that have a portion of both read pairs aligned to the annotation then 
the counts will be quite different.

Valerie
>
> Thanks
>
>
> On Mon, Feb 4, 2013 at 12:46 PM, Valerie Obenchain <vobencha at fhcrc.org 
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     The error is saying the 'reads' argument cannot be a GRanges
>     object. The reason for this is because GRanges does not keep track
>     of read gaps. When counting with summarizeOverlaps we want to
>     consider the read gaps so we use the GappedAlignments and
>     GappedAlignmentPairs objects which are 'gap-aware' and store the
>     CIGARs. The summarizeOverlaps man page indicates what are valid
>     objects for the 'reads' argument,
>
>     ?summarizeOverlaps
>
>     The reads can also be in a Bam file. Read more about that method at
>
>     library(Rsamtools)
>     library(GenomicRanges)
>     ?'summarizeOverlaps,GRanges,BamFileList-method'
>
>
>     The countOverlaps (and findOverlaps) functions will double count
>     reads, summarizeOverlaps does not. See the example on the
>
>     ?summarizeOverlaps
>
>     man page for a comparison between findOverlaps and
>     summarizeOverlaps. The man page also describes the different
>     'modes' in summarizeOverlaps that resolve reads that do hit
>     multiple annotation features.
>
>     Valerie
>
>
>
>     On 02/04/2013 10:35 AM, Fong Chun Chan wrote:
>
>         Hi Valarie,
>
>         Thanks for the reply. I was testing this and it doesn't seem
>         to work
>         with the summarizeOverlaps() function.  When I try it, I get
>         this error:
>
>         countsForIntrons <- summarizeOverlaps(intronsByGene, reads);
>         Error in function (classes, fdef, mtable)  :
>            unable to find an inherited method for function
>         ‘summarizeOverlaps’
>         for signature ‘"GRangesList", "GRanges"’
>
>         Is it supposed to be countOverlaps() instead of
>         summarizeOverlap()?
>
>         Thanks,
>
>         Fong
>
>
>         On Wed, Jan 30, 2013 at 12:23 PM, Valerie Obenchain
>         <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
>         <mailto:vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>>> wrote:
>
>             Hi Fong,
>
>             We don't have a single function call to get introns by
>         gene but you
>             can put one together in a few steps.
>
>             library(TxDb.Hsapiens.UCSC.__hg19.knownGene)
>             txdb <- TxDb.Hsapiens.UCSC.hg19.__knownGene
>
>
>             Specifying use.names = TRUE puts the transcript names on the
>             GRangesList.
>
>                  introns <- intronsByTranscript(txdb, use.names=TRUE)
>
>             Unlist and remove duplicate introns.
>
>                  ulst <- unlist(introns)
>                  intronsNoDups <- ulst[!duplicated(ulst)]
>
>             The select() function can map txid to geneid (and others).
>         See ?select.
>
>                  txnames <- names(intronsNoDups)
>                  map <- select(txdb, keys=txnames, keytype='TXNAME',
>         cols='GENEID')
>
>             Not all transcripts are associated with a gene id and some are
>             associated with more than one.
>
>         > head(map, 10)
>                     TXNAME    GENEID
>             1  uc001aaa.3 <NA>
>             2  uc001aaa.3 <NA>
>             3  uc010nxq.1 <NA>
>             4  uc010nxq.1 <NA>
>             5  uc010nxr.1 <NA>
>             6  uc010nxr.1 <NA>
>             7  uc009vjk.2 100133331
>             8  uc009vjk.2 100133331
>             9  uc001aau.3 100132287
>             10 uc021oeh.1 100133331
>
>
>             To get a GRangesList of introns by gene, split the introns
>         on the
>             associated gene id.
>
>                  idx <- map$GENEID[!is.na <http://is.na>
>         <http://is.na>(map$GENEID)]
>                  intronsByGene <- split(intronsNoDups[!is.na
>         <http://is.na>
>         <http://is.na>(__map$GENEID)], idx)
>
>                  names(intronsByGene) <- unique(idx)
>
>
>             The list names are geneid and the element rownames are txid.
>
>         > intronsByGene
>             GRangesList of length 19784:
>             $100133331
>             GRanges with 13 ranges and 0 metadata columns:
>                           seqnames               ranges strand
>         <Rle> <IRanges> <Rle>
>                uc002qsd.4    chr19 [58858396, 58858718]      -
>                uc002qsd.4    chr19 [58859007, 58861735]      -
>                uc002qsd.4    chr19 [58862018, 58862756]      -
>             ...
>
>             $100132287
>             GRanges with 1 range and 0 metadata columns:
>                           seqnames               ranges strand
>                uc003wyw.1     chr8 [18248856, 18257507]      +
>
>
>             To count reads against this annotation you can use
>             summarizeOverlaps(). The 'reads' can be Bam files,
>         GappedAlignments
>             or GappedAlignmentPairs. The 'features' argument will be the
>             GRangesList of introns by gene. The result will be counts
>         per gene.
>
>             To count just introns with no grouping you can still use
>             summarizeOverlaps() but use the 'intronsNoDups' GRanges
>         object as
>             'features'.
>
>
>             Valerie
>
>
>
>             On 01/29/2013 11:38 AM, Fong Chun Chan wrote:
>
>                 Hi,
>
>                 I am using the R package, Genomic Features, to get Intron
>                 Expression and
>                 the function that I am using is the
>         intronsByTranscript(). While
>                 this
>                 function is useful, it returns the number of raw reads
>         that
>                 align to each
>                 intron grouped at the transcript level. Is there an
>         easy way to
>                 get it to
>                 group it by a gene such that I am grabbing all the
>         introns that
>                 fall in all
>                 the transcripts belong to a gene (and without double
>         counting
>                 the intronic
>                 space).
>
>                 Similar, is there a way to easily get the raw read
>         count at an
>                 individual
>                 intron level rather than having it grouped? The introns()
>                 function seems to
>                 be defunct now as it recommends that we use the
>                 intronsByTranscript()
>                 function.
>
>                 Thanks,
>
>                 Fong
>
>                          [[alternative HTML version deleted]]
>
>                 _________________________________________________
>                 Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>                 Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
>