[BioC] Getting Introns Expression at a Per Gene Level

Valerie Obenchain vobencha at fhcrc.org
Wed Jan 30 21:23:46 CET 2013


Hi Fong,

We don't have a single function call to get introns by gene but you can 
put one together in a few steps.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

Specifying use.names = TRUE puts the transcript names on the GRangesList.

     introns <- intronsByTranscript(txdb, use.names=TRUE)

Unlist and remove duplicate introns.

     ulst <- unlist(introns)
     intronsNoDups <- ulst[!duplicated(ulst)]

The select() function can map txid to geneid (and others). See ?select.

     txnames <- names(intronsNoDups)
     map <- select(txdb, keys=txnames, keytype='TXNAME', cols='GENEID')

Not all transcripts are associated with a gene id and some are 
associated with more than one.

 > head(map, 10)
        TXNAME    GENEID
1  uc001aaa.3      <NA>
2  uc001aaa.3      <NA>
3  uc010nxq.1      <NA>
4  uc010nxq.1      <NA>
5  uc010nxr.1      <NA>
6  uc010nxr.1      <NA>
7  uc009vjk.2 100133331
8  uc009vjk.2 100133331
9  uc001aau.3 100132287
10 uc021oeh.1 100133331


To get a GRangesList of introns by gene, split the introns on the 
associated gene id.

     idx <- map$GENEID[!is.na(map$GENEID)]
     intronsByGene <- split(intronsNoDups[!is.na(map$GENEID)], idx)
     names(intronsByGene) <- unique(idx)


The list names are geneid and the element rownames are txid.

 > intronsByGene
GRangesList of length 19784:
$100133331
GRanges with 13 ranges and 0 metadata columns:
              seqnames               ranges strand
                 <Rle>            <IRanges>  <Rle>
   uc002qsd.4    chr19 [58858396, 58858718]      -
   uc002qsd.4    chr19 [58859007, 58861735]      -
   uc002qsd.4    chr19 [58862018, 58862756]      -
...

$100132287
GRanges with 1 range and 0 metadata columns:
              seqnames               ranges strand
   uc003wyw.1     chr8 [18248856, 18257507]      +


To count reads against this annotation you can use summarizeOverlaps(). 
The 'reads' can be Bam files, GappedAlignments or GappedAlignmentPairs. 
The 'features' argument will be the GRangesList of introns by gene. The 
result will be counts per gene.

To count just introns with no grouping you can still use 
summarizeOverlaps() but use the 'intronsNoDups' GRanges object as 
'features'.


Valerie


On 01/29/2013 11:38 AM, Fong Chun Chan wrote:
> Hi,
>
> I am using the R package, Genomic Features, to get Intron Expression and
> the function that I am using is the intronsByTranscript(). While this
> function is useful, it returns the number of raw reads that align to each
> intron grouped at the transcript level. Is there an easy way to get it to
> group it by a gene such that I am grabbing all the introns that fall in all
> the transcripts belong to a gene (and without double counting the intronic
> space).
>
> Similar, is there a way to easily get the raw read count at an individual
> intron level rather than having it grouped? The introns() function seems to
> be defunct now as it recommends that we use the intronsByTranscript()
> function.
>
> Thanks,
>
> Fong
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list