[BioC] miRNA annotation

Mon May 7 20:59:20 CEST 2012

Hi,

On Mon, May 7, 2012 at 2:15 PM, Wu, Huiyun <william.wu at vanderbilt.edu> wrote:
> Hello there,
>
> I am new in seq data analysis, and have difficulty in miRNA annotation. To clarify my question, I have the following codes.
>
>
>   bamfile <- "GSE33858_1.bowtie_alignment.bam"
>   aln <- readBamGappedAlignments(bamfile)
>   txdb<-makeTranscriptDbFromUCSC(genome="hg19",tablename="knownGene")
>   exonRangesList <- exonsBy(txdb, by = "gene")
>   exonRanges <- unlist(exonRangesList)
>   strand(exonRanges) <- "*"
>   geneNames <- sub("\\..*$", "", names(exonRanges))
>   exonRangesListNoStrand <- split(exonRanges, geneNames)
>   numberOfexons <- sapply(width(exonRangesListNoStrand), length)
>   geneLength <- sum(width(reduce(exonRangesListNoStrand)))
>   # Counting reads for the BAM file
>   counts <- countOverlaps(exonRangesListNoStrand, aln)
>   names(counts) <- names(exonRangesListNoStrand)
>
>   > length(counts)
>
>       [1] 22932
>
> So my question is from aligned file to annotated file. Specifically, why I got 22932 features after mapping to hg19? I learned there are only less than 2000 miRNAs matched genes. I also know there are several RNA databases including miRBase for annotation. Did I do something logically wrong? It would be greatly appreciated if someone could show me how to implement this annotation in R.

What does `sum(counts == 0)` give you?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact