[BioC] Finding my un-counted reads

Fri Aug 30 19:16:02 CEST 2013

Hi,

On Fri, Aug 30, 2013 at 9:38 AM, Sam McInturf <smcinturf at gmail.com> wrote:
> Valarie,
>
> By calling:
> olap <- findOverlaps(gff0, reads)
> foo <- countSubjectHits(olap)
>
> `countSubjectHits(x): Counts the number of hits for each subject, returning
> an integer vector.`
> foo is a vector with a length equal to the number of reads mapped? so:
>
> length(foo[foo == 0])
>
> would tell me how many reads did not map to my annotated features.
>
> can you recommend a method that would  allow me to map to the regions
> outside of my current features?  I have been trying to devise a way to make
> a new set of features that are the spaces between my current features, but
> I have been unable to implement it successfully.
>
> Maybe a more genomics oriented approach and look at the coverage over each
> chromosome to try and identify regions that have a higher 'background' that
> could be then looked at in a genome browser like IGV to see the reads that
> did not map to a transcript?

I had previously offered a suggestion, like so:

    tx <- transcripts(txdb)
    strand(tx) <- '*';
    txs <- reduce(tx)

    o <- countOverlaps(txs, reads, ignore.strand=TRUE)

But I was missing a crucial step, which was to use the `gaps` function, eg:

    tx <- transcripts(txdb)
    strand(tx) <- '*';
    txs <- reduce(tx)
    nontx <- gaps(txs)

    o <- countOverlaps(nontx, reads, ignore.strand=TRUE)

`nontx` should be a GenomicRanges object with ranges that are created
from the spaces between the transcripts (the "gaps") -- which should
be a good approximation to the intergenic regions you are looking for.

-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech