[BioC] read coverage per transcript for RNASeq

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 2 01:35:38 CEST 2009


Hi Jessica -- mixing up the order a bit...

> Also, while I am writing, I was interested in only looking at reads
> above a certain quality score.  When I input the data with
> readAligned, I couldn’t figure out how to filter on this variable,
> so I ended up reading the data just into R, filtering that way, and
> then reading the data into ShortRead.  Is there a better way to do
> this?  All the transferring of the data back and forth took a lot of
> time to process.

readAligned takes a filter argument, see ?srFilter (maybe
alignQualityFilter ? 'quality' is a bit ambiguous) and the final
example on ?readAligned. Another route is along the lines of

  aln[ quality(alignQuality(aln)) > certainQualityScore ]

> I have short read sequences for RNASeq that I have then aligned to
> the reference transcript sequences.  The reads are from a Solexa
> platform and they were aligned using MAQ.  I have imported the data
> using ShortReads (readAligned) and all looks good.  However, I am
> interested in limiting the data to those transcripts with at least
> 85% coverage.  I can’t find any documentation to do this that
> doesn’t involved looking at each transcript individually.  Any way
> to do this for all transcripts (30,000+)?

Hmm. You say they're aligned to the reference transcript sequences, so
it sounds like chromosome(aln) gives you the transcript each is
aligned to and

  cvg <- coverage(aln)

gives you a list of 'run-length-encoded' coverage vectors, one for
each transcript. I'm not sure what '85% coverage' means, but maybe
you're after something like  

  sum(cvg) > .85 * sapply(cvg, length)

A little more detail might lead to a clearer solution. The
bioc-sig-sequencing news groups

  https://stat.ethz.ch/pipermail/bioc-sig-sequencing/

has several posts recently that might be helpful.

Hope that helps,

Martin


"Jessica Hunter" <hunter at ohsu.edu> writes:

> Hello all.
>
>  
>
> I have short read sequences for RNASeq that I have then aligned to
> the reference transcript sequences.  The reads are from a Solexa
> platform and they were aligned using MAQ.  I have imported the data
> using ShortReads (readAligned) and all looks good.  However, I am
> interested in limiting the data to those transcripts with at least
> 85% coverage.  I can’t find any documentation to do this that
> doesn’t involved looking at each transcript individually.  Any way
> to do this for all transcripts (30,000+)?
>
>  
>
>
>  
>
> Lastly, any good online resources for RNASeq applications with
> Bioconductor that people can recommend?
>
>  
>
> Thank you,
>
>  
>
> Jessica
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list