[BioC] R package to estimate aggregation of NGS reads

Mon Sep 29 18:20:45 CEST 2008

Hi Ana --

These are all in the development branch of Bioconductor.

ShortRead will read Solexa, MAQ binary (the MAQ 0.7.0 format is not yet
supported), MAQ text, 454, fastq, and 'columns' of sequence data into
appropriate data structures. ShortRead::pileup generates a MAQ-style
pile-up vector; the ShortRead Overview vignette might be a place to start.

Biostrings has the functions 'coverage' and 'slice': the first
calculates a pile-up-like vector and the second slices it at a chosen
height.
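
As a rough illustration of the idea only, here is a sketch in plain
Python. It is not the Biostrings API: the function names `coverage` and
`slice_at`, the fixed read width, and the toy read positions are all
made up for this example. A pile-up-like vector counts the reads
overlapping each position, and slicing keeps the runs of positions at
or above a chosen height.

```python
def coverage(starts, width, length):
    """Count reads overlapping each position of a sequence of given length.

    Assumes 0-based start positions with 0 <= start < length, and that
    every read has the same width (a simplification for this sketch).
    """
    diff = [0] * (length + 1)
    for s in starts:
        diff[s] += 1                       # read begins covering here
        diff[min(s + width, length)] -= 1  # read stops covering here
    cov, depth = [], 0
    for d in diff[:length]:
        depth += d                         # running sum gives the depth
        cov.append(depth)
    return cov


def slice_at(cov, lower):
    """Return half-open (start, end) runs where coverage >= lower."""
    runs, run_start = [], None
    for i, c in enumerate(cov):
        if c >= lower and run_start is None:
            run_start = i                  # a run begins
        elif c < lower and run_start is not None:
            runs.append((run_start, i))    # the run ended just before i
            run_start = None
    if run_start is not None:              # run extends to the end
        runs.append((run_start, len(cov)))
    return runs


# Toy data (hypothetical): five reads of width 4 on a sequence of length 20.
cov = coverage([2, 3, 4, 12, 13], width=4, length=20)
peaks = slice_at(cov, lower=2)
```

Here `peaks` holds the regions covered by at least two reads. Real data
would also need chromosome names, strand, and variable read widths,
which this sketch ignores.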

Together this gives relatively naive tools for finding aggregations of
mapped reads; probably the statistician in you wants to address (using
R, of course) important questions about, e.g., accommodating different
total numbers of reads per lane, uncertainty in base calls,
information from + and - strands, etc.
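
One simple way to approach the first of these questions, and certainly
not the only one, is to rescale each lane's pile-up by its total read
count before comparing lanes. The sketch below is plain Python with a
hypothetical function name (`scale_per_million`) and made-up numbers;
it illustrates the idea and is not something ShortRead or Biostrings
provides.

```python
def scale_per_million(cov, total_reads):
    """Rescale a coverage vector to counts per million reads in the lane."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    factor = 1_000_000 / total_reads
    return [c * factor for c in cov]


# Hypothetical lanes: lane A was sequenced twice as deeply as lane B.
lane_a = scale_per_million([0, 4, 8, 4], total_reads=2_000_000)
lane_b = scale_per_million([0, 2, 4, 2], total_reads=1_000_000)
```

After scaling, the two toy lanes have the same profile, so a single
slicing height can be applied to both. This says nothing about
base-call uncertainty or strand information, which need their own
treatment.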

There's also the bioc-sig-sequencing mailing list.

Hope that helps.

Martin

Ana Conesa <aconesa at cipf.es> writes:

> Dear list,
>
> I was wondering if anyone knows an R package or function to investigate
> distribution of next-generation sequencing mapping data on chromosomes
> and find regions with a significant aggregation of mapped reads.
> Cheers
>
> Ana

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793