[BioC] How to plot NGS data?

Hahne, Florian florian.hahne at novartis.com
Mon Feb 20 18:49:26 CET 2012


Hi Steve,
Help is always appreciated. I will definitely let you know once my feeble
attempts are mature enough for public consumption. The main thing to get
right is to cleanly embed the file reference concept in the class
hierarchy, so that all RangedTrack objects know how to deal with
file-based information. The real work will probably be implementing all
the readers for the different file types (bigBed, bigWig, BAM, etc.),
although most of this should be available from either the Rsamtools or
rtracklayer packages. With a generic input structure in place one could go
wild and even read coordinates and the associated data off a database
without changing a line of code in the downstream plotting methods.
Along these lines, having gene models stored in a local TranscriptDb object
isn't such a bad option (after all it's how gene information is available
in Bioconductor these days), and there should be a clean way to directly
plot from those. 
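
[Editorial sketch, not Gviz code: the "generic input structure" idea above,
written in Python for brevity. Every name here (RangeSource, InMemorySource,
DelimitedFileSource, render_track) is hypothetical and made up for
illustration. A plain tab-delimited file stands in for an indexed format
such as BAM, bigWig or tabix. The point it tries to show is that the
plotting code only ever calls fetch(), so the data could come from memory,
a file or a database without the plotting code changing.]

```python
import csv
import os
import tempfile
from dataclasses import dataclass
from typing import List, Protocol, Tuple

# (chromosome, start, end, score); coordinates are treated as closed intervals.
Range = Tuple[str, int, int, float]


class RangeSource(Protocol):
    """Hypothetical interface: return the ranges overlapping a query window."""

    def fetch(self, chrom: str, start: int, end: int) -> List[Range]: ...


def _overlaps(r: Range, chrom: str, start: int, end: int) -> bool:
    return r[0] == chrom and r[1] <= end and r[2] >= start


@dataclass
class InMemorySource:
    """Ranges already held in memory."""

    ranges: List[Range]

    def fetch(self, chrom: str, start: int, end: int) -> List[Range]:
        return [r for r in self.ranges if _overlaps(r, chrom, start, end)]


@dataclass
class DelimitedFileSource:
    """Stand-in for an indexed file reader.

    Only the path is stored; the file is scanned on each request. A real
    reader would use the file's index instead of a linear scan.
    """

    path: str

    def fetch(self, chrom: str, start: int, end: int) -> List[Range]:
        hits = []
        with open(self.path, newline="") as handle:
            for c, s, e, score in csv.reader(handle, delimiter="\t"):
                r = (c, int(s), int(e), float(score))
                if _overlaps(r, chrom, start, end):
                    hits.append(r)
        return hits


def render_track(source: RangeSource, chrom: str, start: int, end: int) -> List[str]:
    """Toy 'plotting method': one text line per range, clipped to the window."""
    return [
        f"{c}:{max(s, start)}-{min(e, end)} score={score:g}"
        for c, s, e, score in source.fetch(chrom, start, end)
    ]


if __name__ == "__main__":
    demo = [("chr1", 10, 20, 1.5), ("chr1", 50, 60, 2.0), ("chr2", 10, 20, 3.0)]
    fd, path = tempfile.mkstemp(suffix=".tsv")
    with os.fdopen(fd, "w") as fh:
        for r in demo:
            fh.write("\t".join(map(str, r)) + "\n")
    # The same rendering call works for both kinds of source.
    print(render_track(InMemorySource(demo), "chr1", 15, 55))
    print(render_track(DelimitedFileSource(path), "chr1", 15, 55))
    os.remove(path)
```

[In this sketch the file-backed source keeps nothing in memory between
calls, which is the "ranges are only realized in the plotting method" part
of the idea. A database-backed source would only need its own fetch().]
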
You see, there are plenty of opportunities to improve the package, and
certainly enough work for all of us :-)
Will give you a shout once the undergarments are ready,
Florian


Florian Hahne
Novartis Institute For Biomedical Research
Translational Sciences / Preclinical Safety / PCS Informatics
Expert Data Integration and Modeling Bioinformatics
CHBS, WKL-135.2.26
Novartis Institute For Biomedical Research, Werk Klybeck
Klybeckstrasse 141
CH-4057 Basel
Switzerland
Phone: +41 61 6967127
Email : florian.hahne at novartis.com







On 2/17/12 5:50 PM, "Steve Lianoglou" <mailinglist.honeypot at gmail.com>
wrote:

>Hi Florian,
>
>On Fri, Feb 17, 2012 at 11:00 AM, Hahne, Florian
><florian.hahne at novartis.com> wrote:
>> Just to chime in here:
>> High up on my list of future developments is some sort of file-based
>> track class, where all the genomic regions reside on disc in an indexed
>> file, like BAM, bigWig or tabix. The actual ranges are only realized
>> within R in the plotting method, so no need to fill the memory with
>> unnecessary clutter. With the available infrastructure in Rsamtools this
>> should be an easy extension, I just need to find some time to hack in
>> the code. I guess some sort of NGS-specific visualization would be the
>> next thing on the list. There is an experimental AlignedReadsTrack
>> class, but right now that's really just a huge collection of bugs :-(
>
>I hacked a few "track-like" objects to use with GenomeGraphs some time
>ago in order to plot data from an RNA-seq protocol we've been
>developing. The pictures look like this:
>
>http://cbio.mskcc.org/~lianos/files/bioconductor/DEPDC1.png
>
>"That's some bizarre RNA-seq data." you might say, but we only capture
>3' ends of mRNAs in order to study alternative cleavage and
>polyadenylation.
>
>To do that, though, those lanes (above the genome axis) are probably
>something like the AlignedReadsTrack class you mention, which are
>built by working over a specified range of a BAM file, or from
>Rle(coverage) vectors.
>
>These coverage vectors are also smoothed using another package I'm
>whipping up which does (probably not very efficiently written)
>convolutions over Rle(coverage) vectors directly, which might be
>useful:
>
>https://github.com/lianos/biosignals/blob/master/R/convolve1d.R
>
>I'd also like to add "stranded" visualization ability, i.e. plot (+)
>coverage north of 0 and (-) south, like you've already implemented by
>the looks of the vignette.
>
>Lastly, the tails of the gene models you see below are pulled out of
>a local cache that was built from the info in local TranscriptDb
>objects ... this needs to be redone to use something like tabix- or
>BAM-stored gene models (my current approach isn't particularly
>efficient at all).
>
>All this is to say that I have some things whipped together that might
>be useful in this realm and would be happy to help w/ this, too ...
>I'd totally love to switch to Gviz eventually when there is time to do
>so, and would be happy to help if you haven't already done it.
>
>In that regard, if you actually *want* help with that, maybe you could
>ping us (maybe on bioc-devel?) when you think you might be ready to
>start tackling this part of Gviz?
>
>Anyway -- although I haven't used it, Gviz looks incredibly
>impressive, nice work!
>
>-steve
>
>-- 
>Steve Lianoglou
>Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
>Contact Info: http://cbio.mskcc.org/~lianos/contact


