[BioC] Reading GTFs

Sean Davis sdavis2 at mail.nih.gov
Wed Jun 4 11:30:59 CEST 2008


On Tue, Jun 3, 2008 at 11:20 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Howdy,
>
> On Jun 3, 2008, at 7:06 PM, Sean Davis wrote:
>
>> On Tue, Jun 3, 2008 at 6:24 PM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm wondering why I can't seem to stumble across any packages that deal with
>>> parsing and mapping GTF annotation data.
>>
>> GTF is just tab-delimited text, yes?  read.table() should eat that up.
>> You could also look at the biomaRt, rtracklayer, and GenomeGraphs packages.
>
> Yes, they are just tab delimited and quite easy to read in with R's ability to
> slice and dice delimited text. I was just wondering if I was doing this "my
> own way" instead of taking advantage of something that's already there.
> A lot of great work has gone into R/Bioconductor that lets it lay claim to a
> "batteries included" motto; it's just that at times the batteries are on the
> top shelf and easy to miss :-)
>
> Since there are packages for setting up metadata for chip information, probe
> mappings, and so on, I was wondering whether I should approach the problem in
> a way that could be reused within some existing framework.
>
> That said, thanks for the pointers to your suggested packages, and I'll look
> through them more.
>
>>> In order to do some analysis with tiling array data, I need to incorporate
>>> annotation data for chromosome positions.
>>
>> You might look at the tilingArray package.
>
> Yeah ... I've been in and out of that package. It's handy to learn from, for
> sure, and I'm trying to reuse as much of it as possible.
>
>>> I'm happy to whip up some rigged method of doing this myself, but I feel
>>> like others must be doing the same thing and I'm reinventing the wheel, which
>>> might not be all that round by the time I'm done with it.
>>>
>>> Are there better ways to deal with genome annotation? I mentioned
>>> AnnotationDbi in the subject line because I feel like it provides something
>>> similar, but I don't think it's quite what I'm after.
>>
>> What do you actually want to do?  The specifics may be relevant.
>
> Currently I'm trying to gather a set of probes that fit certain criteria,
> such as their genomic annotation (intergenic vs. exonic, etc.) and the number
> of hits to the genome. I have all the information for these from a
> combination of reblasting the probes to the genome (as suggested by W.
> Huber and others) and the GTF file, and I'm trying to store this information
> in an environment similar to the ones the tilingArray and Ringo packages use.
>
> Later I'll probably want to go the other way: start from a set of interesting
> probes and have a quick way to get the pertinent information for them, so I
> can send them through other Bioconductor functionality, like one of the
> go* packages (for example).

Steve,

If you want to set up this kind of thing, I would suggest sticking with
RSQLite rather than environments.  If you have tables of BLAST results
and tables of GTF annotation, you could load those directly now and run
queries to get the probes of interest on the fly, as a simple example.
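A minimal sketch of that suggestion, loading GTF features and BLAST hits as two tables and answering probe questions with SQL joins. The advice above refers to R's RSQLite package; to keep the example self-contained this uses Python's built-in `sqlite3` module instead. The table and column names and the helpers `build_db`, `probes_overlapping`, and `probes_without_overlap` are hypothetical, and "overlaps no GTF feature" is only a rough stand-in for "intergenic".

```python
import sqlite3


def build_db(gtf_rows, hit_rows):
    """Load GTF features and BLAST hits into an in-memory SQLite database.

    gtf_rows: (seqname, feature, start, end, gene_id) tuples
    hit_rows: (probe_id, seqname, start, end) tuples
    Coordinates are assumed 1-based and inclusive, as in GTF.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE gtf (seqname TEXT, feature TEXT, "
                "start_pos INTEGER, end_pos INTEGER, gene_id TEXT)")
    con.execute("CREATE TABLE hits (probe_id TEXT, seqname TEXT, "
                "start_pos INTEGER, end_pos INTEGER)")
    con.executemany("INSERT INTO gtf VALUES (?, ?, ?, ?, ?)", gtf_rows)
    con.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", hit_rows)
    # An index on (seqname, start_pos) can help the overlap join on real data.
    con.execute("CREATE INDEX gtf_pos ON gtf (seqname, start_pos)")
    return con


def probes_overlapping(con, feature, max_hits=1):
    """Probes with at most max_hits genome hits, at least one of which
    overlaps a GTF feature of the given type (e.g. 'exon')."""
    sql = """
        SELECT DISTINCT h.probe_id
        FROM hits AS h
        JOIN gtf AS g
          ON g.seqname = h.seqname
         AND g.feature = ?
         AND h.start_pos <= g.end_pos
         AND h.end_pos >= g.start_pos
        WHERE h.probe_id IN (SELECT probe_id FROM hits
                             GROUP BY probe_id HAVING COUNT(*) <= ?)
        ORDER BY h.probe_id
    """
    return [row[0] for row in con.execute(sql, (feature, max_hits))]


def probes_without_overlap(con, max_hits=1):
    """Probes with at most max_hits hits, none of which overlaps any GTF
    feature (a rough 'intergenic' proxy)."""
    sql = """
        SELECT probe_id
        FROM hits
        GROUP BY probe_id
        HAVING COUNT(*) <= ?
           AND probe_id NOT IN (
               SELECT h.probe_id
               FROM hits AS h
               JOIN gtf AS g
                 ON g.seqname = h.seqname
                AND h.start_pos <= g.end_pos
                AND h.end_pos >= g.start_pos)
        ORDER BY probe_id
    """
    return [row[0] for row in con.execute(sql, (max_hits,))]


if __name__ == "__main__":
    con = build_db(
        [("chr1", "exon", 100, 200, "g1"), ("chr1", "exon", 500, 600, "g2")],
        [("p1", "chr1", 150, 175), ("p2", "chr1", 300, 325),
         ("p3", "chr1", 550, 575), ("p3", "chr2", 10, 35)])
    print(probes_overlapping(con, "exon"))
    print(probes_without_overlap(con))
```

With the default `max_hits=1`, the demo treats p3 (two hits) as non-unique, so only p1 is reported as exonic and only p2 as overlapping nothing. The SQL strings are plain SQLite and are not tied to Python; going "the other way", from a list of interesting probes back to their annotation, is just a different `SELECT` over the same two tables.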

Sean



More information about the Bioconductor mailing list