[BioC] Question about TranscriptDb and makeTranscriptDb method

Mon Dec 13 18:19:37 CET 2010

Hi Song,

The message about CDS availability refers only to the ranges needed to
populate the CDS tables.  If, like a lot of people, you will only be
asking questions about transcripts and exons, then I bet that this
will not affect you.

  Marc

On 12/13/2010 07:00 AM, Song Li wrote:
> Hi Hervé,
>
> Thank you for the reply.
>
> I am a little worried about the warning message that "CDS" is not
> available.  However, it does not seem to be a crucial factor to
> consider at this moment.
>
> Best,
> Song
>
> 2010/12/11 Hervé Pagès <hpages at fhcrc.org>:
>   
>> Hi Song,
>>
>> On 12/10/2010 11:16 AM, Song Li wrote:
>>     
>>> Hi, All,
>>>
>>> I want to thank you for the incredible package which greatly
>>> simplifies our analysis for RNA-seq.
>>>
>>> However, I am working with Arabidopsis RNA-seq data, and it seems
>>> that I have to build a TranscriptDb object myself.  Is there a
>>> function that reads a GTF file and makes a TranscriptDb object?
>>>       
>> No, we don't have this yet, but we might add it in the future.
>> In the meantime you can build a TranscriptDb object for
>> Arabidopsis by using the alyrata_eg_gene dataset from the
>> plant_mart_7 Mart:
>>
>> > library(GenomicFeatures)
>> > txdb <- makeTranscriptDbFromBiomart("plant_mart_7", "alyrata_eg_gene")
>> Download and preprocess the 'transcripts' data frame ... OK
>> Download and preprocess the 'splicings' data frame ... OK
>> Download and preprocess the 'genes' data frame ... OK
>> Prepare the 'metadata' data frame ... OK
>> Make the TranscriptDb object ... OK
>> Warning messages:
>> 1: In .normargSplicings(splicings, unique_tx_ids) :
>>  no CDS information for this TranscriptDb object
>> 2: In .normargChrominfo(chrominfo, transcripts$tx_chrom,
>> splicings$exon_chrom) :
>>  chromosome lengths and circularity flags are not available for this
>> TranscriptDb object
>>
>> > txdb
>> TranscriptDb object:
>> | Db type: TranscriptDb
>> | Data source: BioMart
>> | BioMart database: plant_mart_7
>> | BioMart database version: ENSEMBL PLANT 7 (EBI UK)
>> | BioMart dataset: alyrata_eg_gene
>> | BioMart dataset description: Arabidopsis lyrata genes (Araly1)
>> | BioMart dataset version: Araly1
>> | Full dataset: yes
>> | transcript_nrow: 32667
>> | exon_nrow: 174271
>> | cds_nrow: 0
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2010-12-11 17:43:13 -0800 (Sat, 11 Dec 2010)
>> | GenomicFeatures version at creation time: 1.2.3
>> | RSQLite version at creation time: 0.9-4
>> | DBSCHEMAVERSION: 1.0
>>
>> Just a reminder though that if you decide to use this then it's
>> *crucial* that you align your RNA-seq data against the reference
>> genome that corresponds to those annotations (I'm not sure which
>> one it is; you'll need to investigate).
>>
>> Cheers,
>> H.
>>
>>     
>>> Thanks,
>>> Song Li
>>>       
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>>     