[BioC] Question about TranscriptDb and makeTranscriptDb method

Hervé Pagès hpages at fhcrc.org
Sun Dec 12 02:54:46 CET 2010


Hi Song,

On 12/10/2010 11:16 AM, Song Li wrote:
> Hi, All,
>
> I want to thank you for the incredible package which greatly
> simplifies our analysis for RNA-seq.
>
> However, I am working with Arabidopsis RNA-seq data, and it seems
> that I have to build a TranscriptDb object by myself.  Is there a
> function that reads a GTF file and makes a TranscriptDb object?

No, we don't have this yet, but we might add it in the future.
In the meantime you can build a TranscriptDb object for
Arabidopsis lyrata by using the alyrata_eg_gene dataset from the
plant_mart_7 Mart:

 > library(GenomicFeatures)

 > txdb <- makeTranscriptDbFromBiomart("plant_mart_7", "alyrata_eg_gene")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In .normargSplicings(splicings, unique_tx_ids) :
   no CDS information for this TranscriptDb object
2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
   chromosome lengths and circularity flags are not available for this TranscriptDb object

 > txdb
TranscriptDb object:
| Db type: TranscriptDb
| Data source: BioMart
| BioMart database: plant_mart_7
| BioMart database version: ENSEMBL PLANT 7 (EBI UK)
| BioMart dataset: alyrata_eg_gene
| BioMart dataset description: Arabidopsis lyrata genes (Araly1)
| BioMart dataset version: Araly1
| Full dataset: yes
| transcript_nrow: 32667
| exon_nrow: 174271
| cds_nrow: 0
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-12-11 17:43:13 -0800 (Sat, 11 Dec 2010)
| GenomicFeatures version at creation time: 1.2.3
| RSQLite version at creation time: 0.9-4
| DBSCHEMAVERSION: 1.0

Just a reminder though that if you decide to use this then it's
*crucial* that you align your RNA-seq data against the reference
genome that corresponds to those annotations (I'm not sure which
one it is, you'll need to investigate).
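
If you would rather start from your own GTF file, here is a rough
sketch of how the 'transcripts' and 'splicings' data frames that
makeTranscriptDb() takes could be assembled by hand in base R. Please
treat it as an illustration under assumptions, not as something the
package provides: the helper name gtf_to_tx_tables and the toy GTF
records are made up, only "exon" lines are used, every exon line is
assumed to carry a transcript_id attribute, and CDS and gene
information are ignored.

```r
# Hypothetical helper (not part of GenomicFeatures): turn the exon records
# of a GTF file into 'transcripts' and 'splicings' data frames.
gtf_to_tx_tables <- function(gtf_file) {
  gtf <- read.delim(gtf_file, header = FALSE, comment.char = "#", quote = "",
                    col.names = c("chrom", "source", "feature", "start", "end",
                                  "score", "strand", "frame", "attributes"),
                    colClasses = c("character", "character", "character",
                                   "integer", "integer", "character",
                                   "character", "character", "character"))
  exons <- gtf[gtf$feature == "exon", ]

  # Pull the transcript_id value out of the attributes column.
  exons$tx_name <- sub('.*transcript_id "([^"]+)".*', "\\1", exons$attributes)
  # Assign one integer id per transcript name.
  exons$tx_id <- match(exons$tx_name, unique(exons$tx_name))

  # Rank exons 5' to 3' within each transcript: ascending start on the
  # "+" strand, descending start on the "-" strand.
  key <- ifelse(exons$strand == "-", -exons$start, exons$start)
  exons <- exons[order(exons$tx_id, key), ]
  exons$exon_rank <- ave(seq_along(exons$tx_id), exons$tx_id, FUN = seq_along)

  # One row per transcript, spanning its first to last exon position.
  first <- !duplicated(exons$tx_id)
  transcripts <- data.frame(
    tx_id     = exons$tx_id[first],
    tx_name   = exons$tx_name[first],
    tx_chrom  = exons$chrom[first],
    tx_strand = exons$strand[first],
    tx_start  = as.vector(tapply(exons$start, exons$tx_id, min)),
    tx_end    = as.vector(tapply(exons$end, exons$tx_id, max)),
    stringsAsFactors = FALSE)

  # One row per exon, linked to its transcript by tx_id.
  splicings <- data.frame(
    tx_id      = exons$tx_id,
    exon_rank  = exons$exon_rank,
    exon_start = exons$start,
    exon_end   = exons$end,
    stringsAsFactors = FALSE)

  list(transcripts = transcripts, splicings = splicings)
}

# Toy input (made-up records) written to a temporary file.
gtf_lines <- c(
  'Chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
  'Chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";',
  'Chr1\tdemo\texon\t900\t950\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";',
  'Chr1\tdemo\texon\t700\t800\t.\t-\t.\tgene_id "G2"; transcript_id "G2.1";')
demo_gtf <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, demo_gtf)

tables <- gtf_to_tx_tables(demo_gtf)

# With GenomicFeatures loaded, the two data frames could then be passed on:
# txdb <- makeTranscriptDb(tables$transcripts, tables$splicings)
```

Check the column requirements in ?makeTranscriptDb for the version you
have installed before relying on this, and keep in mind that the same
caveat about the matching reference genome applies to annotations
loaded this way.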

Cheers,
H.

>
> Thanks,
> Song Li


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


