[BioC] retrieving annotation
Nicolas Delhomme
delhomme at embl.de
Sat Nov 16 23:39:13 CET 2013
Hi Kathi!
In a different thread (GTF file error when using easyRNAseq), Martin mentioned that you can access Ensembl GTF files through AnnotationHub. I've copied part of his answer below; as you can see, gene_biotype is part of the annotation:
> library(AnnotationHub)
> hub = AnnotationHub()
> hub$ensembl.release.73.<tab>
hub$ensembl.release.73.fasta. ... [378]
hub$ensembl.release.73.gtf. ... [63]
> xx = hub$ensembl.release.73.gtf.gallus_gallus.Gallus_gallus.Galgal4.73.gtf_0.0.1.RData
> xx
GRanges with 381368 ranges and 12 metadata columns:
        seqnames       ranges strand |         source     type
           <Rle>    <IRanges>  <Rle> |       <factor> <factor>
    [1]        1 [1735, 2449]      + | protein_coding     exon
    [2]        1 [2379, 2449]      + | protein_coding      CDS
            score     phase            gene_id      transcript_id
        <numeric> <integer>        <character>        <character>
    [1]      <NA>      <NA> ENSGALG00000009771 ENSGALT00000015891
    [2]      <NA>         0 ENSGALG00000009771 ENSGALT00000015891
        exon_number   gene_biotype            exon_id         protein_id
          <numeric>    <character>        <character>        <character>
    [1]           1 protein_coding ENSGALE00000301221               <NA>
    [2]           1 protein_coding               <NA> ENSGALP00000015874
          gene_name transcript_name
        <character>     <character>
    [1]        <NA>            <NA>
    [2]        <NA>            <NA>
 [ reached getOption("max.print") -- omitted 9 rows ]
---
seqlengths:
             1              2 ... AADN03010940.1
            NA             NA ...             NA
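To keep only the protein-coding part of such an object, a logical subset on its metadata columns should be enough. The sketch below uses a small hand-built GRanges that mimics the columns printed above: the first two features are copied from that output, while the third one (with its toy_* IDs) is made up for illustration. Keep in mind that gene_biotype is a gene-level attribute, so filtering on it keeps every isoform of a protein-coding gene. For isoform-level selection, the sketch assumes, as the printed output suggests, that the source column of this Ensembl release holds the transcript biotype; please check that against the GTF you actually use.

```r
## Minimal sketch: keep only protein-coding features from an Ensembl GTF
## imported as a GRanges (such as 'xx' above). This toy object copies the
## first two features printed above; the third one (toy_* IDs) is made up.
library(GenomicRanges)

gr <- GRanges(
    seqnames      = c("1", "1", "1"),
    ranges        = IRanges(start = c(1735, 2379, 5000),
                            end   = c(2449, 2449, 5200)),
    strand        = c("+", "+", "-"),
    source        = factor(c("protein_coding", "protein_coding",
                             "processed_transcript")),
    type          = factor(c("exon", "CDS", "exon")),
    gene_id       = c("ENSGALG00000009771", "ENSGALG00000009771", "toy_gene_1"),
    transcript_id = c("ENSGALT00000015891", "ENSGALT00000015891", "toy_tx_1"),
    gene_biotype  = c("protein_coding", "protein_coding", "protein_coding")
)

mc <- mcols(gr)

## Gene-level filter: keeps every feature of a protein-coding gene,
## including its non-coding isoforms. %in% treats NA as "no match".
pc_gene <- gr[mc$gene_biotype %in% "protein_coding"]

## Transcript-level filter. ASSUMPTION: in this Ensembl release the GTF
## 'source' column holds the transcript biotype, as the output above suggests.
pc_tx <- gr[mc$source %in% "protein_coding"]

## IDs of the protein-coding isoforms that remain.
pc_tx_ids <- unique(mcols(pc_tx)$transcript_id)
print(pc_tx_ids)
```

Applied to the real object, the same idea would read, for example, pc_tx <- xx[mcols(xx)$source %in% "protein_coding"]. Using %in% rather than == avoids errors from NA values in the biotype columns.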
Hope this helps,
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On 7 Nov 2013, at 14:11, Kathi Zarnack <zarnack at ebi.ac.uk> wrote:
> Hi,
>
> I wanted to ask whether any of the annotation packages contains information on the transcript biotype (protein-coding, etc.). I would like to select only protein-coding isoforms from the Ensembl annotation, but I could not find any package that includes this information (otherwise I will get it with biomaRt; I just wondered whether it is already included somewhere).
>
> Also, I tried to download GENCODE annotation using GenomicFeatures, and got the following error:
>
> > test=makeTranscriptDbFromUCSC(genome="hg19", tablename="wgEncodeGencodeManualV3")
> Error in tableNames(ucscTableQuery(session, track = track)) :
> error in evaluating the argument 'object' in selecting a method for function 'tableNames': Error in normArgTrack(track, trackids) : Unknown track: Gencode Genes
>
> I tried to get the same table for hg18, but I get only one step further:
>
> > test=makeTranscriptDbFromUCSC(genome="hg18", tablename="wgEncodeGencodeManualV3")
> Download the wgEncodeGencodeManualV3 table ... OK
> Download the wgEncodeGencodeClassesV3 table ... Error in normArgTable(value, x) :
> unknown table name 'wgEncodeGencodeClassesV3'
>
> Thank you very much for your help,
> Kathi
>
>
> ------------------------------------------
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0
> [4] GenomicRanges_1.14.3 XVector_0.2.0 IRanges_1.20.5
> [7] BiocGenerics_0.8.0 BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.18.0 Biostrings_2.30.0 bitops_1.0-6 BSgenome_1.30.0
> [5] DBI_0.2-7 RCurl_1.95-4.1 Rsamtools_1.14.1 RSQLite_0.11.4
> [9] rtracklayer_1.22.0 stats4_3.0.2 tcltk_3.0.2 tools_3.0.2
> [13] XML_3.98-1.1 zlibbioc_1.8.0
>
>
> --
> Dr. Kathi Zarnack
> Luscombe Group
>
> European Molecular Biology Laboratory
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD
> United Kingdom
>
> email zarnack at ebi.ac.uk
> tel +44 1223 494 526
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor