[BioC] retrieving annotation

Hervé Pagès hpages at fhcrc.org
Sun Nov 17 22:43:40 CET 2013


Hi Kathi,

On 11/07/2013 05:11 AM, Kathi Zarnack wrote:
> Hi,
>
> I wanted to ask whether any of the annotation packages contains
> information on the transcript biotype (protein-coding, etc). I would
> like to select only protein-coding isoforms from Ensembl annotation, but
> I could not find any package that includes this information (otherwise I
> will get it with biomaRt, I just wondered whether it is already included
> somewhere).
>
> Also, I tried to download GENCODE annotation using GenomicFeatures, and
> got the following error:
>
>  > test=makeTranscriptDbFromUCSC(genome="hg19",
> tablename="wgEncodeGencodeManualV3")
> Error in tableNames(ucscTableQuery(session, track = track)) :
>    error in evaluating the argument 'object' in selecting a method for
> function 'tableNames': Error in normArgTrack(track, trackids) : Unknown
> track: Gencode Genes
>
> I tried to get the same table for hg18, but I get only one step further:
>
> test=makeTranscriptDbFromUCSC(genome="hg18",
> tablename="wgEncodeGencodeManualV3")
> Download the wgEncodeGencodeManualV3 table ... OK
> Download the wgEncodeGencodeClassesV3 table ... Error in
> normArgTable(value, x) :
>    unknown table name 'wgEncodeGencodeClassesV3'

Note that the wgEncodeGencodeManualV3 table appears to exist for hg18
only: there doesn't seem to be such a table for hg19.

For hg19, UCSC provides 3 GENCODE tracks: GENCODE Genes V17, GENCODE
Genes V14, and GENCODE Genes V7. Each of them contains 5 tables
that are compatible with makeTranscriptDbFromUCSC(). For example,
for GENCODE Genes V17, those tables are:

   wgEncodeGencodeBasicV17
   wgEncodeGencodeCompV17
   wgEncodeGencodePseudoGeneV17
   wgEncodeGencode2wayConsPseudoV17
   wgEncodeGencodePolyaV17

See here for details:

   http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeGencodeSuper
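The five table names listed for V17 follow a regular pattern. Below is a
minimal sketch that builds them for any version. It assumes the V14 and V7
tracks use the same naming scheme as V17, so check the track page above to
be sure. gencodeTables() is a hypothetical helper, not part of
GenomicFeatures. The commented lines show how one of the names would be
passed to makeTranscriptDbFromUCSC().

```r
## Hypothetical helper (not part of GenomicFeatures): builds the names of the
## five GENCODE tables for a given GENCODE version, ASSUMING every version
## follows the same naming pattern as the V17 tables.
gencodeTables <- function(version) {
  types <- c("Basic", "Comp", "PseudoGene", "2wayConsPseudo", "Polya")
  paste0("wgEncodeGencode", types, "V", version)
}

## Table names for the three GENCODE tracks UCSC provides for hg19.
hg19Tables <- lapply(c(V17 = 17, V14 = 14, V7 = 7), gencodeTables)
hg19Tables$V17

## With an updated GenomicFeatures and network access to UCSC, one of these
## tables could then be imported with:
## library(GenomicFeatures)
## txdb <- makeTranscriptDbFromUCSC(genome = "hg19",
##                                  tablename = "wgEncodeGencodeBasicV17")
```

Only the name construction runs offline. The actual import needs the
updated package described below and a working connection to UCSC.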

I just made some adjustments to the GenomicFeatures package so that
makeTranscriptDbFromUCSC() works with those tables. I also needed to
fix support for the wgEncodeGencode*V3 tables (for hg18), which had
been broken by changes on the UCSC side.

Those updates are in GenomicFeatures 1.14.2 (release) and 1.15.4
(devel). Both should become available via biocLite() in the next 24
hours or so.
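Below is a minimal sketch of the upgrade and a version check. It assumes the
usual biocLite.R installer script. hasGencodeFix() is a hypothetical helper
that encodes the two version numbers given above (1.14.2 release, 1.15.4
devel). It also assumes any later series includes the fix.

```r
## Hypothetical helper: TRUE if a GenomicFeatures version includes the
## GENCODE fixes (1.14.2 in release, 1.15.4 in devel). Versions >= 1.16.0
## are ASSUMED to include them as well.
hasGencodeFix <- function(version) {
  v <- package_version(version)
  if (v >= "1.16.0") return(TRUE)
  if (v >= "1.15.0") return(v >= "1.15.4")  # devel series
  v >= "1.14.2"                             # release series (and older)
}

## To upgrade and check (requires network access):
## source("http://bioconductor.org/biocLite.R")
## biocLite("GenomicFeatures")
## hasGencodeFix(as.character(packageVersion("GenomicFeatures")))
```

For example, the GenomicFeatures 1.14.0 reported in the sessionInfo()
output below does not yet include the fix.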

Please let us know if you run into any other problem with the
GenomicFeatures package.

Thanks,
H.


>
> Thank you very much for your help,
> Kathi
>
>
> ------------------------------------------
>
>  > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>   [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0
> [4] GenomicRanges_1.14.3   XVector_0.2.0        IRanges_1.20.5
> [7] BiocGenerics_0.8.0     BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.18.0     Biostrings_2.30.0  bitops_1.0-6       BSgenome_1.30.0
>   [5] DBI_0.2-7          RCurl_1.95-4.1     Rsamtools_1.14.1   RSQLite_0.11.4
>   [9] rtracklayer_1.22.0 stats4_3.0.2       tcltk_3.0.2        tools_3.0.2
> [13] XML_3.98-1.1       zlibbioc_1.8.0
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


