[BioC] zero-row result breaks select() on PolyPhen.Hsapiens.* and SIFT.Hsapiens.*

Robert Castelo robert.castelo at upf.edu
Tue Sep 24 23:49:03 CEST 2013


This is great news, thanks Valerie!!

best regards,
robert.

On 9/24/13 9:17 PM, Valerie Obenchain wrote:
> I will update the SIFT and PolyPhen databases for the upcoming release.
>
> Valerie
>
>
> On 09/23/2013 02:21 PM, Robert Castelo wrote:
>> hi Valerie,
>>
>> On 9/23/13 9:41 PM, Valerie Obenchain wrote:
>>> Hi Robert,
>>>
>>> Thanks for reporting this. Now fixed in VariantAnnotation 1.7.47.
>>>
>> great! thanks for the quick fix.
>>
>>> Have you looked at the ensemblVEP package? It's a wrapper to Ensembl's
>>> Variant Effect Predictor tool. We encourage the use of ensemblVEP
>>> instead of the SIFT and PolyPhen databases because it accesses the
>>> most current information. As you know, the SIFT and PolyPhen dbs are
>>> becoming dated and we don't have plans to package newer versions.
>>>
>>> ensemblVEP requires that you download and install the script located
>>> here,
>>>
>>> http://uswest.ensembl.org/info/docs/tools/vep/script/vep_download.html
>>>
>>> The variant_effect_predictor.pl executable must be in your path. Let
>>> us know if you have trouble with the install/setup.
>> yes, i looked at it, and i think it is a great solution for analysis of
>> a few hundred variants as it needs to access the internet to download the
>> information. However, i'm working on a package that eventually needs to
>> annotate a few thousand variants and i find the dependency on an
>> external perl script that the end user must install, somewhat troubling.
>> let me know if you have suggestions about this.
>>
>> for software packages that need to efficiently access SIFT and PolyPhen
>> annotations from R, freezing the data regularly is, in my opinion, a
>> much better solution. i was actually going to ask you if you could
>> update these two packages. Just as you keep the SNPloc.Hsapiens.* and
>> TxDb.* packages up to date, i'd do the same for SIFT and PolyPhen,
>> unless there's some licensing issue that prevents this, as happens now
>> with OMIM.
>>
>> cheers,
>> robert.
>>
>>> Valerie
>>>
>>> On 09/20/2013 05:25 PM, Robert Castelo wrote:
>>>> Dear list,
>>>>
>>>> querying the TxDb.Hsapiens.UCSC.hg19.knownGene package with a key that
>>>> matches nothing gives the following expected zero-row result:
>>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>> select(TxDb.Hsapiens.UCSC.hg19.knownGene, keys="dummy",
>>>> keytype="GENEID", cols="SYMBOL")
>>>> [1] GENEID
>>>> <0 rows> (or 0-length row.names)
>>>>
>>>> however, when i try the same with the annotation packages
>>>> PolyPhen.Hsapiens.dbSNP131 and SIFT.Hsapiens.dbSNP132, the select()
>>>> call fails with an error:
>>>>
>>>> library(SIFT.Hsapiens.dbSNP132)
>>>> library(PolyPhen.Hsapiens.dbSNP131)
>>>>
>>>> select(SIFT.Hsapiens.dbSNP132, keys=c("dummy"))
>>>> Error in data.frame(RSID = unlist(rsid), PROTEINID = unlist(protein_id),  :
>>>>    arguments imply differing number of rows: 1, 0
>>>>
>>>> select(PolyPhen.Hsapiens.dbSNP131, keys="dummy")
>>>> Error in `*tmp*`$RSID : $ operator is invalid for atomic vectors
>>>>
>>>> i guess these two annotation packages should work analogously to
>>>> TxDb.Hsapiens.UCSC.hg19.knownGene, and give just a 0-row data.frame
>>>> object, right?
>>>>
>>>> these errors reproduce also with the current devel version of BioC,
>>>> please find below both sessionInfo() outputs.
>>>>
>>>> cheers,
>>>> robert.
>>>>
>>>> =====RELEASE====
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 GenomicFeatures_1.12.3
>>>>   [3] AnnotationDbi_1.22.6 Biobase_2.20.1
>>>>   [5] PolyPhen.Hsapiens.dbSNP131_1.0.2 SIFT.Hsapiens.dbSNP132_1.0.2
>>>>   [7] RSQLite_0.11.4 DBI_0.2-7
>>>>   [9] VariantAnnotation_1.6.7 Rsamtools_1.12.4
>>>> [11] Biostrings_2.28.0 GenomicRanges_1.12.5
>>>> [13] IRanges_1.18.3 BiocGenerics_0.6.0
>>>> [15] vimcom_0.9-8 setwidth_1.0-3
>>>> [17] colorout_1.0-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] biomaRt_2.16.0     bitops_1.0-6       BSgenome_1.28.0    RCurl_1.95-4.1     rtracklayer_1.20.4
>>>> [6] stats4_3.0.1       tools_3.0.1        XML_3.95-0.2       zlibbioc_1.6.0
>>>>
>>>>
>>>>
>>>> =====DEVEL=====
>>>> R version 3.0.1 (2013-05-16)
>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 GenomicFeatures_1.13.40
>>>>   [3] AnnotationDbi_1.23.23 Biobase_2.21.7
>>>>   [5] PolyPhen.Hsapiens.dbSNP131_1.0.2 SIFT.Hsapiens.dbSNP132_1.0.2
>>>>   [7] RSQLite_0.11.4 DBI_0.2-7
>>>>   [9] VariantAnnotation_1.7.46 Rsamtools_1.13.41
>>>> [11] Biostrings_2.29.19 GenomicRanges_1.13.44
>>>> [13] XVector_0.1.4 IRanges_1.19.37
>>>> [15] BiocGenerics_0.7.5 vimcom_0.9-8
>>>> [17] setwidth_1.0-3 colorout_1.0-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] biomaRt_2.17.3      bitops_1.0-6        BSgenome_1.29.1     RCurl_1.95-4.1      rtracklayer_1.21.12
>>>> [6] stats4_3.0.1        tools_3.0.1         XML_3.95-0.2        zlibbioc_1.7.0
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>
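
For readers of the archive, here is a minimal base-R sketch of the failure
mode reported in the original message and of the zero-row behaviour that was
asked for. It is only an illustration under stated assumptions: the helper
lookup_rsid() and the table toy_db are hypothetical and are not part of
VariantAnnotation or of the SIFT/PolyPhen packages, and this is not a
description of the change made in VariantAnnotation 1.7.47.

```r
# Base R only; no Bioconductor packages are needed for this illustration.

# Combining a length-1 column with a length-0 column in data.frame()
# raises the same kind of error as the SIFT example in the report.
msg <- tryCatch(
  data.frame(RSID = "dummy", PROTEINID = character(0)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical helper (an assumption, not the package's code): build the
# result by subsetting the matched rows, so that an unmatched key yields
# a 0-row data.frame with the usual columns instead of an error.
lookup_rsid <- function(keys, db) {
  hits <- db[db$RSID %in% keys, , drop = FALSE]
  rownames(hits) <- NULL
  hits
}

# Toy table standing in for the annotation database (made-up values).
toy_db <- data.frame(
  RSID = c("rs1", "rs2"),
  PROTEINID = c("P1", "P2"),
  stringsAsFactors = FALSE
)

empty <- lookup_rsid("dummy", toy_db)
print(nrow(empty))
print(names(empty))
```

Subsetting with drop = FALSE keeps the column structure even when no row
matches, so code that calls the helper can test nrow() on the result rather
than trapping an error.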


