[BioC] rtracklayer import.gff3 mangling scores

Tim Rayner tfrayner at gmail.com
Mon Jul 16 15:25:43 CEST 2012


Hi,

I've just run into what I think is a bug in the rtracklayer
import.gff3 function (v1.16.1). If I import a GFF3 file containing
scores with options(stringsAsFactors=TRUE) set, the resulting scores
are mangled. I haven't confirmed it, but I suspect the values are
being converted to a factor upon import and then coerced to numeric
(giving the factor's integer level code, not the original value). If
I set options(stringsAsFactors=FALSE) the values remain intact.
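
The suspected mechanism can be illustrated in base R alone, without
rtracklayer. This is only a sketch of the factor-to-numeric pitfall;
whether import.gff3 actually takes this path is unconfirmed. The
score strings are taken from the example GFF3 content in this message,
and the variable names are made up for illustration.

```r
# Score column as it might be read from the file: character strings
# (values copied from the example GFF3; order shuffled for illustration)
scores_chr <- c("0.379417274542624", "0.20294398632582",
                "0.418346673826376", "0.269327708380075")

# With stringsAsFactors=TRUE, a character column ends up as a factor
scores_fac <- factor(scores_chr)

# Coercing the factor directly returns the integer level codes,
# not the numbers that the labels spell out
mangled <- as.numeric(scores_fac)
print(mangled)   # 3 1 4 2

# Going through character first recovers the original values
intact <- as.numeric(as.character(scores_fac))
print(intact)
```

If the import code did something equivalent to the first coercion,
the scores would come back as small integers, which would match the
symptom described above.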

Best regards,

Tim Rayner

-- 
Bioinformatician
Smith Lab, CIMR
University of Cambridge
United Kingdom



Example GFF3 content:

##gff-version 3
##date 2012-07-13
chr1    rtracklayer     snp        189807684       189807684       0.20294398632582        *       .       ID=rs955894;name=rs955894
chr1    rtracklayer     snp        198484784       198484784       0.269327708380075       *       .       ID=rs16843226;name=rs16843226
chr1    rtracklayer     snp        237405093       237405093       0.379417274542624       *       .       ID=rs679735;name=rs679735
chr1    rtracklayer     snp        80235819        80235819        0.418346673826376       *       .       ID=rs12022561;name=rs12022561
chr1    rtracklayer     snp        84875173        84875173        0.302119655250906       *       .       ID=rs6576700;name=rs6576700
chr1    rtracklayer     snp        112793146       112793146       0.390270490589027       *       .       ID=rs11102440;name=rs11102440
chr1    rtracklayer     snp        244187847       244187847       0.249206080122631       *       .       ID=rs1000451;name=rs1000451
chr1    rtracklayer     snp        8612104         8612104         0.583436890885292       *       .       ID=rs6577499;name=rs6577499


> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.16.1  GenomicRanges_1.8.6 IRanges_1.14.3
[4] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] Biostrings_2.24.1 bitops_1.0-4.1    BSgenome_1.24.0   RCurl_1.91-1
[5] Rsamtools_1.8.5   stats4_2.15.1     tools_2.15.1      XML_3.9-4
[9] zlibbioc_1.2.0


