[Rd] download.file does not process gz files correctly (truncates them?)

Joris Meys jori@mey@ @ending from gm@il@com
Wed May 2 21:21:47 CEST 2018


Dear all,

I've noticed a problem when trying to download gz files from here:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM907811

At the bottom of that page one can download GSM907811.CEL.gz. If I download
this file manually and run

oligo::read.celfiles("GSM907811.CEL.gz")

everything works fine. (oligo is a Bioconductor package.)

However, if I download using

download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
              destfile = "GSM907811.CEL.gz")

the file is downloaded, but oligo::read.celfiles() returns the following
error:

Error in checkChipTypes(filenames, verbose, "affymetrix", TRUE) :
  End of gz file reached unexpectedly. Perhaps this file is truncated.
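
One possible explanation, offered as an assumption rather than a confirmed diagnosis: on Windows, download.file() defaults to a text-mode transfer (mode = "w") and only switches to binary automatically when the URL itself ends in an extension such as .gz. The URL above ends in "%2Egz", so that check would not match. A minimal sketch of a binary download follows; download_binary() and is_gzip() are hypothetical helper names introduced here for illustration, not part of R or oligo.

```r
# Hypothetical helper: download a file with an explicit binary transfer.
download_binary <- function(url, destfile) {
  # mode = "wb" asks for a binary transfer. On Windows the default text
  # mode ("w") can alter line-ending bytes and corrupt compressed data.
  status <- utils::download.file(url, destfile = destfile,
                                 mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Hypothetical helper: check the two-byte gzip magic number (0x1f 0x8b).
# This only inspects the header; it does not prove the file is complete.
is_gzip <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Usage with the URL from this report (not run here):
# url <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz"
# download_binary(url, "GSM907811.CEL.gz")
# is_gzip("GSM907811.CEL.gz")
```

Comparing file.size() of this download against the manually downloaded copy would be a quick way to see whether the two transfers differ.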

Moreover, if I try to delete the file after using download.file(), I get a
warning that permission is denied. I can only remove it using Windows File
Explorer after closing the R session, which suggests that a connection to the
file is still open. Yet showConnections() doesn't show any open connections.

Session info below. Note that I started from a completely fresh R session.
oligo is needed due to the specific file format of these gz files. They're
not standard tarred files.

Cheers
Joris

Session Info
-------------------------------------------------------------------------------------

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
 [1] pd.hugene.1.0.st.v1_3.14.1 DBI_0.8                    oligo_1.44.0
 [4] Biobase_2.39.2             oligoClasses_1.42.0        RSQLite_2.1.0
 [7] Biostrings_2.48.0          XVector_0.19.9             IRanges_2.13.28
[10] S4Vectors_0.17.42          BiocGenerics_0.25.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16                compiler_3.5.0
 [3] BiocInstaller_1.30.0        GenomeInfoDb_1.15.5
 [5] bitops_1.0-6                iterators_1.0.9
 [7] tools_3.5.0                 zlibbioc_1.25.0
 [9] digest_0.6.15               bit_1.1-12
[11] memoise_1.1.0               preprocessCore_1.41.0
[13] lattice_0.20-35             ff_2.2-13
[15] pkgconfig_2.0.1             Matrix_1.2-14
[17] foreach_1.4.4               DelayedArray_0.5.31
[19] yaml_2.1.18                 GenomeInfoDbData_1.1.0
[21] affxparser_1.52.0           bit64_0.9-7
[23] grid_3.5.0                  BiocParallel_1.13.3
[25] blob_1.1.1                  codetools_0.2-15
[27] matrixStats_0.53.1          GenomicRanges_1.31.23
[29] splines_3.5.0               SummarizedExperiment_1.9.17
[31] RCurl_1.95-4.10             affyio_1.49.2


-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)






More information about the R-devel mailing list