[R] Reading gz compressed csv file - 'incomplete line found'

Paolo Innocenti innocenti.paolo at gmail.com
Fri Jan 21 06:07:51 CET 2011


That worked! The call that fixed it:

download.file(myurl, destfile=myfile, mode="wb")
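
For readers landing on this thread, here is a minimal sketch of the complete corrected workflow. It assumes the file is a gzip-compressed CSV. The helper name read_gz_csv and its arguments are hypothetical and not part of the original exchange. It also hands gzfile() straight to read.csv() instead of going through gzcon() and readLines(), which is a simplification of the code in the question, not something proposed in the thread.

```r
# Hypothetical helper (not from the thread): download a gzip-compressed CSV
# in binary mode and read it into a data frame.
read_gz_csv <- function(url, destfile = tempfile(fileext = ".csv.gz"), ...) {
  # mode = "wb" writes the compressed bytes unchanged; on Windows a text-mode
  # download ("w") can alter them. On Unix-alikes "wb" is harmless.
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)

  # read.csv() opens the unopened gzfile() connection, decompresses while
  # reading, and closes the connection when it is done.
  read.csv(gzfile(destfile), ...)
}
```

With the variables from the original post, the call would be mydata <- read_gz_csv(myurl, myfile).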

Thanks a lot,
paolo

On 01/21/2011 02:53 PM, William Dunlap wrote:
> Try mode="wb" ('b' for binary mode) in the
> call to download.file().  It should make a
> difference on Windows (&  Mac?) and be innocuous on
> Unix.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Paolo Innocenti
>> Sent: Thursday, January 20, 2011 4:39 PM
>> To: r-help at r-project.org
>> Subject: [R] Reading gz compressed csv file - 'incomplete line found'
>>
>> Hi all,
>>
>> I am trying to download, decompress and read a csv file. My code:
>>
>> myurl<-
>> "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
>>
>> #
>> myfile<- "GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
>> #
>> download.file(myurl, destfile=myfile, mode="w")
>> #
>> mycon<- gzcon(gzfile(myfile, open="r"))
>> #
>> mydata<- read.csv(textConnection(readLines(mycon)))
>> #
>> close(mycon)
>>
>> works under my linux distribution, but under windows, I get the
>> following warning:
>>
>>   >  myurl<-
>> "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
>>
>>   >  myfile<- "GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
>>   >  download.file(myurl, destfile=myfile, mode="w")
>> trying URL
>> 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz'
>>
>> ftp data connection made, file length 535641 bytes
>> opened URL
>> downloaded 523 Kb
>>
>>   >  mycon<- gzcon(gzfile(myfile, open="r"))
>>   >  mydata<- read.csv(textConnection(readLines(mycon)))
>> Warning message:
>> In readLines(mycon) :
>>     incomplete final line found on
>> 'gzcon(GSE24729_MitoNuclear_suppl_male_stats.csv.gz)'
>>   >  close(mycon)
>>
>> I can read only 30 lines, and then it stops working. Does anyone have
>> any suggestion? I suspect the problem lies in gzcon/gzfile not
>> decompressing properly, or in some other problem with the end of
>> line/end of file, but the help files are a bit above my level of
>> understanding.
>>
>> Thanks,
>> paolo
>>
>>   >  sessionInfo()
>> R version 2.12.1 (2010-12-16)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>    [1] lattice_0.19-13      drosophila2.db_2.4.5 org.Dm.eg.db_2.4.6
>>    [4] GOstats_2.16.0       RSQLite_0.9-4        DBI_0.2-5
>>    [7] graph_1.28.0         Category_2.16.0      AnnotationDbi_1.12.0
>> [10] xtable_1.5-6         GEOquery_2.16.3      ellipse_0.3-5
>> [13] RColorBrewer_1.0-2   hopach_2.10.0        cluster_1.13.2
>> [16] limma_3.6.9          genefilter_1.32.0    vsn_3.18.0
>> [19] affy_1.28.0          Biobase_2.10.0
>>
>> loaded via a namespace (and not attached):
>>    [1] affyio_1.18.0         annotate_1.28.0       GO.db_2.4.5
>>    [4] GSEABase_1.12.2       preprocessCore_1.12.0 RBGL_1.26.0
>>    [7] RCurl_1.5-0.1         splines_2.12.1        survival_2.36-2
>> [10] tools_2.12.1          XML_3.2-0.2
>>
>