[R] How to download and unzip data in a loop

Alexandra Catena amc5981 at gmail.com
Thu Feb 5 19:03:34 CET 2015


Thank you all for the responses.

I'm trying to download the last ten years of meteorological data for a
weather station in Livermore from URLs such as:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/724927-23285-2015.gz
The Livermore station code is 724927-23285.  If I wanted to download data
from 2005, the URL would be:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz
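Since only the year changes between these two URLs, all of them can be built at once with a vectorized sprintf(). A minimal sketch, assuming (not verified against the server) that every year from 2005 to 2015 follows the same directory and file-name pattern:

```r
# Station identifier and year range taken from the message above
station <- "724927-23285"
years   <- 2005:2015

# The year appears twice in each URL: once as the directory, once in the file name
urls <- sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s-%d.gz",
                years, station, years)

# Local file names: the last path component of each URL
gz_names <- basename(urls)

length(urls)  # 11, because 2005:2015 includes both endpoints
urls[1]       # "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz"
```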

Once I download the data into a temporary file, I want to unzip it and
store it into another directory where I can access it.
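One way to do the unzip step with base R alone is to read the file through a gzfile() connection and copy the decompressed bytes to a new file in the target directory. This is a sketch, not the only approach: gunzip_to() is a hypothetical helper, and the met_data directory and the example file are stand-ins for the real downloads.

```r
# Hypothetical helper (not from the thread): decompress one .gz file into
# dest_dir using base R connections only. The output file keeps the input
# name minus its trailing ".gz".
gunzip_to <- function(gz_path, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  out_path <- file.path(dest_dir, sub("\\.gz$", "", basename(gz_path)))
  input  <- gzfile(gz_path, open = "rb")
  output <- file(out_path, open = "wb")
  on.exit({ close(input); close(output) })
  # Copy the decompressed bytes in chunks so a large file never sits in memory
  repeat {
    chunk <- readBin(input, what = "raw", n = 65536L)
    if (length(chunk) == 0L) break
    writeBin(chunk, output)
  }
  out_path
}

# Demonstration on a small local file instead of a real download
src <- file.path(tempdir(), "724927-23285-example.gz")
con <- gzfile(src, "w")
writeLines(c("record 1", "record 2"), con)
close(con)
unzipped <- gunzip_to(src, file.path(tempdir(), "met_data"))
readLines(unzipped)
```

If an add-on package is acceptable, R.utils also provides a gunzip() function for the same job.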

Also, why does 'file' end up with 2015 elements instead of one per year
when I'm only looping through 2005:2015?
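The 2015 elements come from indexing by the year value itself: file[i] with i = 2015 assigns to position 2015, and R pads every earlier position it has not been given with NA. (2005:2015 also has 11 elements, not 10.) A small sketch of the effect and of the usual fix, indexing by position with seq_along():

```r
years <- 2005:2015

# Indexing by the year value: R extends the vector up to index 2015
by_value <- character(0)
for (i in years) by_value[i] <- sprintf("724927-23285-%d.gz", i)
length(by_value)       # 2015; positions 1 to 2004 are NA
sum(!is.na(by_value))  # 11

# Indexing by position: one element per year
by_position <- character(length(years))
for (k in seq_along(years)) {
  by_position[k] <- sprintf("724927-23285-%d.gz", years[k])
}
length(by_position)    # 11

# Or skip the loop entirely: sprintf() is already vectorized over years
identical(by_position, sprintf("724927-23285-%d.gz", years))  # TRUE
```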

Thanks,
Alexandra

On Thu, Feb 5, 2015 at 3:11 AM, Jon Skoien <jon.skoien at jrc.ec.europa.eu>
wrote:

> In addition to following Jim's suggestion, you should probably also use
> full.names = TRUE; otherwise you will try to open a connection to files in
> your current directory, not in tmpdir.
> Another thing: the unzipped files appear to have an irregular number of
> columns per line, so read.table() might not work well.
>
> Jon
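Jon's point about irregular columns can be checked before calling read.table(): count.fields() reports the number of whitespace-separated fields on each line, and readLines() reads every record as a plain string regardless of layout. A sketch using a small made-up file in place of the NOAA data (its real layout is not described in this thread):

```r
# Build a small gzip file whose lines have different numbers of fields,
# standing in for one of the downloaded station files
path <- file.path(tempdir(), "irregular-example.gz")
con <- gzfile(path, "w")
writeLines(c("a b c", "d e", "f g h i"), con)
close(con)

# count.fields() accepts a connection and reports fields per line;
# unequal counts are what make read.table() fail or misalign columns
n_fields <- count.fields(gzfile(path))
n_fields         # 3 2 4

# readLines() does not care about columns: one string per record
records <- readLines(gzfile(path))
length(records)  # 3
```

If the records turn out to be fixed-width, read.fwf() or substr() on the output of readLines() would be the next things to try; the column positions would have to come from NOAA's format documentation.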
>
>
> On 2/5/2015 11:30 AM, jim holtman wrote:
>
>> try taking the quotes off of 'files'
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
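Jim's and Jon's suggestions together: gzfile('files') looks for a file literally named files, while gzfile(files) uses the variable's value; and with full.names = FALSE, dir() returns bare names that R then resolves against the working directory rather than tmpdir. Note also that dir()'s pattern argument is a regular expression, not a shell wildcard, so "\\.gz$" (or glob2rx("*.gz")) is safer than '*.gz'. A small sketch with a locally created file standing in for the download:

```r
# Simulate tmpdir containing one downloaded file (no network needed)
tmpdir <- tempdir()
gz <- file.path(tmpdir, "724927-23285-2015.gz")
con <- gzfile(gz, "w")
writeLines("example record", con)
close(con)

# 'pattern' is a regular expression: match a literal ".gz" at the end
bare <- dir(tmpdir, pattern = "\\.gz$")                     # names only
full <- dir(tmpdir, pattern = "\\.gz$", full.names = TRUE)  # paths under tmpdir

# Bare names are looked up in getwd(), so this is usually FALSE
file.exists(bare)
# Full paths point at the real files
file.exists(full)

# Pass the variable (no quotes) to gzfile(), one file at a time
contents <- lapply(full, function(f) readLines(gzfile(f)))
```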
>> On Wed, Feb 4, 2015 at 5:24 PM, Alexandra Catena <amc5981 at gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I need to loop through and download the past 10 years of met data to a
>>> temporary directory.  I then need to unzip it and place it into another
>>> directory.
>>>
>>>
>>> year = (2005:2015)
>>>
>>> for (i in year)
>>>    tmpdir = tempdir()
>>>    file[i] = file.path(tmpdir, sprintf('724927-23285-%4i.gz', i))
>>>    url = sprintf('ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%4i/724927-23285-%4i.gz', i, i)
>>>    #file = basename(url)
>>>    download.file(url, file[i])
>>>    files = dir(tmpdir, '*.gz', full.names=FALSE)
>>>    read.table(gzfile('files'))
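Besides the quotes and full.names, the loop above has no braces, so only tmpdir = tempdir() is repeated; the remaining lines run once afterwards with i left at 2015, which is why only the 2015 file shows up. A minimal sketch of a corrected loop, assuming the URL pattern from this thread. fetch_station_years() and the met_data directory are hypothetical names, and the fetch argument exists only so the download can be replaced by a stand-in when testing:

```r
# Hypothetical function: download each year's .gz file to tempdir() and
# write its decompressed lines into dest_dir.
fetch_station_years <- function(years,
                                station  = "724927-23285",
                                dest_dir = "met_data",
                                fetch    = utils::download.file) {
  tmpdir <- tempdir()
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  out <- character(length(years))  # one slot per year, not per year value
  for (k in seq_along(years)) {    # braces: every line below runs each time
    yr  <- years[k]
    gz  <- file.path(tmpdir, sprintf("%s-%d.gz", station, yr))
    url <- sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s-%d.gz",
                   yr, station, yr)
    fetch(url, gz, mode = "wb")    # binary mode matters on Windows
    out[k] <- file.path(dest_dir, sprintf("%s-%d", station, yr))
    # gz is a variable holding a full path, so no quotes and no getwd() lookup
    writeLines(readLines(gzfile(gz)), out[k])
  }
  out
}
```

Calling fetch_station_years(2005:2015) would then leave one decompressed file per year in met_data/. If a year is missing on the server, download.file() stops with an error; wrapping the fetch() call in tryCatch() is one way to skip such years.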
>>>
>>>
>>>
>>> 'file' has 2015 elements, with "/tmp/RtmpKvB4Wz/724927-23285-2015.gz" at
>>> index 2015, and 'files' returns 724927-23285-2015.gz. However, when I try
>>> to unzip the gz file using the last line, R says it cannot open the
>>> connection, and the probable reason given is that there is no such file
>>> or directory.
>>>
>>>
>>>
>>> Thanks,
>>> Alexandra
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
> --
> Jon Olav Skøien
> Joint Research Centre - European Commission
> Institute for Environment and Sustainability (IES)
> Climate Risk Management Unit
>
> Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY
>
> jon.skoien at jrc.ec.europa.eu
> Tel:  +39 0332 789205
>
> Disclaimer: Views expressed in this email are those of the individual and
> do not necessarily represent official views of the European Commission.
>
>
