[R] Download CSV Files from EUROSTAT Website

Lorenzo Isella lorenzo.isella at gmail.com
Mon Nov 4 20:03:39 CET 2013


Thanks.
I had already introduced these minor adjustments in the code, but the real  
problem (to me) is the information that gets lost: the informative name of  
the columns, the indicator type and the units.
Cheers

Lorenzo
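
A minimal sketch of one way to avoid throwing that information away,
using base R only. It is not the code from this thread: the sample cells,
the year labels and the unit string are made up for illustration, and the
treatment of ":" as a missing value is an assumption. The idea is to copy
the flags such as "(p)" into a parallel table before stripping them, and
to attach the unit to the cleaned data as an attribute.

```r
# Made-up cells imitating the scraped table: thousands separators,
# flags in parentheses, and ":" standing for a missing value (assumption).
raw <- data.frame(
  V1 = c("1,234.5(p)", "987.0", ":"),
  V2 = c("1,300.1", "990.2(bp)", "45.6"),
  stringsAsFactors = FALSE
)

# Return the text inside the parentheses, or NA when a cell has no flag.
get_flag <- function(x) {
  ifelse(grepl("\\([^)]*\\)", x),
         sub(".*\\(([^)]*)\\).*", "\\1", x),
         NA_character_)
}

# Strip flags and thousands separators, then convert to numeric.
# ":" is set to NA explicitly so that as.numeric() does not warn.
get_value <- function(x) {
  x <- gsub("\\([^)]*\\)", "", x)
  x <- gsub(",", "", x, fixed = TRUE)
  x[trimws(x) == ":"] <- NA
  as.numeric(x)
}

flags  <- as.data.frame(lapply(raw, get_flag), stringsAsFactors = FALSE)
values <- as.data.frame(lapply(raw, get_value))

# Hypothetical labels; the real ones would have to be read from the page.
year_labels <- c("2012", "2013")
colnames(values) <- year_labels
colnames(flags)  <- year_labels

# Metadata such as the unit can travel with the data as an attribute.
attr(values, "unit") <- "placeholder unit"

print(values)
print(flags)
```

Keeping the flags in a separate data frame with the same shape means the
numeric table stays clean while each flag can still be looked up by row
and column.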

On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas <ruipbarradas at sapo.pt>  
wrote:

> Hello,
>
> If you want to get rid of the (bp) stuff, you can use lapply/gsub. Using  
> Jean's code a bit changed,
>
> library(XML)
>
> mylines <- readLines(url("http://bit.ly/1coCohq"))
> closeAllConnections()
> mytable <- readHTMLTable(mylines, which = 2, asText=TRUE,  
> stringsAsFactors = FALSE)
>
> str(mytable)
>
> mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
> mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
> mytable[] <- lapply(mytable, as.numeric)
>
> colnames(mytable) <- 2000:2013
>
>
> Hope this helps,
>
> Rui Barradas
>
> Em 04-11-2013 09:53, Lorenzo Isella escreveu:
>> Hello,
>> And thanks a lot.
>> This is indeed very close to what I need.
>> I am trying to figure out how not to "lose" the headers and how to avoid
>> downloading labels like "(p)" together with the numerical data I am
>> interested in.
>> If anyone on the list knows how to make these minor modifications, s/he
>> will make my life much easier.
>> Cheers
>>
>> Lorenzo
>>
>>
>> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvadams at usgs.gov>  
>> wrote:
>>
>>> Lorenzo,
>>>
>>> I may be able to help you get started.  You can use the XML package to
>>> grab the information off the internet.
>>>
>>> library(XML)
>>>
>>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>>> closeAllConnections()
>>> mylist <- readHTMLTable(mylines, asText=TRUE)
>>> mytable <- mylist$xTable
>>>
>>> However, when I look at the resulting object, mytable, it doesn't have
>>> informative row or column headings.  Perhaps someone else can figure
>>> out how to get that information.
>>>
>>> Jean
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
>>> <lorenzo.isella at gmail.com> wrote:
>>>> Dear All,
>>>> I often need to do some work on some data which is publicly available
>>>> on the EUROSTAT website.
>>>> I saw several ways to download automatically mainly the bulk data
>>>> from EUROSTAT to later on postprocess it with R, for instance
>>>>
>>>> http://bit.ly/HrDICj
>>>> http://bit.ly/HrDL10
>>>> http://bit.ly/HrDTgT
>>>>
>>>> However, what I would like to do is to be able to download directly
>>>> the csv file corresponding to a properly formatted dataset
>>>> (typically a dynamic dataset) from EUROSTAT.
>>>> To fix the ideas, please consider the dataset at the following link
>>>>
>>>> http://bit.ly/1coCohq
>>>>
>>>> what I would like to do is to automatically read its content into R,
>>>> or at least to automatically download it as a csv file (full
>>>> extraction, single file, no flags and footnotes) which I can then
>>>> manipulate easily.
>>>> Any suggestion is appreciated.
>>>> Cheers
>>>>
>>>> Lorenzo
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.


