[R] Download CSV Files from EUROSTAT Website

Rui Barradas ruipbarradas at sapo.pt
Mon Nov 4 19:52:51 CET 2013


Hello,

If you want to get rid of the (bp) stuff, you can use lapply/gsub. Here is
Jean's code, slightly changed:

library(XML)

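# Read the raw HTML, then close any connections left open
mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
# which = 2 returns only the second table found on the page
mytable <- readHTMLTable(mylines, which = 2, asText = TRUE,
                         stringsAsFactors = FALSE)

str(mytable)

# Strip flags such as (p) or (bp), then the commas, then convert to numeric
mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
mytable[] <- lapply(mytable, as.numeric)

# Label the columns with the years 2000 to 2013
colnames(mytable) <- 2000:2013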
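
To see what the three lapply steps do without hitting the website, here is a
minimal sketch on a made-up data frame. The name clean_numeric, the toy values
and the ":" placeholder for a missing value are assumptions for illustration
only; they are not taken from the EUROSTAT page.

```r
# Toy data frame standing in for the scraped table; all values are made up.
toy <- data.frame(
  a = c("1,234.5(p)", "10.0", ":"),
  b = c("7(bp)", "2,000", "3.5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same three steps wrapped in one function.
clean_numeric <- function(x) {
  x <- gsub("\\([^)]*\\)", "", x)        # drop flags such as (p) or (bp)
  x <- gsub(",", "", x, fixed = TRUE)    # drop the commas
  suppressWarnings(as.numeric(x))        # anything non-numeric becomes NA
}

# toy[] keeps the data frame structure while replacing every column
toy[] <- lapply(toy, clean_numeric)
str(toy)
```

The pattern "\\([^)]*\\)" only removes one bracketed flag at a time, whereas
the greedy "\\(.*\\)" removes everything from the first "(" to the last ")"
in a cell. For cells that hold a single flag the two give the same result.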


Hope this helps,

Rui Barradas

On 04-11-2013 09:53, Lorenzo Isella wrote:
> Hello,
> And thanks a lot.
> This is indeed very close to what I need.
> I am trying to figure out how not to "lose" the headers and how to avoid
> downloading labels like "(p)" together with the numerical data I am
> interested in.
> If anyone on the list knows how to make these minor modifications, s/he
> will make my life much easier.
> Cheers
>
> Lorenzo
>
>
> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvadams at usgs.gov> wrote:
>
>> Lorenzo,
>>
>> I may be able to help you get started.  You can use the XML package to
>> grab the information off the internet.
>>
>> library(XML)
>>
>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>> closeAllConnections()
>> mylist <- readHTMLTable(mylines, asText=TRUE)
>> mytable <- mylist$xTable
>>
>> However, when I look at the resulting object, mytable, it doesn't have
>> informative row or column headings.  Perhaps someone else can figure
>> out how to get that information.
>>
>> Jean
>>
>>
>>
>>
>>
>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
>> <lorenzo.isella at gmail.com> wrote:
>>> Dear All,
>>> I often need to do some work on some data which is publicly available
>>> on the EUROSTAT website.
>>> I saw several ways to download automatically mainly the bulk data
>>> from EUROSTAT to later on postprocess it with R, for instance
>>>
>>> http://bit.ly/HrDICj
>>> http://bit.ly/HrDL10
>>> http://bit.ly/HrDTgT
>>>
>>> However, what I would like to do is to be able to download directly
>>> the csv file corresponding to a properly formatted dataset
>>> (typically a dynamic dataset) from EUROSTAT.
>>> To fix the ideas, please consider the dataset at the following link
>>>
>>> http://bit.ly/1coCohq
>>>
>>> what I would like to do is to automatically read its content into R,
>>> or at least to automatically download it as a csv file (full
>>> extraction, single file, no flags and footnotes) which I can then
>>> manipulate easily.
>>> Any suggestion is appreciated.
>>> Cheers
>>>
>>> Lorenzo
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.


