[R] Web scraping - Having trouble figuring out how to approach this problem
jdnewmil at dcn.davis.ca.us
Thu Feb 23 19:03:42 CET 2017
The answer is yes, and it does not seem like a big step from where you are now. Seeing what you already know how to do, in the form of a reproducible example (RE), would help focus the assistance, because there are quite a few ways to do this kind of thing.
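To make the two-step idea in the quoted question concrete, here is a rough sketch using the rvest package. It is only one of several possible approaches. The HTML snippets, the selectors, the helper names (get_country_links, extract_field, scrape_all) and the sample values are all made up for illustration; the real pages will almost certainly need different selectors.

```r
library(rvest)  # attaches read_html(), html_nodes(), html_text(), html_attr()

# Hypothetical stand-in for the index page: continent headings, each followed
# by a list of country links. The real page layout is an assumption here.
index_html <- '
<html><body>
  <h2>Africa</h2>
  <ul><li><a href="/wiki/Angola">Angola</a></li>
      <li><a href="/wiki/Kenya">Kenya</a></li></ul>
  <h2>Europe</h2>
  <ul><li><a href="/wiki/Portugal">Portugal</a></li></ul>
</body></html>'

# Hypothetical stand-in for a country page (values are invented).
country_html <- '
<table class="infobox">
  <tr><th>Official languages</th><td>Examplish</td></tr>
  <tr><th>Population</th><td>1,234,567</td></tr>
</table>'

# Step 1: one row per country, with its continent and absolute link.
get_country_links <- function(index_page, base_url) {
  headings <- html_nodes(index_page, "h2")
  rows <- lapply(headings, function(h) {
    # Links in the first list that follows this heading
    links <- html_nodes(h, xpath = "following-sibling::ul[1]//a")
    data.frame(
      ID = html_text(links, trim = TRUE),
      Continent = rep(html_text(h, trim = TRUE), length(links)),
      url = xml2::url_absolute(html_attr(links, "href"), base_url),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Text of the first table cell whose row header contains `label`, or NA.
extract_field <- function(page, label) {
  xp <- sprintf("//tr[th[contains(., '%s')]]/td", label)
  cell <- html_nodes(page, xpath = xp)
  if (length(cell) == 0) return(NA_character_)
  html_text(cell[[1]], trim = TRUE)
}

# Step 2: visit every country link and bind the details to the index rows.
# `fetch` defaults to read_html; a fake is passed below to stay offline.
scrape_all <- function(countries, fetch = read_html) {
  details <- lapply(countries$url, function(u) {
    page <- fetch(u)
    data.frame(
      Language = extract_field(page, "language"),
      # Strip thousands separators and anything else that is not a digit
      Population = as.numeric(gsub("[^0-9]", "", extract_field(page, "Population"))),
      stringsAsFactors = FALSE
    )
  })
  cbind(countries[c("ID", "Continent")], do.call(rbind, details))
}

countries <- get_country_links(read_html(index_html), "https://simple.wikipedia.org/")
result <- scrape_all(countries, fetch = function(u) read_html(country_html))
print(result)
```

Against the live site you would pass the index URL to read_html() and leave fetch at its default, ideally with a Sys.sleep() between requests so as not to hammer the server.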
Sent from my phone. Please excuse my brevity.
On February 22, 2017 2:52:55 PM PST, henrique monte <henrique.monte66 at gmail.com> wrote:
>Sometimes I need to get some data from the web and organize it into a
>data frame, and I waste a lot of time doing it manually. I've been trying
>to figure out how to optimize this process, and I've tried some R
>scraping approaches, but couldn't get them to work and thought there
>could be an easier way to do this. Can anyone help me out with this?
>Here's a webpage with countries listed by continents:
>Each country name is also a link that leads to another webpage (one for
>each country, e.g. https://simple.wikipedia.org/wiki/Angola).
>As a final result I would like to get a data frame with number of
>observations (rows) = number of countries listed, and 4 variables:
>ID = country name, Continent = continent it belongs to, Language =
>language (from the country's own webpage) and Population =
>recent population count (from the country's own webpage).
>The main issue I'm trying to figure out is handling several webpages.
>Would it be possible to scrape the countries from the first link as a
>list, together with the links to the countries' webpages, and then
>create a function that runs a scraping command on each of those links
>to get the specific data I'm looking for?