[R] Scraping HTML using R

Steve Lianoglou lianoglou.steve at gene.com
Fri Feb 6 00:26:00 CET 2015


You want to take a look at rvest:

https://github.com/hadley/rvest
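
A minimal sketch of how that could look, not a tested solution for WebMD.
The markup, the CSS selectors ("div.review p.comment", "a.next") and the
page names below are all made up for illustration; you would need to
inspect the real pages in a browser to find the right selectors. The
in-memory `pages` list stands in for the live site so the example runs
offline:

```r
library(rvest)

# Hypothetical stand-ins for two pages of reviews; the real markup and
# class names will differ and must be looked up on the actual site.
pages <- list(
  "page1" = '<html><body>
    <div class="review"><p class="comment">Worked well for me.</p></div>
    <div class="review"><p class="comment">No change after two weeks.</p></div>
    <a class="next" href="page2">Next</a>
  </body></html>',
  "page2" = '<html><body>
    <div class="review"><p class="comment">Mild headache at first.</p></div>
  </body></html>'
)

# Parse one page: return its comments and the link to the next page
# (NA if there is no "next" link).
scrape_page <- function(doc) {
  comments <- html_text(html_nodes(doc, "div.review p.comment"), trim = TRUE)
  next_url <- html_attr(html_nodes(doc, "a.next"), "href")
  list(
    comments = comments,
    next_url = if (length(next_url) > 0) next_url[[1]] else NA_character_
  )
}

# Follow "next" links until none is left, collecting comments along the way.
scrape_all <- function(start, fetch) {
  url <- start
  out <- character(0)
  while (!is.na(url)) {
    res <- scrape_page(fetch(url))
    out <- c(out, res$comments)
    url <- res$next_url
  }
  data.frame(comment = out, stringsAsFactors = FALSE)
}

# fetch() reads from the in-memory list here; against a live site it
# would be something like function(u) read_html(u).
reviews <- scrape_all("page1", function(u) read_html(pages[[u]]))
print(reviews)
```

The result is a data frame with one row per review comment. On a real site
the "next" href is often relative, so it would likely need to be resolved
against the current page URL before fetching.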

On Thu, Feb 5, 2015 at 2:36 PM, Madhuri Maddipatla
<madhuri.vio at gmail.com> wrote:
> Dear R experts,
>
> My requirement for web scraping in R goes like this.
>
> *Step 1* - All the medical conditions from A-Z are listed at the link
> below.
>
> http://www.webmd.com/drugs/index-drugs.aspx?show=conditions
>
> Choose the first condition, say Acid Reflux(GERD-...)
>
> *Step 2* - It lands on this page
>
> http://www.webmd.com/drugs/condition-1999-Acid%20Reflux%20%20GERD-Gastroesophageal%20Reflux%20Disease%20.aspx?diseaseid=1999&diseasename=Acid+Reflux+(GERD-Gastroesophageal+Reflux+Disease)&source=3
>
> with a list of drugs.
>
> Choose the user reviews column of the first drug, say "Nexium Oral"
>
> *Step 3*: Now it lands on the webpage
>
> http://www.webmd.com/drugs/drugreview-20536-Nexium+oral.aspx?drugid=20536&drugname=Nexium+oral
>
> with a list of reviews.
> I would like to scrape the review information into a tabular format by
> parsing the HTML.
> For instance, I would like to fetch the full comment of each review as a
> column in a table.
> It should also automatically go to the next page and fetch the full comments
> of all reviewers.
>
>
> Please help me in this endeavor. Thanks a lot in advance for reading my
> mail; I look forward to a response drawing on your experience and expertise.
>
> Please also comment on the feasibility of my stepwise plan and share any
> advice you would like to give along with the solution.
>
> High Regards,
> *-----------------------------------------------------------------------------------------*
> *Madhuri Maddipatla*
> *-----------------------------------------------------------------------------------------*



-- 
Steve Lianoglou
Computational Biologist
Genentech


