[R] retrieve certain part from html

Romain Francois romain.francois at dbmail.com
Wed Sep 23 14:39:54 CEST 2009


Hi,

The R4X package can help with this. (I have wrapped your td elements in a
single tr so that the fragment has one root element.)

 > x <- xml( "<tr><td><a href='2005-01.html'>2005-01</a></td><td><a
+ href='2006-01.html'>2006-01</a></td><td><a
+ href='2007-01.html'>2007-01</a></td><td><a
+ href='2008-01.html'>2008-01</a></td><td><a
+ href='2009-01.html'>2009-01</a></td></tr>" )

 > x["td/a/#"]
        td        td        td        td        td
"2005-01" "2006-01" "2007-01" "2008-01" "2009-01"
 > x["td/a/@href"]
             td             td             td             td             td
"2005-01.html" "2006-01.html" "2007-01.html" "2008-01.html" "2009-01.html"

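Since the original question mentions gsub, here is a minimal base-R sketch of
the same extraction using regular expressions instead of R4X. It is not part
of the R4X approach above, and it assumes the attribute values are always
wrapped in single quotes, as in the example string. The variable names (html,
attrs, hrefs, labels) are only illustrative.

```r
# The HTML fragment from the question, assembled as one string.
html <- paste0(
  "<td><a href='2005-01.html'>2005-01</a></td>",
  "<td><a href='2006-01.html'>2006-01</a></td>",
  "<td><a href='2007-01.html'>2007-01</a></td>",
  "<td><a href='2008-01.html'>2008-01</a></td>",
  "<td><a href='2009-01.html'>2009-01</a></td>"
)

# Locate every href='...' attribute (assumes single-quoted values).
attrs <- regmatches(html, gregexpr("href='[^']*'", html))[[1]]

# Strip the leading href=' and the trailing quote to keep only the value.
hrefs <- gsub("^href='|'$", "", attrs)

# Locate the link text, i.e. whatever sits between '>' and '</a>'.
texts <- regmatches(html, gregexpr(">[^<>]+</a>", html))[[1]]
labels <- gsub("^>|</a>$", "", texts)

print(hrefs)
print(labels)
```

Regular expressions are fine for a short, regular fragment like this one; for
arbitrary HTML a real parser is the safer choice.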
Romain

On 09/23/2009 02:29 PM, Rene wrote:
>
> Dear All,
>
> Can someone please show me how to extract a certain part of a long HTML
> string?
>
> e.g.
>
> "<td><a href='2005-01.html'>2005-01</a></td><td><a
> href='2006-01.html'>2006-01</a></td><td><a
> href='2007-01.html'>2007-01</a></td><td><a
> href='2008-01.html'>2008-01</a></td><td><a
> href='2009-01.html'>2009-01</a></td>"
>
> How can I get only the strings "2005-01.html", "2006-01.html",
> "2007-01.html", "2008-01.html", "2009-01.html" from the above HTML code? I
> have tried the gsub function, but it did not work.
>
> Please guide me on this.
>
> Thanks a lot.
>
> Rene.

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr



