[R] trouble for parsing HTML files

R. Michael Weylandt michael.weylandt at gmail.com
Fri Mar 23 14:48:31 CET 2012


I just tried it on R 2.14.1 and R 2.15.0 RC (r58802), and both worked
with XML 3.9-4 on both 32- and 64-bit R on my Mac OS X 10.6.8 with the
same locale setting, so I can only guess it's one of three things:

i) The website is generating different content for you than for Milan
and me [wild guess]
ii) Something in the OS 10.5 -> 10.6 difference [process of elimination]
iii) Perhaps a short-lived bug in 2.14.2 -- can you update to 2.15 and
see if it still throws that error? [the only one I know how to do
anything about]
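
One way to probe guess (i) is to fetch the page yourself and hand the text to
htmlParse(), so the raw HTML can be saved and compared between machines. The
following is only a rough sketch: the helper names parse_html_text() and
fetch_and_parse() are made up for illustration, and forcing the encoding to
UTF-8 is an assumption that may or may not have anything to do with the error.

```r
library(XML)

# Hypothetical helper: parse HTML that is already held as a character string.
# Returns the parsed document, or NULL (after printing the message) on error.
parse_html_text <- function(html, encoding = "UTF-8") {
  tryCatch(
    htmlParse(html, asText = TRUE, encoding = encoding),
    error = function(e) {
      message("htmlParse failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical helper: download the page with base R first, so the exact text
# the server sent can be inspected (or written to disk) before parsing.
fetch_and_parse <- function(url) {
  lines <- readLines(url, warn = FALSE, encoding = "UTF-8")
  parse_html_text(paste(lines, collapse = "\n"))
}

# Offline check on a tiny inline document (no network needed).
sample_html <- "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"
doc <- parse_html_text(sample_html)
title <- xpathSApply(doc, "//title", xmlValue)
```

Calling fetch_and_parse(url) with the URL from the original post, and saving
the downloaded lines with writeLines(), would let two people check whether
they are actually being served the same content.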

Michael

On Fri, Mar 23, 2012 at 3:10 AM, Julien Velcin
<julien.velcin at univ-lyon2.fr> wrote:
> Here it is:
>
> R version 2.14.2 (2012-02-29)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_3.9-4
>
> Thank you!
>
> Julien
>
> On Mar 22, 2012, at 10:12 PM, R. Michael Weylandt wrote:
>
>> Please give sessionInfo() so we can know your version of XML.
>>
>> Michael
>>
>> On Thu, Mar 22, 2012 at 2:04 PM, Julien Velcin
>> <jvelcin at chirouble.univ-lyon2.fr> wrote:
>>>
>>> I use mac OS 10.5.8 with this version of R:
>>>
>>> R version 2.14.1 (2011-12-22)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> I've tried the command "RSiteSearch", but with no result.
>>>
>>> BTW, note again that the code I posted does work for some websites.
>>>
>>> Julien
>>>
>>>
>>>
>>>
>>> 2012/3/22, Milan Bouchet-Valat <nalimilan at club.fr>:
>>>>
>>>> Le jeudi 22 mars 2012 à 17:20 +0100, Julien Velcin a écrit :
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Using the XML package, I'm not able to parse some HTML web pages. Here
>>>>> is my code and the error message:
>>>>>
>>>>> library("XML")
>>>>> url <-
>>>>> "http://www.huffingtonpost.com/social/GraniteSkyline?action=fans"
>>>>> doc <- htmlParse(url)
>>>>>
>>>>> Error: Namespace prefix ꛀ of attribute (null) is not defined
>>>>>
>>>>> I've searched a lot on the Internet, but it's really difficult to find
>>>>> something useful for R.
>>>>
>>>> What versions of R and XML are you using? The code you provided works
>>>> fine here (R 2.14.1 x86_64 and XML 3.9-4 on Fedora 16). sessionInfo()
>>>> will help us.
>>>>
>>>> BTW, see ?RSiteSearch to search for R content on the Web.
>>>>
>>>>
>>>> Cheers
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>


