[R] How to suppress errors from htmlTreeParse() function in XML package?

Martin Morgan mtmorgan at fhcrc.org
Tue Nov 4 18:21:08 CET 2008


Hi Tony --

Tony Breyal <tony.breyal at googlemail.com> writes:

> Dear R-help,
>
> The following code downloads an HTML document into the variable 'doc'
> and then stores an internal representation in the variable 'html.tree'.
> Even if the HTML is malformed, this still works, which is fantastic.
> However, as in the example below, I do get some output from R in the
> console which I would like to suppress somehow, so I can keep my
> window a bit cleaner.
>
> I understand that the output is just letting me know that the HTML
> is malformed, but for my purposes I can ignore it. Is there a way to
> achieve this?
>
> ### Example:
> library(RCurl); library(XML)
> doc <- getURL('http://www.google.co.uk/search?q=%22R%20Project%22&as_qdr=d1&num=100')
> html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE)

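How about capture.output()? The printed output ends up in 'res' rather
than on the console, and html.tree is still assigned: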

res <- capture.output(html.tree <- htmlTreeParse(doc, useInternalNodes = TRUE))
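A minimal sketch of the same idea using only base R, so it runs without
the XML package or a network connection. noisy_parse() is a hypothetical
stand-in for htmlTreeParse(), and the sketch assumes the diagnostics are
written to standard output. Anything written to stderr or signalled as an
R message would not be caught this way (see ?capture.output).

```r
# Hypothetical stand-in for htmlTreeParse(): prints diagnostics to
# standard output and then returns a result.
noisy_parse <- function(text) {
  cat("Tag nobr invalid\n")
  cat("htmlParseEntityRef: expecting ';'\n")
  toupper(text)  # placeholder for the parsed tree
}

# The assignment is evaluated inside capture.output(), so 'tree' still
# receives the return value while the printed lines are stored in 'res'.
res <- capture.output(tree <- noisy_parse("<html></html>"))

tree         # the result, available as usual
length(res)  # number of captured diagnostic lines
```

Keeping 'res' around rather than discarding it means the diagnostics can
still be inspected later if a page turns out to be parsed oddly.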

Martin

> ### Output - this is what I would like to suppress
> Tag nobr invalid
> htmlParseEntityRef: expecting ';'
> htmlParseEntityRef: expecting ';'
> ### etc.
>
> I attempted to use try(expr, silent=TRUE) but that didn't work for me:
>>  try(htmlTreeParse(doc, useInternalNodes = TRUE), silent=TRUE)
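A likely reason try() makes no difference here: silent=TRUE only hides
the message of an R error, and the parser output above is ordinary
printed text, not an error. A small base R sketch of the distinction,
where chatty() is a hypothetical function that prints a diagnostic but
does not fail:

```r
# Hypothetical function: prints a diagnostic, then returns normally.
chatty <- function() {
  cat("htmlParseEntityRef: expecting ';'\n")
  "parsed"
}

# No error is raised, so try() has nothing to silence; the cat() output
# is still produced. capture.output() is used here only to record it.
seen <- capture.output(value <- try(chatty(), silent = TRUE))

inherits(value, "try-error")   # no error occurred
seen                           # the diagnostic that try() let through

# For contrast, a real error is what try(silent = TRUE) does suppress.
failed <- try(stop("boom"), silent = TRUE)
inherits(failed, "try-error")
```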
>
>
> Many thanks in advance for any help,
> Tony Breyal
>
>
> ### O/S = Windows Vista Ultimate ###
>> sessionInfo()
> R version 2.8.0 (2008-10-20)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;
> LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;
> LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_1.98-1   RCurl_0.91-0
>>
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


