[R] How to import HTML and SQL files
warren at etr-usa.com
Wed Feb 4 15:36:18 CET 2009
> I can't import any HTML or SQL files into R..:confused:
Yeah, I'm confused, too.
What exactly is it you're trying to do? Not the technical task you
asked about, but the effect you're trying to achieve? Can you give
details about the exact nature of your data sources, or, better, examples?
I ask because actually importing HTML and SQL files is almost certainly
the wrong approach. You almost never want to handle text in either
language directly in R.
For SQL, you usually don't have "SQL files", that is, files literally
containing SQL queries. Or if you do happen to have SQL query files, you
probably don't want to parse them with R. I expect what you really want
is to be able to query a database using SQL. For that, look up the DBI
package on CRAN.
This will let you connect R to a database server, and use SQL to get
data from it in a format that R can process directly.
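
To give a rough idea of what that looks like, here is a minimal sketch. It assumes the RSQLite package is installed and uses an in-memory SQLite database as a stand-in for a real server; the table name and its contents are made up for illustration. For another database you would swap in the matching DBI driver package.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample table, so the query has something to return.
dbWriteTable(con, "samples",
             data.frame(id = 1:3, value = c(2.5, 4.0, 6.5)))

# The result of the SQL query comes back as an ordinary data frame.
res <- dbGetQuery(con,
                  "SELECT id, value FROM samples WHERE value > 3 ORDER BY id")

dbDisconnect(con)
```

After this, res can be used like any other data frame in R.
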
For HTML, the problem is that HTML is a very difficult language to parse
correctly in the general case. Much of the reason is that few web pages
are actually valid HTML, yet browsers quietly cope with many classes of
errors. To parse such stuff in R, it's usually best to
take a case-by-case approach, matching particular structures within the
file so you can extract the few bits of data you want. You might want
to post a snippet of the HTML here to get suggestions.
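
As an illustration of the case-by-case approach, here is a small sketch in base R. The HTML lines and the "price" class are hypothetical, standing in for whatever structure your real file has; the pattern would need to be adapted to your data.

```r
# Hypothetical snippet standing in for lines read from a saved page,
# e.g. with readLines().
html <- c(
  "<table>",
  "<tr><td class=\"price\">12.50</td></tr>",
  "<tr><td class=\"price\">7.25</td></tr>",
  "</table>"
)

# Match only the one structure we care about: the price cells.
pattern <- "<td class=\"price\">([0-9.]+)</td>"
hits <- grep(pattern, html, value = TRUE)

# Keep the captured number and drop the surrounding markup.
prices <- as.numeric(sub(paste0(".*", pattern, ".*"), "\\1", hits))
```

This only works because the structure being matched is simple and regular; it is not a general HTML parser.
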
If you really do have to be able to accept arbitrary HTML, I'd suggest
running the HTML through a filter that converts it to XHTML, then use
the XML package from CRAN to load it up into R.
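
A minimal sketch of the second half of that, assuming the XML package is installed and that some external filter has already turned the page into well-formed XHTML. The document text here is made up for illustration.

```r
library(XML)

# Hypothetical, already well-formed document. Real XHTML usually declares
# a default namespace, in which case the XPath query below would need to
# be namespace-aware.
xhtml <- paste0(
  "<html><body><table>",
  "<tr><td>alpha</td><td>beta</td></tr>",
  "</table></body></html>"
)

# Parse the string into a tree, then pull out the text of every <td>.
doc <- xmlParse(xhtml, asText = TRUE)
cells <- xpathSApply(doc, "//td", xmlValue)
```
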
You might also want to look into the RCurl package, if the HTML lives on
a web server. You can download the page directly into R instead of first
saving it out to an HTML file. Then you can use the methods above to
process it.
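
For example, something along these lines, assuming RCurl is installed. The helper name fetch_html and the URL are placeholders of mine, not anything from your setup.

```r
# Hypothetical helper: download a page and return its source as a
# character string, using getURL() from the RCurl package.
fetch_html <- function(url) {
  RCurl::getURL(url)
}

# Usage (placeholder URL, needs network access):
# page <- fetch_html("http://www.example.com/")
```
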