[R] Relational Databases or XML?

Doran, Harold HDoran at air.org
Thu Apr 10 22:28:41 CEST 2008


I'm not sure it is possible to parse an XML file in R directly. Well, I
guess it's *possible*, but may not be the best way to do it. ElementTree
in Python is an easy-to-use parser that you might use to first parse
your XML file (or others hierarchically structured data), organize it
anyway you want, and then bring those data into R for subsequent
analysis.

In fact, I have recently done just this. I have another statistical
program that outputs data as an XML file. So, I wrote a python program
that parses that XML file, pulls out the data of interest into a text
file, and then I bring those data into R for analysis.

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Keith Alan 
> Chamberlain
> Sent: Thursday, April 10, 2008 4:14 PM
> To: r-help at r-project.org
> Subject: [R] Relational Databases or XML?
> 
> Dear R-Help,
> 
> I am working on a paper in an R course for large file support 
> in R using scan(), relational databases, and XML. I have 
> never used SQL or heirarchical document formats such as XML 
> (except where it occurs without user interaction), and 
> knowledge in RDBs and XML is lacking in my program. I have 
> tried finding a working example for the novices-novice on the 
> topic, read many postings, the r-data I/O manual several 
> times, and descriptions of packages RODBC, DBI, XML, among 
> others. I understand that RDBs are (assumed at least) used 
> widely among the R community. I have not been able to put all 
> of the pieces together, but assuming that RDB use is actually 
> quite widespread, it should be quite easy to fill me in 
> and/or correct my understanding where necessary.
> 
> For a cross-platform solution (PC/OSX at least, or in part) 
> my questions/problems are about what preliminary steps are 
> needed to get an SQL or XML query "to work" in R to begin 
> with, what the appropriate data-file formats are, and how to 
> convert to them if starting out with data in, say, a 
> delimited ASCII text file. Very basic examples should 
> suffice, say, a table with 20 random observations, a grouping 
> variable with 2 levels, and a factor with 2 levels.
> 
> ## untested code
> set.seed(1024)
> write.table("junk.txt", 
> data.frame(Subj=c(rep(1,10),rep(2,10)),block=rep(c(rep(-1,5),r
> ep(1,5)),2), obs=rnorm(20,0,1)))
> 
> Specifically,
> 
> 1- what are the minimum required non R components that are 
> needed to support SQL or XML functionality, which may or may 
> not need to be installed?
> 
> 2- what R packages need to be installed, at a minimum (also 
> as a cross-PC/Mac solution if possible or at least as much as 
> possible)
> 
> 3- I keep seeing reference to connections of a given name "if 
> previously setup". What kind of setup is needed outside of R, if any?
> 
> 4- what steps are needed in R to then connect to a file and 
> import a subset based on a query?
> 
> 5- Do I then use standard R routines (e.g. write()) to export 
> as a DB, or an RDB/XML specific function?
> 
> Sincerely,
> KeithC. [U.S]
> 
> 1/k^c
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list