[R] Programcode and data in the same textfile

Duncan Temple Lang duncan at research.bell-labs.com
Thu Jun 12 20:05:48 CEST 2003


Hi Ernst.

I have found myself in a similar situation where I want to send 
code to someone with annotations that explain the different pieces
in richer ways than comments will permit. 

If you want to keep both data and code in a single document, you need
some way to mark which is which, so that the software can tell the
different elements of the document apart.  This is precisely what a
markup language does. Rather than inventing ad hoc conventions, why not
simply use a real markup language? XML is the most natural choice, and
you could write something like

<doc>
 <data>
  Sex    Response
  Male   1
  Male   2
  Female 3
  Female 4
 </data>

 <code>
  ......
 </code>
</doc>


Using the XML package, you can read the document into R
and do what you will with it.
To read the data,

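 library(XML)   # provides xmlTreeParse(), xmlRoot() and xmlValue()
 tr = xmlRoot(xmlTreeParse("myFile"))   # the <doc> node
 # the text inside <data> is laid out just like a read.table() file
 read.table(textConnection(xmlValue(tr[["data"]])), header=TRUE)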

and to access the code text

 xmlValue(tr[["code"]])
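Putting the two pieces together, a runner for such a file might look
roughly like the sketch below. The function name runDoc() and the
convention that the embedded code refers to the table as `dat` are
illustrative assumptions, not part of the XML package. The XML calls
are namespace-qualified, so the package is only needed when the
function is called. Note that R code containing `<` (as in `<-`) or `&`
must be escaped or wrapped in a <![CDATA[ ... ]]> section, otherwise
the file is not well-formed XML.

```r
# Hypothetical helper (not part of the XML package): read the <data>
# element as a table and evaluate the <code> element with that table
# available under the name `dat`.
runDoc = function(file, header = TRUE) {
  root = XML::xmlRoot(XML::xmlTreeParse(file))

  # The <data> text is laid out exactly as a read.table() file would be
  con = textConnection(XML::xmlValue(root[["data"]]))
  on.exit(close(con))
  dat = read.table(con, header = header)

  # Run the code in a fresh environment whose parent is the global
  # environment, so it sees `dat` but not this function's locals
  env = new.env(parent = globalenv())
  assign("dat", dat, envir = env)

  # eval() on a parsed expression vector returns the last value
  eval(parse(text = XML::xmlValue(root[["code"]])), envir = env)
}
```

For instance, with tapply(dat$Response, dat$Sex, mean) inside <code>,
runDoc("myFile") would return the mean response for each sex.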


I have several variants of this style of document, some of which I
occasionally add to the SXMLDocs package. For me at least, it is
easiest to leave XML to identify the different pieces within the
document and to write handlers that process each kind of content.
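The handler idea can be sketched as a named list of functions keyed by
element name, with XML doing the work of locating each element. The
names processDoc() and `handlers`, and the particular handlers shown,
are hypothetical illustrations, not functions from the XML or SXMLDocs
packages.

```r
# Hypothetical dispatcher: call handlers[[tag]] on the text of each
# top-level element; elements without a handler (including any stray
# text nodes) are skipped. A repeated tag overwrites the earlier result.
processDoc = function(file, handlers) {
  root = XML::xmlRoot(XML::xmlTreeParse(file))
  out = list()
  for (node in XML::xmlChildren(root)) {
    tag = XML::xmlName(node)
    if (!is.null(handlers[[tag]]))
      out[[tag]] = handlers[[tag]](XML::xmlValue(node))
  }
  out
}

# Example handlers: <data> becomes a data frame, <code> is parsed
# into an expression vector but not evaluated
handlers = list(
  data = function(txt) {
    con = textConnection(txt)
    on.exit(close(con))
    read.table(con, header = TRUE)
  },
  code = function(txt) parse(text = txt)
)
```

Adding support for a new kind of element then only means adding an
entry to the handler list; the document format itself does not change.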

Hope this provides some ideas for thinking about the problem
in a slightly broader light.

 D.


Ernst Hansen wrote:
> I have the following problem.  It is not of earthshaking importance,
> but still I have spent a considerable amount of time thinking about
> it. 
> 
> PROBLEM: Is there any way I can have a single textfile that contains
> both
> 
>   a) data
> 
>   b) programcode
> 
> The program should act on the data, if the textfile is source()'ed
> into R.
> 
> 
> BOUNDARY CONDITION: I want the data written in the textfile in exactly
> the same format as I would use, if I had data in a separate textfile,
> to be read by read.table().  That is, with 'horizontal inhomogeneity'
> and 'vertical homogeneity' in the type of entries.  I want to write
> something like 
> 
>       Sex    Respons
>       Male   1
>       Male   2
>       Female 3
>       Female 4
> 
> In effect, I am asking if there is some way I can convince
> read.table(), that the data is contained in the following n lines of
> text. 
> 
> 
> ILLEGAL SOLUTIONS:
> I know I can simulate the behaviour by reading the columns of the
> dataframe one by one, and using data.frame() to glue them together.
> Like in 
> 
>     data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
>                Respons = c(1, 2, 3, 4))
> 
> I do not like this solution, because it represents the data in a
> "transposed" way in the textfile, and this transposition makes the
> structure of the dataframe less transparent - at least to me. It
> becomes even less comprehensible if the Sex-factor above is written
> with the help of rep() or gl() or the like.
> 
> I know I can make read.table() read from stdin, so I could type the
> dataframe at the prompt.  That is against the spirit of the problem,
> as I describe below.
> 
> 
> I know I can make read.table() do the job, if I split the data and the
> programcode into different files.  But as the purpose of the exercise
> is to distribute the data and the code to other people, splitting
> into several files is a complication.
> 
> 
> MOTIVATION: I frequently find myself distributing small chunks of code
> to my students, along with data on which the code can work.
> 
> As an example, I might want to demonstrate how model.matrix() treats
> interactions, in a certain setting.  For that I need a dataframe that
> is complex enough to exhibit the behaviour I want, but still so small
> that the model.matrix is easily understood.  So I make such a
> dataframe.
> 
> I am trying to distribute this dataframe along with my code, in a way
> that is as simple as possible to USE for the students (hence the
> one-file boundary condition) and to READ (hence the non-transposition
> boundary condition).
> 
> 
> 
> Does anybody have any ideas?
> 
> 
> Ernst Hansen
> Department of Statistics
> University of Copenhagen
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help

-- 
_______________________________________________________________

Duncan Temple Lang                duncan at research.bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-3217
700 Mountain Avenue, Room 2C-259  fax:    (908)582-3340
Murray Hill, NJ  07974-2070       
         http://cm.bell-labs.com/stat/duncan



