[R] Programcode and data in the same textfile

Torsten Hothorn Torsten.Hothorn at rzmail.uni-erlangen.de
Thu Jun 12 15:00:26 CEST 2003


> I have the following problem.  It is not of earthshaking importance,
> but still I have spent a considerable amount of time thinking about
> it.
>
> PROBLEM: Is there any way I can have a single textfile that contains
> both
>
>   a) data
>
>   b) programcode
>
> The program should act on the data, if the textfile is source()'ed
> into R.
>
>
> BOUNDARY CONDITION: I want the data written in the textfile in exactly
> the same format as I would use, if I had data in a separate textfile,
> to be read by read.table().  That is, with 'horizontal inhomogeneity'
> and 'vertical homogeneity' in the type of entries.  I want to write
> something like
>
>       Sex    Respons
>       Male   1
>       Male   2
>       Female 3
>       Female 4
>


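Something like this should do: write the data lines to a temporary file
with cat(), then read that file back with read.table().

  tmpfilename <- tempfile()
  tmpfile <- file(tmpfilename, "w")
  cat(

      ### here comes my data

      "Sex    Respons",
      "Male   1",
      "Male   2",
      "Female 3",
      "Female 4",

      ### end of data input

      "",   # empty last element, so the final data line ends with a newline
      file = tmpfile, sep = "\n")
  close(tmpfile)
  dat <- read.table(tmpfilename, header = TRUE)
  unlink(tmpfilename)   # remove the temporary file again
  dat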
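
A variation on the temporary-file approach above, given only as a sketch: the data can also be kept in one multi-line character string and handed to read.table() through textConnection(), so that no file is written at all. The names txt, con and dat are made up for the illustration. More recent versions of R also accept the string directly, as in read.table(text = txt, header = TRUE).

```r
# Sketch only: 'txt', 'con' and 'dat' are illustrative names.
# The data are laid out exactly as they would be in a separate data file.
txt <- "Sex    Respons
Male   1
Male   2
Female 3
Female 4
"

# textConnection() lets read.table() read the string as if it were a file.
con <- textConnection(txt)
dat <- read.table(con, header = TRUE)
close(con)   # the connection was opened by textConnection(), so close it here

print(dat)
```

The data stay in their untransposed layout, and the whole thing can still be source()'ed as a single file.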


best,

Torsten

> In effect, I am asking if there is some way I can convince
> read.table(), that the data is contained in the following n lines of
> text.
>
>
> ILLEGAL SOLUTIONS:
> I know I can simulate the behaviour by reading the columns of the
> dataframe one by one, and using data.frame() to glue them together.
> Like in
>
>     data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
>                Respons = c(1, 2, 3, 4))
>
> I do not like this solution, because it represents the data in a
> "transposed" way in the textfile, and this transposition makes the
> structure of the dataframe less transparent - at least to me. It
> becomes even less comprehensible if the Sex-factor above is written
> with the help of rep() or gl() or the like.
>
> I know I can make read.table() read from stdin, so I could type the
> dataframe at the prompt.  That is against the spirit of the problem,
> as I describe below.
>
>
> I know I can make read.table() do the job, if I split the data and the
> programcode into different files.  But as the purpose of the exercise
> is to distribute the data and the code to other people, splitting
> into several files is a complication.
>
>
> MOTIVATION: I frequently find myself distributing small chunks of code
> to my students, along with data on which the code can work.
>
> As an example, I might want to demonstrate how model.matrix() treats
> interactions, in a certain setting.  For that I need a dataframe that
> is complex enough to exhibit the behaviour I want, but still so small
> that the model.matrix is easily understood.  So I make such a
> dataframe.
>
> I am trying to distribute this dataframe along with my code, in a way
> that is as simple as possible to USE for the students (hence the
> one-file boundary condition) and to READ (hence the non-transposition
> boundary condition).
>
>
>
> Does anybody have any ideas?
>
>
> Ernst Hansen
> Department of Statistics
> University of Copenhagen



