[R] NADA Data Frame Format: Wide or Long?

Rich Shepard rshepard at appl-ecosys.com
Tue Jul 3 18:57:30 CEST 2012


   I have water chemistry data with censored values (i.e., those less than
reporting levels) in a data frame with a long, or narrow (i.e., database
table), format. The structure is:

  $ site    : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 1 ...
  $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
  $ preeq0  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
  $ param   : Factor w/ 37 levels "Ag","Al","Alk_tot",..: 1 2 8 17 3 4 9 ...
  $ quant   : num  0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...
  $ ceneq1  : logi  TRUE FALSE TRUE FALSE FALSE FALSE ...
  $ floor   : num  0 0.106 0 231 231 0.011 0 0 0 100 ...
  $ ceiling : num  0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...

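   The logical 'preeq0' splits the sampling dates into two groups; 'ceneq1'
flags censored (TRUE) versus uncensored (FALSE) values; 'floor' and 'ceiling'
are the lower and upper bounds for each value (0 and the reporting level when
censored, both equal to 'quant' otherwise).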
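
   To make the layout concrete, here is a minimal sketch in base R that builds
a toy data frame with the same columns. The values and the names 'chem',
'by_param', and 'ag' are made up for illustration; only the column layout
follows the structure shown above.

```r
# Toy long-format data; values are invented, columns mirror the str() output.
chem <- data.frame(
  site     = factor(c("D-1", "D-1", "D-2", "D-2")),
  sampdate = as.Date(rep("2007-12-12", 4)),
  preeq0   = c(TRUE, TRUE, TRUE, TRUE),
  param    = factor(c("Ag", "Al", "Ag", "Al")),
  quant    = c(0.005, 0.106, 0.005, 0.250),
  ceneq1   = c(TRUE, FALSE, TRUE, FALSE)
)

# Censored rows span 0 up to the reporting level; uncensored rows have
# floor == ceiling == quant.
chem$floor   <- ifelse(chem$ceneq1, 0, chem$quant)
chem$ceiling <- chem$quant

# In the long format, one parameter's values and censoring flags can be
# pulled out as a pair of equal-length vectors.
by_param <- split(chem, chem$param)
ag <- by_param[["Ag"]]
ag$quant   # reported values for Ag
ag$ceneq1  # matching censoring indicator
```

   In this layout each row carries its own censoring flag, so a
values-plus-indicator pair for any one parameter is a simple subset.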

   I plan to use the NADA package's methods, but I have not found information
on whether this format or the wide (i.e., spreadsheet) format should be used.
The NADA.pdf document doesn't say; at least, I haven't found the answer
there. I can use reshape2 to melt and re-cast the data into wide format if
that is appropriate. Please point me to documents I can read for an answer
to this and related questions.
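
   For reference, a sketch of what the re-cast to wide format could look like
with reshape2's dcast(), assuming reshape2 is installed. The toy values and
the names 'chem', 'wide_quant', 'wide_cen', and 'wide' are hypothetical; the
'_cen' suffix is just one possible naming choice.

```r
library(reshape2)  # assumed to be installed

# Toy long-format data (invented values), reduced to the columns needed here.
chem <- data.frame(
  site     = c("D-1", "D-1", "D-2", "D-2"),
  sampdate = as.Date(rep("2007-12-12", 4)),
  param    = c("Ag", "Al", "Ag", "Al"),
  quant    = c(0.005, 0.106, 0.005, 0.250),
  ceneq1   = c(TRUE, FALSE, TRUE, FALSE)
)

# One row per site and date, one column per parameter.
wide_quant <- dcast(chem, site + sampdate ~ param, value.var = "quant")
wide_cen   <- dcast(chem, site + sampdate ~ param, value.var = "ceneq1")

# Keep the censoring flags next to the values: Ag, Al, Ag_cen, Al_cen.
wide <- merge(wide_quant, wide_cen,
              by = c("site", "sampdate"),
              suffixes = c("", "_cen"))
```

   Note that the wide layout needs a parallel censoring column for every
parameter, so with 37 parameters the column count roughly doubles.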

Rich


