[R] reshape2's dcast() Adds NAs to Data Frame

arun smartpink111 at yahoo.com
Wed Aug 8 23:48:04 CEST 2012


Hi,

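The param factor has 54 levels, so the cast data frame gets one column per level. I would guess that any combination of the left-hand-side variables and param that does not occur in the data is recorded as NA.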
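
A minimal sketch of that behaviour, using a made-up three-row data frame rather than the attached sample.txt (the values and the name `long` are illustrative assumptions; only melt()/dcast() usage from reshape2 is taken from the thread). It also suggests why keeping per-measurement columns such as floor on the left-hand side of the formula leaves most cells NA:

```r
library(reshape2)

# Toy long-format data (hypothetical values): site D-2 has no AlTot measurement.
long <- data.frame(
  site  = c("D-1", "D-1", "D-2"),
  param = c("AgTot", "AlTot", "AgTot"),
  quant = c(0.00013, 0.106, 0.00020),
  floor = c(0, 0.106, 0)
)

# One row per site, one column per param level.
# The D-2/AlTot combination is absent from the data, so that cell is NA.
wide <- dcast(long, site ~ param, value.var = "quant")
print(wide)

# With a per-measurement column (floor) on the left-hand side, almost every
# input row becomes its own output row, and each such row has a value in
# only one param column.
sparse <- dcast(long, site + floor ~ param, value.var = "quant")
print(sparse)
```

If this is what is happening with the real data, casting with only the true identifiers (for example site + sampdate) on the left-hand side should reduce the NAs to the combinations that were never sampled.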
A.K.




----- Original Message -----
From: Rich Shepard <rshepard at appl-ecosys.com>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, August 8, 2012 2:33 PM
Subject: Re: [R] reshape2's dcast() Adds NAs to Data Frame

On Tue, 7 Aug 2012, R. Michael Weylandt wrote:

> Can you provide a reproducible example? See, e.g.,

Michael,

  I think the attached 'sample.txt' and 'sample.cast.txt' should do. There
are no missing values in sample.txt but there are in the reshaped data
frame. The sequence of commands I used to generate these are:

> sample <- read.table('sample.txt', header = T, sep = ',')
> sample$sampdate <- as.Date(as.character(sample$sampdate))
> sample$ceneq1 <- as.logical(sample$ceneq1)
> str(sample)
'data.frame':    715 obs. of  8 variables:
$ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
$ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
$ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
$ param   : Factor w/ 54 levels "AgDis","AgTot",..: 2 4 5 7 10 13 21 ...
$ quant   : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...
$ ceneq1  : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
$ floor   : num  0 0.106 231 0.0113 0 100 0 1.43 0 0.0239 ...
$ ceiling : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...
> sample.melt <- melt(sample, id.vars = c('site', 'sampdate', 'era', 'param', 'ceneq1', 'floor', 'ceiling'))
> sample.cast <- dcast(sample.melt, site + sampdate + era + ceneq1 + floor + ceiling ~ param)
> str(sample.cast)
'data.frame':    668 obs. of  60 variables:
$ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
$ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
$ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
$ ceneq1  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ floor   : num  0.00132 0.0113 0.0239 0.0253 0.0348 0.106 0.293 4.11 ...
$ ceiling : num  0.00132 0.0113 0.0239 0.0253 0.0348 0.106 0.293 4.11 ...
$ AgDis   : num  NA NA NA NA NA NA NA NA NA NA ...
$ AgTot   : num  NA NA NA NA NA NA NA NA NA NA ...
$ AlDis   : num  NA NA NA NA NA NA NA NA NA NA ...
$ AlTot   : num  NA NA NA NA NA 0.106 NA NA NA NA ...
etc.

> dput(sample, 'sample.txt')
> dput(sample.cast, 'sample.cast.txt')

  The context for this is my learning how to use the NADA package to plot
and analyze left-censored data. The full data set has 64 site and param
levels. I don't know if I can use the base data frame, the reshaped (dcast)
data frame or individual subsets (one for each parameter).
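
For the third option (one subset per parameter), a possible sketch in base R is shown here. The data frame `toy` and its values are hypothetical stand-ins for the long-format data, and whether the NADA functions prefer this shape is not established in this thread:

```r
# Toy stand-in for the long data frame (hypothetical values).
toy <- data.frame(
  site   = c("D-1", "D-1", "D-2", "D-2"),
  param  = factor(c("AgTot", "AlTot", "AgTot", "AlTot")),
  quant  = c(0.00013, 0.106, 0.00020, 0.231),
  ceneq1 = c(TRUE, FALSE, TRUE, FALSE)
)

# split() returns a named list with one data frame per level of param,
# so no reshaping (and no NA padding) is involved.
by_param <- split(toy, toy$param)
print(names(by_param))
print(by_param[["AgTot"]])
```

Each list element keeps the quant and ceneq1 columns side by side, so a value and its censoring flag stay on the same row.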

Rich


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



