[R] melt error that I don't understand.

Ista Zahn istazahn at gmail.com
Fri Sep 6 20:40:22 CEST 2013


Hi Benjamin,

This looks like a bug: melt fails when numeric id.vars carry
attributes (here the "label" attribute and "labelled" class). Consider:

library(reshape2)

D <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
                    AGE = structure(c(68L, 63L, 55L, 64L, 60L, 78L, 60L, 62L, 60L, 75L),
                                    label = "Age", class = "labelled"),
                    BMI = structure(c(25L, 27L, 27L, 28L, 32L, NA, 36L, 27L, 31L, 25L),
                                    label = "BMI (kg/m2)", class = "labelled"),
                    EventDays = structure(c(722L, 738L, 707L, 751L, 735L, 728L, 731L, 717L, 728L, 735L),
                                          label = "Time to first ACM/censor (days)", class = "labelled"),
                    InterventionDays = c(NA, NA, 575, NA, NA, NA, 490, 643, NA, NA)),
               .Names = c("ID", "AGE", "BMI", "EventDays", "InterventionDays"),
               row.names = c(NA, 10L),
               class = "data.frame")

melt(D, c("ID", "AGE", "BMI")) ## does not work

D <- as.data.frame(lapply(D, as.vector)) ## strip attributes
melt(D, c("ID", "AGE", "BMI")) ## works

attr(D$ID, "label") <- "ID number"  ## add attribute to factor
melt(D, c("ID", "AGE", "BMI")) ## works

attr(D$AGE, "label") <- "Age" ## add attribute to numeric variable
melt(D, c("ID", "AGE", "BMI")) ## does not work
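
A possible explanation, which I have not checked against the reshape2 source: the error is raised by data.frame(), and data.frame() only recycles a shorter column if is.vector() or is.factor() is TRUE for it. is.vector() is FALSE for a vector that has any attribute other than names, so a labelled numeric id column would not be recycled, while a labelled factor still would. A base-R sketch with made-up data (ids and ids_lab are illustrative names, not objects inside melt):

```r
# Plain integer column: data.frame() recycles it to match the longer argument.
ids <- data.frame(age = c(68L, 63L))
ok <- data.frame(ids, value = 1:4)
n_ok <- nrow(ok)

# Same values with a "label" attribute: is.vector() no longer holds.
ids_lab <- ids
attr(ids_lab$age, "label") <- "Age"
plain_is_vector <- is.vector(ids$age)
labelled_is_vector <- is.vector(ids_lab$age)

# data.frame() now refuses to recycle the column and signals an error
# of the "arguments imply differing number of rows" kind.
res <- try(data.frame(ids_lab, value = 1:4), silent = TRUE)
failed <- inherits(res, "try-error")
```

If that is what happens inside melt, it would match the pattern above: attributes on the factor ID are harmless, attributes on the numeric AGE are not.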


I've reported the bug at https://github.com/hadley/reshape/issues/36
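
Until that is resolved, a narrower workaround than converting the whole data frame is to drop the attributes only from the id columns. This is a minimal sketch; strip_attributes is a hypothetical helper, not part of reshape2 or Hmisc, and D2 is a small made-up data frame:

```r
# Hypothetical helper: remove all attributes from the non-factor columns
# named in `cols`. Note that as.vector() also drops classes such as Date,
# so only apply it to columns where losing them is acceptable.
strip_attributes <- function(df, cols = names(df)) {
  for (nm in cols) {
    if (!is.factor(df[[nm]])) {
      df[[nm]] <- as.vector(df[[nm]])
    }
  }
  df
}

D2 <- data.frame(ID = c("A", "B"), stringsAsFactors = FALSE)
D2$AGE <- structure(c(68L, 63L), label = "Age")

D2 <- strip_attributes(D2, c("ID", "AGE"))
age_attrs <- attributes(D2$AGE)
```

After this, the id columns are plain vectors again, so melt(D2, c("ID", "AGE")) should behave like the attribute-free case above. The labels are lost, so save them beforehand if you need them later.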

Best,
Ista

On Fri, Sep 6, 2013 at 11:59 AM, Nutter, Benjamin <NutterB at ccf.org> wrote:
> I'm stumped.  I have a dataset I want to melt to create a temporal sequence of events for each subject, but in each row, I would like to retain the baseline characteristics.
>
> D <- structure(list(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
>                     AGE = structure(c(68L, 63L, 55L, 64L, 60L, 78L, 60L, 62L, 60L, 75L),
>                                     label = "Age", class = "labelled"),
>                     BMI = structure(c(25L, 27L, 27L, 28L, 32L, NA, 36L, 27L, 31L, 25L),
>                                     label = "BMI (kg/m2)", class = "labelled"),
>                     EventDays = structure(c(722L, 738L, 707L, 751L, 735L, 728L, 731L, 717L, 728L, 735L),
>                                           label = "Time to first ACM/censor (days)", class = "labelled"),
>                     ImplantDays = c(NA, NA, 575, NA, NA, NA, 490, 643, NA, NA)),
>                .Names = c("ID", "AGE", "BMI", "EventDays", "InterventionDays"),
>                row.names = c(NA, 10L),
>                class = "data.frame")
>
> melt(D, c("ID", "AGE", "BMI")) # produces the following error
>
> Error in data.frame(ids, variable, value, stringsAsFactors = FALSE) :
>   arguments imply differing number of rows: 10, 20
>
>
> Now, I know AGE and BMI aren't exactly identifying variables, but my hope was that, since ID uniquely identifies the subjects, I could use this as a shortcut to getting the data set I want.  I can get the data I want if I go about it a little differently.
>
> #* What I would like it to look like.
> Timeline <- melt(D[, c("ID", "EventDays", "InterventionDays")], "ID", na.rm=TRUE)
> Timeline <- arrange(Timeline, ID, value)
> Timeline <- merge(D[, c("ID", "AGE", "BMI")],
>                   Timeline,
>                   by="ID", all.x=TRUE)
>
>
> At first I thought it might be the mixture of character and numeric variables as IDs, but the following example works
>
> A <- data.frame(id = LETTERS[1:10],
>                 age = c(50, NA, 51, 52, 53, 54, 55, 56, 57, 58),
>                 meas1 = rnorm(10),
>                 meas2 = rnorm(10, 5),
>                 stringsAsFactors=FALSE)
> melt(A, c("id", "age"))
>
>
> I'm sure I'm missing something really obvious (kind of like how I can stare at the dry goods aisle for 10 minutes and still not find the chocolate chips).  If anyone could help me understand why this error is occurring, I'd greatly appreciate it.
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] lazyWeave_2.2.3  Hmisc_3.10-1     survival_2.36-14 plyr_1.7.1       reshape2_1.2.2
>
> loaded via a namespace (and not attached):
> [1] cluster_1.14.3  grid_2.15.2     lattice_0.20-10 stringr_0.6.1   tools_2.15.2
>
>
>   Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
>   Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 445-1365
>
>
>


