[R] Help with recast() syntax

jdnewmil jdnewmil at dcn.org
Tue Nov 29 07:25:41 CET 2011


 Inline below...

 On Mon, 28 Nov 2011 21:32:21 -0800 (PST), Chris Conner 
 <connerpharmd at yahoo.com> wrote:
> Dear Help-Rs,
>  
> I have data similar to the following:
>  
> DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO = 
> c(201011L,
> 201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
> 201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
> 201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
> ), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
> 124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
> 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
> ), class = "data.frame", row.names = c(NA, -22L))
>  
> Currently there are 2 observations for each month (one for negative
> and one for positive test results).  What I need is to create a data
> set that looks like the following, with positive and negative test
> results in the same row, organized by month:
>  
> DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
>     YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
>     201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
>     98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
>     ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
>     383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
> "POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
> -11L))

 Thanks for the sample data.

> As this is something that I understand Hadley Wickham's Reshape
> package is ideally suited for, I tried using the following reshape
> command:
>  
> ReshapeDF <- recast(DF, YR_MO~variable)
>  
> I get the following error message:
>  
> Using RESULT as id variables
> Error: Casting formula contains variables not found in molten data: YR_MO

 I don't think you need to melt the data first, so you don't need the 
 recast function.
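
 To see where the error comes from, it can help to look at the molten
 data directly. The sketch below assumes reshape2 is installed and uses a
 small stand-in for DF (only the first two months of the posted data, cut
 down for illustration). Called without id variables, melt() keeps only
 the factor column RESULT as an id and stacks every numeric column, YR_MO
 included, into variable/value, so the casting formula can no longer find
 YR_MO. Naming the columns through recast()'s id.var and measure.var
 arguments is one way around that.

```r
library(reshape2)

# Stand-in for the poster's DF: first two months only (illustrative subset).
DF <- data.frame(
  X = c(1L, 2L, 12L, 13L),
  RESULT = factor(c("POS", "POS", "NEG", "NEG")),
  YR_MO = c(201011L, 201012L, 201011L, 201012L),
  TOT_TESTS = c(66L, 98L, 349L, 393L)
)

# No id variables given: only the factor column is kept as an id, and
# X, YR_MO and TOT_TESTS are all stacked into variable/value.
molten_default <- melt(DF)
names(molten_default)

# Naming the id and measure columns keeps YR_MO available to the formula.
wide <- recast(DF, YR_MO ~ RESULT,
               id.var = c("YR_MO", "RESULT"),
               measure.var = "TOT_TESTS")
wide
```

 Since the data are effectively already in long form, going straight to
 dcast() with value.var, as in the code that follows, skips the melt
 step entirely.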

 # reshape2 is faster than reshape, but slightly syntactically different
 library(reshape2)
 # rename the RESULT levels
 DF0 <- DF
 levels( DF0$RESULT ) <- c( "NEG_TOTAL", "POS_TOTAL" )
 # cast to data frame, use sum if more than one row for a given YR_MO
 DF0 <- dcast( DF0, YR_MO~RESULT, sum, value.var="TOT_TESTS" )
 # The rest of this is to make the data frame look like your result, which
 # seems unnecessary to me, but perhaps there is a good reason for keeping
 # X and RESULT
 DF1 <- merge( DF[ DF$RESULT=="POS", c( "X", "RESULT", "YR_MO" ) ], DF0 )
 DF2 <- DF1[,c("X", "RESULT", "YR_MO", "POS_TOTAL", "NEG_TOTAL" ) ]
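
 For comparison, the same wide layout can be built in base R with no
 extra packages. This is only a sketch on a small stand-in for DF (the
 first two months of the posted data); the column names POS_TESTS and
 NEG_TESTS are taken from the requested result, and tapply() with sum
 plays the same role as the sum aggregation in the cast.

```r
# Stand-in for the poster's DF: first two months only (illustrative subset).
DF <- data.frame(
  X = c(1L, 2L, 12L, 13L),
  RESULT = factor(c("POS", "POS", "NEG", "NEG")),
  YR_MO = c(201011L, 201012L, 201011L, 201012L),
  TOT_TESTS = c(66L, 98L, 349L, 393L)
)

# Sum TOT_TESTS within each YR_MO x RESULT cell; the result is a matrix
# with one row per month and one column per RESULT level.
counts <- tapply(DF$TOT_TESTS, list(DF$YR_MO, DF$RESULT), sum)

# Rebuild a data frame, one row per month, with the requested column names.
wide <- data.frame(
  YR_MO = as.integer(rownames(counts)),
  POS_TESTS = unname(counts[, "POS"]),
  NEG_TESTS = unname(counts[, "NEG"])
)
wide
```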

> I have a workaround that allows me to get to my desired endpoint. It
> involves splitting the data.frame into two (by test result), then
> using YR_MO as the by.x/by.y in a merge, but I think this task
> would be handled more efficiently using reshape.  Can anyone help me
> to see where I'm going wrong?  Thanks in advance!
>
> 	[[alternative HTML version deleted]]

 (Please remember that this is a plain text email list.)

 ---------------------------------------------------------------------------
 Jeff Newmiller                        The     .....       .....  Go Live...
 DCN:<jdnewmil_at_dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
 /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


