[R] Reshape data frame with dcast and melt

mails mails00000 at gmail.com
Mon Mar 19 11:39:36 CET 2012


Hello,

I implemented two functions, reshape_long and reshape_wide (see the full
working example below), to reshape data frames.
I created several small examples and the two functions seemed to work
properly. However, using the reshape_wide function
on my real data sets (about 200,000 to 300,000 rows) failed: all values
for X, Y and Z were set to 1.
The structure of my real data looks exactly the same as the small example
below. After working on it for 2 days I think the
problem is that the "primary key" (test_name, group_name and id) is only
unique in the wide form. After applying the
reshape_long function the primary key is no longer unique. I was wondering
if anyone can tell me whether the step
from d1 -> reshape_wide -> d2 can work at all because of the non-uniqueness
of d1.
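
To make the uniqueness question concrete, here is a minimal sketch on made-up data (the data frame `long` and its values are hypothetical, not my real data). In long form a row is identified by the id columns together with Data_Points, so that combination is what has to be unique. As far as I understand reshape2, when it is not unique and no fun.aggregate is given, dcast falls back to length(), so the cells contain counts rather than the original values. That would be consistent with seeing mostly 1s.

```r
library(reshape2)

# Hypothetical long-format data: the combination (I1, X) occurs twice
long <- data.frame(
  id          = c("I1", "I1", "I1", "I2"),
  Data_Points = c("X",  "X",  "Y",  "X"),
  value       = c(5, 6, 10, 7),
  stringsAsFactors = FALSE
)

# Count rows whose (id, Data_Points) combination has already been seen
n_dup <- sum(duplicated(long[, c("id", "Data_Points")]))
n_dup

# No fun.aggregate given and duplicates present: dcast defaults to length(),
# so each cell holds the number of matching rows, not the values themselves
wide_counts <- suppressMessages(
  dcast(long, id ~ Data_Points, value.var = "value")
)
wide_counts

# Supplying an aggregation function explicitly (mean is only one possible
# choice, used here purely for illustration)
wide_mean <- dcast(long, id ~ Data_Points, value.var = "value",
                   fun.aggregate = mean)
wide_mean
```

If n_dup is 0 on the real long data, the key is unique and the default aggregation should not be triggered; if it is greater than 0, the duplicated rows can be listed with duplicated() to see where they come from.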



library(reshape2)

library(taRifx)




reshape_long <- function(data, ids) {
	
	# Bring data into long form
	
	data_long <- melt(data, id.vars = ids, variable.name = "Data_Points",
	                  value.name = "value")
	
	data_long$value <- as.numeric(data_long$value)
	
	# Remove rows where the value is NA
	
	data_long <- data_long[!is.na(data_long$value), ]
	
	# Re-sort data by the id columns
	
	formula_sort <- as.formula(paste("~", paste(ids, collapse="+")))
	
	data_long <- sort(data_long, f = formula_sort)
	
	return(data_long)
	
}

reshape_wide <- function(data, ids) {
	
	# Bring data into wide form
	
	formula_wide <- as.formula(paste(paste(ids, collapse="+"), "~ Data_Points"))
	
	data_wide <- dcast(data, formula_wide)
	
	# Re-sort data by the id columns
	
	formula_sort <- as.formula(paste("~", paste(ids, collapse="+")))
	
	data_wide <- sort(data_wide, f = formula_sort)
	
	return(data_wide)
	
}




d <- data.frame(
		
	test_name = c(rep("Test_A", 6), rep("Test_B", 6)),
		
	group_name = c(rep("Group_C", 3), rep("Group_D", 3),
	               rep("Group_C", 3), rep("Group_D", 3)),
		
	id = c("I1", "I2", "I3", "I4", "I5", "I6",
				
   		   "I1", "I2", "I3", "I7", "I8", "I9"),
		
	X = c(NA,NA,1,2,3,4,5,6,NA,7,8,9),
		
	Y = as.numeric(10:21),
		
	Z = c(NA,22,23,NA,24,NA,25,26,NA,27,28,29)

)

d

d1 <- reshape_long(d, ids=c("test_name", "group_name", "id"))

d1

d2 <- reshape_wide(d1, ids=c("test_name", "group_name", "id"))

d2

identical(d,d2)
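
A side note on the comparison above: identical() is strict and also compares attributes such as row names and column types, so it may report FALSE even when all values survive the round trip. A small sketch with two hypothetical data frames (`a` and `b` are made up for illustration) shows the difference to a value-only comparison with all.equal():

```r
# Two hypothetical data frames with the same values but different row names
a <- data.frame(id = c("I1", "I2"), X = c(1, 2), stringsAsFactors = FALSE)
b <- a
rownames(b) <- c("10", "20")

# Strict comparison: attributes (here the row names) must match as well
strict_same <- identical(a, b)
strict_same

# Value-only comparison: attributes are ignored
values_same <- isTRUE(all.equal(a, b, check.attributes = FALSE))
values_same
```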


--
View this message in context: http://r.789695.n4.nabble.com/Reshape-data-frame-with-dcast-and-melt-tp4484332p4484332.html
Sent from the R help mailing list archive at Nabble.com.


