[R] More efficient use of reshape?

David Winsemius dwinsemius at comcast.net
Thu Dec 13 18:48:44 CET 2012


On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:

> Hi all,
>
> I have played a bit with the "reshape" package and function along with
> "melt" and "cast", but I feel I still don't have a good handle on how to
> use them efficiently. Below I have included an application of "reshape"
> that is rather clunky and I'm hoping someone can offer advice on how to
> use reshape (or melt/cast) more efficiently.
>

You do realize that the 'reshape' function is _not_ in the reshape  
package, right? And also that the reshape package has been superseded  
by the reshape2 package?
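
To make that distinction concrete, here is a minimal sketch using only the base (stats) reshape function. It runs on a small made-up data frame rather than the real climate file, so the object names (wide, sources, long) and the numbers are assumptions for illustration only. Passing times= and timevar= labels the long-format rows with the source names directly, which would avoid the chain of ifelse() calls and the renaming step in the quoted code.

```r
# Toy stand-in for the climate data; the values are invented for illustration.
wide <- data.frame(
  date = as.Date(c("1980-01-01", "1980-02-01", "1980-03-01")),
  GISS = c(0.25, 0.37, 0.26),
  HAD  = c(0.10, 0.21, 0.12),
  NOAA = c(0.20, 0.30, 0.22)
)

# Columns holding the per-source anomalies (hypothetical subset of the real ones).
sources <- c("GISS", "HAD", "NOAA")

# stats::reshape from base R; no add-on package is loaded here.
long <- reshape(
  wide,
  direction = "long",
  varying   = list(sources),  # the wide columns to stack
  v.names   = "anomaly",      # name of the stacked value column
  timevar   = "source",       # name of the label column
  times     = sources,        # labels to use instead of 1, 2, 3
  idvar     = "date"
)

# Drop the generated row names such as "1980-01-01.GISS".
rownames(long) <- NULL
```

Selecting only the wanted columns before reshaping (for example, the date column plus the five source columns) should also remove the need for a second data frame afterwards. With the reshape2 package the rough equivalent would be melt(wide, id.vars = "date", variable.name = "source", value.name = "anomaly").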

-- 
David.

>
> #For this example I am using climate change data available on-line
>
> file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
> clim.data <- read.csv(file, header=TRUE)
>
> library(lubridate)
> library(reshape)
>
> #I've been playing with the lubridate package a bit to work with dates,
> #but as the climate dataset only uses year and month I have added a "day"
> #to each entry in the "yr_mn" column and then used "dym" from lubridate
> #to generate the POSIXlt formatted dates in a new column clim.data$date
>
> clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
> clim.data$date<-dym(clim.data$yr_mn)
>
> #Now to the reshape. The dataframe is in a wide format. The columns GISS,
> #HAD, NOAA, RSS, and UAH are all different sources from which the global
> #temperature anomaly has been calculated since 1880 (actually only 1978
> #for RSS and UAH). What I would like to do is plot the temperature
> #anomaly vs date and use ggplot to facet by the different data source
> #(GISS, HAD, etc.). Thus I need the data in long format with a date
> #column, a temperature anomaly column, and a data source column. The code
> #below works, but it's really very clunky and I'm sure I am not using
> #these tools as efficiently as I can.
>
> #The varying=list(3:7) specifies the columns in the dataframe that
> #correspond to the sources (GISS, etc.), though then in the resulting
> #reshaped dataframe the sources are numbered 1-5, so I have to reassign
> #their names. In addition, the original dataframe has additional data
> #columns I do not want and so after reshaping I create another dataframe
> #with just the columns I need, and then I have to rename them so that I
> #can keep track of what everything is. Whew! Not the most elegant of code.
>
> d<-reshape(clim.data, varying=list(3:7),idvar="date",
> v.names="anomaly",direction="long")
>
> d$time<-ifelse(d$time==1,"GISS",d$time)
> d$time<-ifelse(d$time==2,"HAD",d$time)
> d$time<-ifelse(d$time==3,"NOAA",d$time)
> d$time<-ifelse(d$time==4,"RSS",d$time)
> d$time<-ifelse(d$time==5,"UAH",d$time)
>
> new.data<-data.frame(d$date,d$time,d$anomaly)
> names(new.data)<-c("date","source","anomaly")
>
> I realize this is a mess, though it works. I think with just some help on
> how better to work this example I'll probably get over the learning hump
> and actually figure out how to use these data manipulation functions more
> cleanly.
>
> Any advice or assistance would be appreciated.
> Thanks,
> Nate

David Winsemius, MD
Alameda, CA, USA



