[R] reshape

arun smartpink111 at yahoo.com
Thu Jun 21 05:48:23 CEST 2012



Hi Toby,

Since your original intention is to get separate columns for VWC and Tair, this code might be helpful to you if the missing values are similar in other excel sheets. 

toymelt1<-melt(toy,id=c("year","month","day","hhmm","plot","trt"))
#jim's code
toymelt1$type <- sub("\\..*$", "", toymelt1$variable)
toymelt1$group <- sub(".*\\.(.).*", "\\1", toymelt1$variable)
toymelt1$number <- sub(".*\\..", "", toymelt1$variable)

toymeltVWC<-subset(toymelt1,type=="VWC")
toymeltTair<-subset(toymelt1,type=="Tair")
toymeltTair2<-toymeltTair[,-7]
toymeltVWC2<-toymeltVWC[,-7]
toymeltVWC2<-toymeltVWC2[,c(1:6,8:10,7)]
toymeltTair2<-toymeltTair2[,c(1:6,8:10,7)]

dat2<-data.frame(year=rep(2008,40),month=rep(rep(c(1,2,3,4,5),each=4),2),day=rep(c(1,1,2,2),10),hhmm=rep(c(1100,1101),20),plot=rep(1,40),trt=rep(4,40),type=rep("Tair",40),group=rep("T",40),number=rep(10,40),value=rep(NA,40))
toymeltTair3<-data.frame(rbind(toymeltTair2,dat2))
toynew<-data.frame(toymeltVWC2[,1:10],toymeltTair3[,c(7,10)])
colnames(toynew)<-c(colnames(toymeltVWC2),"type2","value2")

It's kind of an ugly way of arriving at the result and it may not be helpful in real data.
A.K.





----- Original Message -----
From: jim holtman <jholtman at gmail.com>
To: nrm2010 <nrm2010 at zoho.com>
Cc: r-help <r-help at r-project.org>
Sent: Wednesday, June 20, 2012 7:58 PM
Subject: Re: [R] reshape

Something got lost in the transmission since the '\\' were replaced
with '\'  I will try gain with the data: (also attached as a text
file)


require(reshape2)
toy <- data.frame(year = rep(2007:2008,each = 20), month =
rep(1:5,each = 4, length = 40),
day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length =
40), plot = rep(1,40),
trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2=
rnorm(40),
Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))

x <- melt(toy, id = c('year', 'month', 'day', 'hhmm', 'plot', 'trt'))

# parse the type, group and number
x$type <- sub("\\..*$", "", x$variable)
x$group <- sub(".*\\.(.).*", "\\1", x$variable)
x$number <- sub(".*\\..", "", x$variable)

head(x)



On Wed, Jun 20, 2012 at 4:08 PM, nrm2010 <nrm2010 at zoho.com> wrote:
> Still working on it...to get your original suggestion to run,
> I "escaped" (doubled) the backslashes, so instead of
> x$type <- sub("\..*$", "", x$variable)
> which generated the error message
> Error: '\.' is an unrecognized escape in character string starting "\."
> I tried
> x$type <- sub("\\..*$", "", x$variable)
>
> That did not generate the error message and allowed me to
> run your second suggestion:
> dcast(x, year+month+day+hhmm+group+number ~ type, mean)
> The Tair and VWC columns appeared but the groups all came out as \001, instead of B or T.
>    year month day hhmm group number         Tair          VWC
> 1   2007     1   1 1100  \001      1  1.553844979  0.325459714
> 2   2007     1   1 1100  \001     10          NaN -0.693606661
>
>
> I'm also not trying to take the mean, but only to reformat from wide to long,
> so I tried it without the "mean" at the end of the dcast statement:
> dcast(x, year+month+day+hhmm+group+number ~ type)
> That resulted in a loss of the measurement data -
> the VWC and Tair columns are still there but they contain only 0's, 1's, or 2's.
>
>   year month day hhmm group number Tair VWC
> 1   2007     1   1 1100  \001      1    2   2
> 2   2007     1   1 1100  \001     10    0   1
> 3   2007     1   1 1100  \001      2    2   2
>
> So, I'm still working on it...
>
> Thanks,
>
> Toby
>
> R v. 2.15.0
> Windows OS
>
>
>
>
>
>
> ---- On Wed, 20 Jun 2012 09:51:45 -0700 jim holtman <jholtman at gmail.com> wrote ----
>  > Is this closer using the same data:
>  >
>  > > dcast(x, year+month+day+hhmm+group+number ~ type, mean)
>  >     year month day hhmm group number         Tair           VWC
>  > 1   2007     1   1 1100     B      1  0.285404913  1.4337114478
>  > 2   2007     1   1 1100     B      2 -0.899893189  1.1004122756
>  > 3   2007     1   1 1100     T      1 -0.054938988  0.0565570314
>  > 4   2007     1   1 1100     T     10          NaN -0.4645357924
>  > 5   2007     1   1 1100     T      2  0.235724005  0.1355795323
>  > 6   2007     1   1 1101     B      1  0.008141599  0.2010562273
>  > 7   2007     1   1 1101     B      2 -0.094439009  0.7713080532
>  > 8   2007     1   1 1101     T      1  1.111981883 -0.5022737361
>  > 9   2007     1   1 1101     T     10          NaN  0.3061178598
>  > 10  2007     1   1 1101     T      2  0.101879382 -0.1886613029
>  > 11  2007     1   2 1100     B      1 -2.189530590  0.0741563102
>  > 12  2007     1   2 1100     B      2  0.290204966  1.4341878509
>  > 13  2007     1   2 1100     T      1  0.788716089  1.9156394486
>  > 14  2007     1   2 1100     T     10          NaN  1.9814202666
>  > 15  2007     1   2 1100     T      2  0.255978008  0.0604626563
>  > 16  2007     1   2 1101     B      1 -0.479254067  0.7770438119
>  > 17  2007     1   2 1101     B      2  0.526466448 -0.1691557783
>  > 18  2007     1   2 1101     T      1  2.070198634  0.1129769772
>  > 19  2007     1   2 1101     T     10          NaN  1.5263077972
>  > 20  2007     1   2 1101     T      2 -0.883571883 -0.5357702486
>  >
>  >
>  > On Wed, Jun 20, 2012 at 12:33 PM, nrm2010 <nrm2010 at zoho.com> wrote:
>  > >
>  > > Thank you, but it doesn't quite.  It still needs a new column for each measurement type,
>  > > i.e., a VWC column that contains the VWC measurements for each subject within
>  > > group at each time point and a Tair column that contains the
>  > > Tair measurements for each subject within group at each time point.
>  > >
>  > > Toby
>  > >
>  > >
>  > >
>  > >
>  > > ---- On Wed, 20 Jun 2012 09:25:08 -0700 jim holtman  wrote ----
>  > >
>  > >>will this do it for you:
>  > >>
>  > >>
>  > >>> require(reshape2)
>  > >>> toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
>  > >>+ day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length =
>  > >>40), plot = rep(1,40),
>  > >>+ trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
>  > >>+ VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2
>  > >>= rnorm(40),
>  > >>+ Tair.T1 = rnorm(40), Tair.T2 = rnorm(40))
>  > >>>
>  > >>> x <- melt(toy, id = c('year', 'month', 'day', 'hhmm', 'plot', 'trt'))
>  > >>>
>  > >>> # parse the type, group and number
>  > >>> x$type <- sub("\..*$", "", x$variable)
>  > >>> x$group <- sub(".*\.(.).*", "\1", x$variable)
>  > >>> x$number <- sub(".*\..", "", x$variable)
>  > >>>
>  > >>> head(x)
>  > >> year month day hhmm plot trt variable value type group number
>  > >>1 2007 1 1 1100 1 4 VWC.B1 0.35680152 VWC B 1
>  > >>2 2007 1 1 1101 1 4 VWC.B1 -0.01787670 VWC B 1
>  > >>3 2007 1 2 1100 1 4 VWC.B1 -0.07361414 VWC B 1
>  > >>4 2007 1 2 1101 1 4 VWC.B1 0.90117558 VWC B 1
>  > >>5 2007 2 1 1100 1 4 VWC.B1 0.64824677 VWC B 1
>  > >>6 2007 2 1 1101 1 4 VWC.B1 0.26769079 VWC B 1
>  > >>>
>  > >>
>  > >>
>  > >>On Wed, Jun 20, 2012 at 12:08 PM, nrm2010  wrote:
>  > >>> Hello, helpeRs,
>  > >>>
>  > >>> I am attempting to reshape (either base R or package reshape) multiple .csv
>  > >>> spreadsheets from a very unfortunate wide format to long format.  Each spreadsheet
>  > >>> looks something like this, after being read in to R:
>  > >>>
>  > >>> toy <- data.frame(year = rep(2007:2008,each = 20), month = rep(1:5,each = 4, length = 40),
>  > >>> day = rep(1:2,each = 2,length = 40), hhmm = rep(1100:1101,length = 40), plot = rep(1,40),
>  > >>> trt=rep(4,40), VWC.B1 = rnorm(40), VWC.B2 = rnorm(40), VWC.T1 = rnorm(40),
>  > >>> VWC.T2 = rnorm(40), VWC.T10 = rnorm(40), Tair.B1 = rnorm(40),Tair.B2 = rnorm(40),
>  > >>> Tair.T1 =  rnorm(40), Tair.T2 = rnorm(40))
>  > >>>
>  > >>> Taken together, the columns year, month, day, and hhmm define a timepoint
>  > >>> at which at a measurement was taken.  Plot and treatment do not change within
>  > >>> any one spreadsheet.  Parsing the column names, VWC and Tair are 2 different
>  > >>> types of measurements; in the full dataset, there are more than 2 types of
>  > >>> measurements.  The B and T in the column names are grouping variables.  The
>  > >>> numbers following the B or T are subjects within groups.  So, if a column name
>  > >>> is VWC.B1, it is the VWC measurement taken on subject 1 within group B.
>  > >>>
>  > >>> I am trying to reshape to long format. The long format should have one new column
>  > >>> for each measurement type (VWC and Tair in the toy example), a new column
>  > >>> indicating the grouping variable (B or T), and a new column identifyng the
>  > >>> subject (the number that follow the B or T). I've been unable to come up with
>  > >>> a solution that produces the group and subject within group columns as well
>  > >>> as both types of measurements.
>  > >>>
>  > >>> Any assistance would be greatly appreciated.
>  > >>>
>  > >>> Thank you.
>  > >>>
>  > >>> Toby
>  > >>>
>  > >>> Toby Gass, Ph.D.
>  > >>> Visiting Lecturer, Forestry and Wildland Resources
>  > >>> Humboldt State University
>  > >>> email: toby.gass  humboldt  edu
>  > >>>
>  > >>> ______________________________________________
>  > >>> R-help at r-project.org mailing list
>  > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>  > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>  > >>> and provide commented, minimal, self-contained, reproducible code.
>  > >>
>  > >>
>  > >>
>  > >>--
>  > >>Jim Holtman
>  > >>Data Munger Guru
>  > >>
>  > >>What is the problem that you are trying to solve?
>  > >>Tell me what you want to do, not how you want to do it.
>  > >>
>  > >
>  >
>  >
>  >
>  > --
>  > Jim Holtman
>  > Data Munger Guru
>  >
>  > What is the problem that you are trying to solve?
>  > Tell me what you want to do, not how you want to do it.
>  >
>  >
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list