[R] help for reshape function

Joshua Wiley jwiley.psych at gmail.com
Fri Jun 18 08:22:14 CEST 2010


Hello,

Try this; it is based on your sample wide-format data. I am not
quite sure how you got the 'gene1' column in your desired output data.
It looks like it is just the data from patient1, but since I was not
sure, I did not include it.

##################################
# Sample wide-format data: one row per gene, one column per patient
temp <- data.frame(gene=c("gene1","gene2","gene3"),
                   tissue=c("breast","breast","breast"),
                   patient1=c(10,20,100), patient2=c(20,40,200),
                   patient3=c(50,60,300))

# Stack the three patient columns into a single 'value' column
temp.long <- reshape(data=temp, direction="long",
                     idvar="gene", ids=gene,
                     timevar="patientID", times=c(1, 2, 3),
                     v.names="value",
                     varying=list(patients=c("patient1","patient2","patient3")))

# Sort by gene and reset the row names to 1, 2, ...
temp.long <- temp.long[order(temp.long$gene),]
rownames(temp.long) <- NULL

temp.long
##################################

Just as a note, the argument names in reshape() assume that the
repeated measures are taken over time, so they may seem a bit odd in
your example, where the repeated measures are patients.

The id variable (what identifies multiple records from the same gene)
is "gene", so idvar="gene" tells reshape() which column identifies each
record. Because 'gene' already exists in the wide-format data, reshape()
takes the id values directly from that column. The ids argument only
supplies values when reshape() has to create a new id variable, so
ids=gene is not actually needed here.
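A minimal sketch of the difference between an existing and a newly created id variable, using the same sample data. The column name geneNo and the id values 101:103 are hypothetical, chosen only to show what ids does when idvar names a column that does not exist yet.

```r
# Same sample wide-format data as above
temp <- data.frame(gene = c("gene1", "gene2", "gene3"),
                   tissue = c("breast", "breast", "breast"),
                   patient1 = c(10, 20, 100), patient2 = c(20, 40, 200),
                   patient3 = c(50, 60, 300))
pcols <- c("patient1", "patient2", "patient3")

# idvar names an existing column: its values are reused, so no ids is needed
long.a <- reshape(temp, direction = "long", idvar = "gene",
                  timevar = "patientID", times = 1:3,
                  v.names = "value", varying = list(pcols))

# idvar names a new (hypothetical) column: reshape() fills it from ids
long.b <- reshape(temp, direction = "long", idvar = "geneNo", ids = 101:103,
                  timevar = "patientID", times = 1:3,
                  v.names = "value", varying = list(pcols))

long.a[, c("gene", "patientID", "value")]
long.b[, c("geneNo", "gene", "patientID", "value")]
```

In the second call the original 'gene' column is still carried along as an ordinary non-varying column.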

Next, the patients' numbers distinguish multiple records for the same
gene. So, the variable's name is timevar="patientID" and the actual
values are times=c(1, 2, 3). Obviously, if you have more patients, you
would include a number for each one. If the patients are numbered 1
through n, you could leave times out entirely (it defaults to 1, 2, ...,
one value per column listed in varying), or just use times=1:n.
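A short sketch of both cases, assuming the same sample data. The patient numbers 4, 7 and 12 are hypothetical and only illustrate non-sequential numbering, which has to be passed explicitly through times.

```r
temp <- data.frame(gene = c("gene1", "gene2", "gene3"),
                   tissue = c("breast", "breast", "breast"),
                   patient1 = c(10, 20, 100), patient2 = c(20, 40, 200),
                   patient3 = c(50, 60, 300))
pcols <- c("patient1", "patient2", "patient3")

# times omitted: reshape() defaults to seq_along(varying[[1]]), i.e. 1, 2, 3
long.default <- reshape(temp, direction = "long", idvar = "gene",
                        timevar = "patientID", v.names = "value",
                        varying = list(pcols))

# Non-sequential (hypothetical) patient numbers must be supplied explicitly,
# in the same order as the columns in varying
long.custom <- reshape(temp, direction = "long", idvar = "gene",
                       timevar = "patientID", times = c(4, 7, 12),
                       v.names = "value", varying = list(pcols))

unique(long.default$patientID)
unique(long.custom$patientID)
```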

Finally, we specify which variables are time-varying (or, in your case,
repeated for each gene). v.names="value" is the name for the new
variable, and varying=list(patients=c("patient1","patient2","patient3"))
specifies the names of the columns from the wide-format data. Note
that since varying is a list, if you had multiple variables that varied,
you would just add one element per variable (and one matching name per
variable in v.names).
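For example, a sketch with two repeated variables. The data frame temp2 and its columns (an expression value expr1/expr2 and a hypothetical quality score qual1/qual2 per patient) are made up for illustration; element j of varying becomes the column named v.names[j].

```r
# Hypothetical wide data with two measurements per patient
temp2 <- data.frame(gene = c("gene1", "gene2"),
                    expr1 = c(10, 20), expr2 = c(20, 40),
                    qual1 = c(0.9, 0.8), qual2 = c(0.7, 0.6))

# One list element per repeated variable, one name per element in v.names
long2 <- reshape(temp2, direction = "long", idvar = "gene",
                 timevar = "patientID", times = 1:2,
                 v.names = c("expr", "qual"),
                 varying = list(c("expr1", "expr2"), c("qual1", "qual2")))

long2
```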

Now you have the long-format data. Then I reorder it by gene rather
than by patientID and reset the row names to their defaults.
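A small variation, offered as a sketch: ordering on both gene and patientID keeps the patients in order within each gene without relying on the order in which reshape() happened to stack them.

```r
temp <- data.frame(gene = c("gene1", "gene2", "gene3"),
                   tissue = c("breast", "breast", "breast"),
                   patient1 = c(10, 20, 100), patient2 = c(20, 40, 200),
                   patient3 = c(50, 60, 300))

long <- reshape(temp, direction = "long", idvar = "gene",
                timevar = "patientID", times = 1:3, v.names = "value",
                varying = list(c("patient1", "patient2", "patient3")))

# Sort by gene first, then by patient number within each gene
long <- long[order(long$gene, long$patientID), ]
rownames(long) <- NULL   # back to the default row names 1, 2, ..., n

long
```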

HTH,

Josh


On Thu, Jun 17, 2010 at 9:47 PM, xin wei <xinwei at stat.psu.edu> wrote:
>
> I am afraid that your solution is not solving the problem. It seems that
> timevar="gene" just creates the following:
>
>     GENE SAMPLE   value id
> 1.1    1 Kidney 3.69351  1
> 2.1    1 Kidney 5.42710  2
> 3.1    1 Kidney 5.26883  3
> 4.1    1 Kidney 2.88098  4
> 5.1    1 Kidney 4.68519  5
> 6.1    1 Kidney 5.92774  6
>
> Here the "gene" is just an empty column. I also lost the column that is
> supposed to store the header names of the transposed variables in my
> target table.
>
> Any more suggestions?
>
> thanks
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student
Health Psychology
University of California, Los Angeles


