[R] Wide to long form conversion

David Winsemius dwinsemius at comcast.net
Fri Oct 7 19:37:53 CEST 2011


On Oct 7, 2011, at 1:30 PM, David Winsemius wrote:

>
> On Oct 7, 2011, at 7:40 AM, Gang Chen wrote:
>
>> Jim, I really appreciate your help!
>>
>> I like the power of rep_n_stack, but how can I use rep_n_stack to get
>> the following result?
>>
>> Subj Group value Ref Var Time
>> 1    S1     s     4  Me   F    1
>> 2    S1     s     3  Me   F    2
>> 3    S1     s     5  Me   J    1
>> 4    S1     s     6  Me   J    2
>> 5    S1     s     6 She   F    1
>> 6    S1     s     6 She   F    2
>> 7    S1     s    10 She   J    1
>> 8    S1     s     9 She   J    2
>
> I was not able to construct a one-step solution with `reshape` that  
> contains all the columns. You can do it in about 4 steps by  
> first making the data "long" and then adding annotation columns.  
> Using just rows 1 and 26 you might get:
>
> reshape(myData[c(1,26), ], idvar=c("Group","Subj"),
>       direction="long",
>       varying=2:9,
>       v.names=c("value") )
>        Group Subj time value
> s.S1.1      s   S1    1      4
> w.S26.1     w  S26    1      5
> s.S1.2      s   S1    2      5
> w.S26.2     w  S26    2      9
> s.S1.3      s   S1    3      6
> w.S26.3     w  S26    3      4
> s.S1.4      s   S1    4     10
> w.S26.4     w  S26    4      7
> s.S1.5      s   S1    5      3
> w.S26.5     w  S26    5      3
> s.S1.6      s   S1    6      6
> w.S26.6     w  S26    6      7
> s.S1.7      s   S1    7      6
> w.S26.7     w  S26    7      3
> s.S1.8      s   S1    8      9
> w.S26.8     w  S26    8      5
>
> The 'time' variable is not really what you wanted; it just indexes  
> the sequence along the original wide column names.
> You can add the desired Ref, Var and Time columns with these  
> constructions:
>
> > str(times<-rep(c(1,2), length=nrow(myData)*8 )  )
> num [1:408] 1 2 1 2 1 2 1 2 1 2 ...
> > str(times<-rep(c("F","J"), each=2, length=nrow(myData)*8 )  )
> chr [1:408] "F" "F" "J" "J" "F" "F" "J" "J" "F" "F" ...
> > str(times<-rep(c("Me","She"), each=4, length=nrow(myData)*8 )  )
> chr [1:408] "Me" "Me" "Me" "Me" "She" "She" "She" "She" ...
>
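As a small sketch of how those three rep() calls line up, here is the same recycling pattern on a toy size. The object names (n_subj, n_long) and the choice of 2 subjects are assumptions for illustration only; `length=` in the calls above partially matches rep()'s `length.out` argument, which is spelled out here.

```r
# Hypothetical sizes: 2 subjects with 8 measurements each
n_subj <- 2
n_long <- n_subj * 8

# Time alternates fastest, Var changes every 2 rows, Ref every 4 rows;
# each pattern is recycled until it reaches n_long elements
Time <- rep(c(1, 2), length.out = n_long)
Var  <- rep(c("F", "J"), each = 2, length.out = n_long)
Ref  <- rep(c("Me", "She"), each = 4, length.out = n_long)

# One block of 8 rows covers every Ref/Var/Time combination once
head(data.frame(Ref, Var, Time), 8)
```

These labels are purely positional, so they are only right if the rows of the long data arrive in exactly this order within each subject.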
It occurred to me that the ordering operation probably should have  
preceded the ancillary column creation, so this version is tested:

> longData <- reshape(myData, idvar=c("Group","Subj"),
>        direction="long",    #fixed the direction argument
>       varying=2:9,
>       v.names=c("value") )
> longData <- longData[order(longData$Subj), ]
> longData$Time <- rep(c(1,2), length=nrow(myData)*8 )
> longData$Var <- rep(c("F","J"), each=2, length=nrow(myData)*8 )
> longData$Ref <- rep(c("Me","She"), each=4, length=nrow(myData)*8 )
>
        Group Subj time value Time Var Ref
s.S1.1     s   S1    1     4    1   F  Me
s.S1.2     s   S1    2     5    2   F  Me
s.S1.3     s   S1    3     6    1   J  Me
s.S1.4     s   S1    4    10    2   J  Me
s.S1.5     s   S1    5     3    1   F She
s.S1.6     s   S1    6     6    2   F She
s.S1.7     s   S1    7     6    1   J She
s.S1.8     s   S1    8     9    2   J She
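Because the rep() labels depend on the column order of the wide data, another option is to let reshape() carry the original column names along through its `times` argument and split them afterwards. The following is only a sketch: the data frame `wide`, the measurement column names of the form Ref.Var.Time, and their order are assumptions, since the attached data is not shown here. The values are the rows for S1 and S26 printed above.

```r
# Hypothetical wide data; column names and their order are assumed
wide <- data.frame(
  Group = c("s", "w"),
  Me.F.1 = c(4, 5), Me.J.1 = c(5, 9), She.F.1 = c(6, 4), She.J.1 = c(10, 7),
  Me.F.2 = c(3, 3), Me.J.2 = c(6, 7), She.F.2 = c(6, 3), She.J.2 = c(9, 5),
  Subj = c("S1", "S26"),
  stringsAsFactors = FALSE
)

meas <- names(wide)[2:9]            # the eight measurement columns

# 'times' stores each source column name in the 'cond' column
long <- reshape(wide, idvar = c("Group", "Subj"),
                direction = "long",
                varying = list(meas),
                v.names = "value",
                timevar = "cond",
                times = meas)

# Split "Me.F.1" style names into the three annotation columns
parts <- do.call(rbind, strsplit(long$cond, ".", fixed = TRUE))
long$Ref  <- parts[, 1]
long$Var  <- parts[, 2]
long$Time <- as.integer(parts[, 3])

long <- long[order(long$Subj, long$Ref, long$Var, long$Time), ]
long[long$Subj == "S1", c("Subj", "Group", "value", "Ref", "Var", "Time")]
```

With this approach the labels come from the names themselves, so they stay correct whatever order the wide columns happen to be in.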


>
> Looking at Jim Lemon's response, I think he just misinterpreted the  
> structure of your data but gave you a perfectly usable response. You  
> could have done much the same thing with a minor modification:
>
> > str(rep_n_stack(myData,matrix(c(2,3,6,7,4,5,8,9),nrow=1,byrow=TRUE)))
> 'data.frame':	408 obs. of  4 variables:
> $ Group : Factor w/ 2 levels "s","w": 1 1 1 1 1 1 1 1 1 1 ...
> $ Subj  : Factor w/ 51 levels "S1","S10","S11",..: 1 12 23 34 45 48 49 50 51 2 ...
> $ group1: Factor w/ 8 levels "Me.F.1","Me.F.2",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ value1: int  4 6 7 8 10 5 13 8 6 14 ...
>
> Now you can just split apart the 'group1' column with sub() to make  
> the three specified columns.

Lemon's method has the advantage that it properly carries along the  
column information.
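A minimal sketch of the sub() step for splitting a 'group1'-style column into three pieces follows. The vector below is a hypothetical stand-in; only the "Me.F.1" and "Me.F.2" label forms are confirmed by the str() output, and the rest are assumed to follow the same Ref.Var.Time pattern.

```r
# Hypothetical labels of the form Ref.Var.Time
group1 <- c("Me.F.1", "Me.F.2", "Me.J.1", "She.J.2")

# Three dot-separated fields; each backreference picks one of them
pat <- "^([^.]+)\\.([^.]+)\\.([^.]+)$"
Ref  <- sub(pat, "\\1", group1)
Var  <- sub(pat, "\\2", group1)
Time <- as.integer(sub(pat, "\\3", group1))

data.frame(group1, Ref, Var, Time)
```

If the column is a factor, as in the str() output, sub() coerces it to character first, so the same calls apply.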

> -- 
> David.
>
>>
>> On Fri, Oct 7, 2011 at 7:16 AM, Jim Lemon <jim at bitwrit.com.au> wrote:
>>> On 10/07/2011 07:28 AM, Gang Chen wrote:
>>>>
>>>> I have some data 'myData' in wide form (attached at the end), and
>>>> would like to convert it to long form. I wish to have five variables
>>>> in the result:
>>>>
>>>> 1) Subj: factor
>>>> 2) Group: between-subjects factor (2 levels: s / w)
>>>> 3) Reference: within-subject factor (2 levels: Me / She)
>>>> 4) F: within-subject factor (2 levels: F1 / F2)
>>>> 5) J: within-subject factor (2 levels: J1 / J2)
>>>
>>> Hi Gang,
>>> I don't know whether this is the format you want, but:
>>>
>>> library(prettyR)
>>> rep_n_stack(myData,matrix(c(2,3,6,7,4,5,8,9),nrow=2,byrow=TRUE))
>>>
>>> Jim
>>>
>>

David Winsemius, MD
West Hartford, CT


