[R] unequal number of observations for longitudinal data

Chuck Cleland ccleland at optonline.net
Sat Jan 27 12:41:05 CET 2007


gallon li wrote:
> Two questions:
> 
> 1. How do I replace "NA" with 0?

df.long2$x <- replace(df.long2$x, is.na(df.long2$x), 0)

?replace
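
An equivalent idiom is logical indexing on the column itself. Since zeros cannot later be told apart from real zero measurements, it may be worth recording which values were filled in first. A minimal sketch on a toy data frame (the name toy, the column filled, and the values are made up for illustration, not taken from the thread):

```r
# Toy data frame, invented for illustration only
toy <- data.frame(id = c(1, 1, 2), x = c(0.5, NA, NA))

# Remember which values were missing before they are overwritten
toy$filled <- is.na(toy$x)

# Assign 0 to the positions where x is missing
toy$x[is.na(toy$x)] <- 0

toy
```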

> 2. How can I sort the observations by their id instead of by time? (actually
> i can see what you produced is automatically sorted by id; but in my case,
> the output data is sorted by time)

df.long2 <- df.long2[order(df.long2$id),]

or better ...

df.long2 <- df.long2[order(row.names(df.long2)),]

df.long2
    id time         x
1.1  1    1 0.6375135
1.2  1    2 0.1651258
1.3  1    3 0.0000000
1.4  1    4 0.0000000
1.5  1    5 0.3210223
2.1  2    1 0.9878134
2.2  2    2 0.8909020
2.3  2    3 0.7747615
2.4  2    4 0.3834130
2.5  2    5 0.9853269
3.1  3    1 0.0000000
3.2  3    2 0.3586109
3.3  3    3 0.0000000
3.4  3    4 0.8310539
3.5  3    5 0.0000000

R-FAQ 7.23 How can I sort the rows of a data frame?

http://finzi.psych.upenn.edu/R/doc/manual/R-FAQ.html
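
One caveat on sorting by row names: they are compared as character strings, so with ten or more subjects a row name such as "10.1" sorts ahead of "2.1". Ordering on the id and time columns directly avoids that. A small sketch, assuming a made-up data frame d with ids 1, 2 and 10 (the names d, by.names and by.cols are hypothetical):

```r
# Made-up long data with ids 1, 2 and 10, grouped by time the way reshape() returns it
d <- data.frame(id   = rep(c(1, 2, 10), times = 3),
                time = rep(1:3, each = 3),
                x    = 1:9)
rownames(d) <- paste(d$id, d$time, sep = ".")

# Character sort of the row names: the rows for id 10 come before those for id 2
by.names <- d[order(row.names(d)), ]

# Numeric sort by id, then by time within each id
by.cols <- d[order(d$id, d$time), ]
by.cols
```

For the 3-subject example in this thread both approaches give the same result.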

> On 1/27/07, Chuck Cleland <ccleland at optonline.net> wrote:
>> gallon li wrote:
>>> I have a large longitudinal data set. The number of observations for each
>>> subject is not the same across the sample. The largest number for a subject
>>> is 5 and the smallest number is 1.
>>>
>>> Now I want to make each subject have the same number of observations by
>>> filling in zeros, e.g., my original sample is
>>>
>>> id x
>>> 001 10
>>> 001 30
>>> 001 20
>>> 002 10
>>> 002 20
>>> 002 40
>>> 002 80
>>> 002 70
>>> 003 20
>>> 003 40
>>> 004 ......
>>>
>>> now i wish to make the data like
>>>
>>>  id x
>>> 001 10
>>> 001 30
>>> 001 20
>>> 001 0
>>> 001 0
>>> 002 10
>>> 002 20
>>> 002 40
>>> 002 80
>>> 002 70
>>> 003 20
>>> 003 40
>>> 003 0
>>> 003 0
>>> 003 0
>>> 004 ......
>>>
>>> so that each id has exactly 5 observations. Is there a function that
>>> lets me do this quickly?
>> Filling in with zeros seems like a bad idea, but here is an approach
>> to filling in with NAs.  I will leave replacing the NAs with zeros to you.
>>
>> df.long <- data.frame(id = c(1,1,1,2,2,2,2,2,3,3), x = runif(10),
>>                      time = c(1,2,5,1,2,3,4,5,2,4))
>>
>> df.long
>>    id          x time
>> 1   1 0.72888215    1
>> 2   1 0.60893548    2
>> 3   1 0.41347690    5
>> 4   2 0.79388248    1
>> 5   2 0.05810054    2
>> 6   2 0.02451654    3
>> 7   2 0.85464775    4
>> 8   2 0.15970365    5
>> 9   3 0.22856183    2
>> 10  3 0.38291471    4
>>
>> df.wide <- reshape(df.long, idvar = "id", timevar = "time", v.names = "x",
>>                    direction = "wide")
>>
>> df.wide
>>   id       x.1       x.2       x.5       x.3       x.4
>> 1  1 0.6375135 0.1651258 0.3210223        NA        NA
>> 4  2 0.9878134 0.8909020 0.9853269 0.7747615 0.3834130
>> 9  3        NA 0.3586109        NA        NA 0.8310539
>>
>> df.long2 <- reshape(df.wide, direction="long")
>>
>> df.long2
>>     id time         x
>> 1.1  1    1 0.6375135
>> 2.1  2    1 0.9878134
>> 3.1  3    1        NA
>> 1.2  1    2 0.1651258
>> 2.2  2    2 0.8909020
>> 3.2  3    2 0.3586109
>> 1.5  1    5 0.3210223
>> 2.5  2    5 0.9853269
>> 3.5  3    5        NA
>> 1.3  1    3        NA
>> 2.3  2    3 0.7747615
>> 3.3  3    3        NA
>> 1.4  1    4        NA
>> 2.4  2    4 0.3834130
>> 3.4  3    4 0.8310539
>>
>> This assumes that your data in the "long" format has a time variable.
>> See the help page for reshape() for more details.
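
If the data has only id and x, as in the original question, a time index can be built first. The sketch below is one possible way, not the reshape() approach above: it numbers the observations within each id with ave() and then pads with merge() and expand.grid(). The names dat, full and padded are made up, and treating the within-id row order as "time" is an assumption.

```r
# Made-up data in the shape of the original question: id and x, no time column
dat <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
                  x  = c(10, 30, 20, 10, 20, 40, 80, 70, 20, 40))

# Number the observations within each id: 1, 2, 3, ...
dat$time <- ave(seq_along(dat$id), dat$id, FUN = seq_along)

# Every id/time combination, so each id gets max(time) rows
full <- expand.grid(id = unique(dat$id), time = seq_len(max(dat$time)))

# Keep all rows of full; combinations without an observation get x = NA
padded <- merge(full, dat, by = c("id", "time"), all.x = TRUE)
padded <- padded[order(padded$id, padded$time), ]
padded
```

The NAs in padded$x can then be replaced with zeros if that is really what is wanted.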
>>
>>>       [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> --
>> Chuck Cleland, Ph.D.
>> NDRI, Inc.
>> 71 West 23rd Street, 8th floor
>> New York, NY 10010
>> tel: (212) 845-4495 (Tu, Th)
>> tel: (732) 512-0171 (M, W, F)
>> fax: (917) 438-0894
>>




