[R] How to 'extend' a data.frame based on given variable combinations ?

arun smartpink111 at yahoo.com
Mon Mar 11 15:17:20 CET 2013


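Hi,

I'm not sure whether this helps, but you could use merge() (see ?merge):
build the full grid of (group, year) combinations with expand.grid(), merge
the tapply() counts into it, and set the resulting NAs to 0.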

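# Counts for the observed (group, year) combinations; empty cells are NA
dat1 <- as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN = length)),
                      stringsAsFactors = FALSE)
# Full grid of all combinations, including the missing year 2002
dat2 <- expand.grid(group = LETTERS[1:2], year = 2001:2005)
names(dat1)[1:2] <- names(dat2)
# all = TRUE keeps grid rows without a match in dat1 (their Freq becomes NA)
res <- merge(dat1, dat2, by = c("group", "year"), all = TRUE)
# Replace the NA counts by 0
res[is.na(res)] <- 0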
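
As a possible alternative to the merge() route, the following is a minimal sketch that declares all years as factor levels and lets table() count the empty combinations as 0. It assumes the data frame x from the original message; the names year_f, counts and the column name num are chosen here for illustration only.

```r
# Data from the original message
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))

# Declare the full range of years as factor levels, so that years
# without any observation (here 2002) still appear in the table
year_f <- factor(x$year, levels = 2001:2005)

# table() counts every (group, year) cell; empty cells get 0, not NA
counts <- as.data.frame(table(group = x$group, year = year_f),
                        responseName = "num", stringsAsFactors = FALSE)
counts$year <- as.integer(counts$year)  # back from character to integer
counts
```

This stays within base R and avoids the NA replacement step, at the cost of having to state the range of years explicitly.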

A.K.

----- Original Message -----
From: Marius Hofert <marius.hofert at math.ethz.ch>
To: R-help <r-help at r-project.org>
Cc: 
Sent: Monday, March 11, 2013 8:59 AM
Subject: [R] How to 'extend' a data.frame based on given variable combinations ?

Dear expeRts,

I have a data.frame with certain covariate combinations ('group' and 'year')
and corresponding values:

set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001,      2003, 2004, 2005,
                                     2003, 2004, 2005),
                value = rexp(7))

My goal is essentially to construct a data.frame which contains all (group, year)
combinations together with the corresponding number of values. This can easily be
done with tapply():

as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=length))) # => 2002 missing

However, the tricky part is that I would like to have *all* years from 2001 to 2005.
Although tapply() sees the missing year 2001 for group "B" (since group "A" has a value there),
it does not 'see' the missing year 2002, which occurs in neither group.
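
The behaviour described above can be reproduced with a small sketch. tapply() converts each grouping vector to a factor, so only years that actually occur in the data become columns. The value column is replaced by 1:7 here purely for illustration; tab is a name chosen for this example.

```r
# Same (group, year) layout as in the question; values are placeholders
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = 1:7)

tab <- tapply(x$value, list(x$group, x$year), FUN = length)

# Only the observed years become columns; 2002 is absent
colnames(tab)

# The cell for group "B" in 2001 exists (because "A" has that year),
# but it is NA since no observation falls into it
tab["B", "2001"]
```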

How can such a data.frame be constructed [ideally without using additional R packages]?

Here is a straightforward way (hopelessly inefficient for the application in mind):

num <- cbind(expand.grid(group = LETTERS[1:2], year=2001:2005), num=0)
covar <- c("group", "year")
for(i in seq_len(nrow(num)))
    num[i,"num"] <- sum(apply(x[,covar], 1, function(z) all(z == num[i,covar])))
num
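
One way the row-by-row loop above might be vectorized in base R is sketched below: each (group, year) pair is turned into a single key string, and the keys of x are counted against the keys of the full grid. This is only a sketch under the assumption that pasting group and year yields unique keys; grid, key_x and key_g are illustrative names.

```r
# Data from the question
set.seed(1)
x <- data.frame(group = c(rep("A", 4), rep("B", 3)),
                year  = c(2001, 2003, 2004, 2005,
                                2003, 2004, 2005),
                value = rexp(7))

# Full grid of all (group, year) combinations
grid <- expand.grid(group = LETTERS[1:2], year = 2001:2005,
                    stringsAsFactors = FALSE)

# One key string per row, e.g. "A 2001"
key_x <- paste(x$group, x$year)
key_g <- paste(grid$group, grid$year)

# match() gives the grid row of each observation; tabulate() counts
# how often each grid row is hit (rows never hit get 0)
grid$num <- tabulate(match(key_x, key_g), nbins = nrow(grid))
grid
```

Compared with the loop, this makes a single pass over x instead of scanning all of x for every row of the grid.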

Cheers,

Marius

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



