[R] How to build combinations of columns of a data frame

Juergen Rose rose at rz.uni-potsdam.de
Thu Apr 16 18:33:07 CEST 2009


On Thursday, 16.04.2009, at 10:59 -0400, David Winsemius wrote:

Thanks David,

is there also a shorter way to get the column names of the new data
frames?

Juergen
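
One possible sketch for the column names: combn() accepts a function that
is applied to each combination, so pasting the members of each pair
together yields the names directly. The data are the example from the
original post quoted below; read.table(text = ...) and the as.data.frame()
wrapper are choices made here for illustration, not part of David's reply.

```r
# Example data from the original post
Lines <- "a b c d
13 0 15 16
23 24 25 0
33 34 0 36
0 44 45 46
53 54 0 55"
DF <- read.table(text = Lines, header = TRUE)

# Products of every pair of columns (the approach from David's reply),
# converted to a data frame so that it can carry column names
pairs <- as.data.frame(
  apply(combn(colnames(DF), 2), 2, function(x) DF[, x[1]] * DF[, x[2]])
)

# combn() applies paste() to each pair of names, giving one name per column
colnames(pairs) <- combn(colnames(DF), 2, paste, collapse = "")
print(pairs)
```

For triples, the same naming call should work with 3 in place of 2.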

> On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:
> 
> > Hi,
> >
> > as an R newcomer I would like to create some new data frames from a
> > given data frame. The first new data frame should contain all pairs of
> > the columns of the original data frame. The second should contain all
> > triples of the columns, and the last the quadruple of columns. The
> > values in the new data frames should be the product of two, three, or
> > four original single field values. For pairs and triples I could
> > accomplish that task with the following R script:
> >
> > Lines <- "a    b    c    d
> >    13     0    15   16
> >    23    24    25    0
> >    33    34     0   36
> >     0    44    45   46
> >    53    54     0   55"
> >
> > DF <- read.table(textConnection(Lines), header = TRUE)
> >
> > nrow <-length(rownames(DF))
> > cnames <- colnames(DF)
> > nc <-length(DF)
> >
> > nc.pairs <- nc*(nc-1)/2
> > #  initialize vector
> > cnames.new <- c(rep("",nc.pairs))
> > ind <- 1
> > print(sprintf("nc=%d",nc))
> > for (i in 1:(nc-1)) {
> >  if (i+1 <= nc ) {
> >    for (j in (i+1):nc) {
> >      cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
> >      ind <- ind+1
> >    }
> >  }
> > }
> >
> > ind <- 1
> > #  initialize data.frame
> > pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
> > for (i in 1:nc) {
> >  if (i+1 <= nc ) {
> >    for (j in (i+1):nc) {
> >      t <- DF[,i] * DF[,j]
> >      pairs[[ind]] <- t
> >      ind <- ind+1
> >    }
> >  }
> > }
> > colnames(pairs) <- cnames.new
> > print("pairs=");   print(pairs)
> 
> apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
>       [,1] [,2] [,3] [,4] [,5] [,6]
> [1,]    0  195  208    0    0  240
> [2,]  552  575    0  600    0    0
> [3,] 1122    0 1188    0 1224    0
> [4,]    0    0    0 1980 2024 2070
> [5,] 2862    0 2915    0 2970    0
> >
> >
> > nc.tripels <- nc*(nc-1)*(nc-2)/6
> > #  initialize vector
> > cnames.new <- c(rep("",nc.tripels))
> > ind <- 1
> > print(sprintf("nc=%d",nc))
> > for (i in 1:nc) {
> >  if (i+1 <= nc ) {
> >    for (j in (i+1):nc) {
> >      if (j+1 <= nc ) {
> >        for (k in (j+1):nc) {
> >          cnames.new[ind] <- paste(cnames[i],cnames[j],cnames[k],sep="")
> >          ind <- ind+1
> >        }
> >      }
> >    }
> >  }
> > }
> >
> > ind <- 1
> > #  initialize data.frame
> > tripels <- data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
> > for (i in 1:(nc-1)) {
> >  if (i+1 <= nc ) {
> >    for (j in (i+1):nc) {
> >      if (j+1 <= nc ) {
> >        for (k in (j+1):nc) {
> >          t <- DF[,i] * DF[,j] * DF[,k]
> >          tripels[[ind]] <- t
> >          ind <- ind+1
> >        }
> >      }
> >    }
> >  }
> > }
> > colnames(tripels) <-  cnames.new
> > print("tripels=");   print(tripels)
> 
> apply(combn(colnames(DF),3), 2, function(x) DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
>        [,1]   [,2] [,3]  [,4]
> [1,]     0      0 3120     0
> [2,] 13800      0    0     0
> [3,]     0  40392    0     0
> [4,]     0      0    0 91080
> [5,]     0 157410    0     0
> 
> >
> >
> > I suppose that there is a much shorter way to get the same results. Any
> > hint is very much appreciated.
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
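
The same idea might be generalized to tuples of any size, which would also
cover the quadruple case that the quoted replies do not show. This is only
a sketch: the helper name tuple_products is hypothetical, and Reduce() is
used here in place of the explicit x[1], x[2], x[3] indexing.

```r
# Example data from the original post
Lines <- "a b c d
13 0 15 16
23 24 25 0
33 34 0 36
0 44 45 46
53 54 0 55"
DF <- read.table(text = Lines, header = TRUE)

# Hypothetical helper: row-wise products for every k-combination of columns
tuple_products <- function(df, k) {
  # List of character vectors, one per combination of k column names
  cmb <- combn(colnames(df), k, simplify = FALSE)
  # Multiply the selected columns together, element by element
  out <- lapply(cmb, function(x) Reduce(`*`, df[x]))
  # Name each result by pasting its column names together, e.g. "abc"
  names(out) <- vapply(cmb, paste, character(1), collapse = "")
  data.frame(out, check.names = FALSE)
}

pairs      <- tuple_products(DF, 2)
tripels    <- tuple_products(DF, 3)
quadrupels <- tuple_products(DF, 4)
print(tripels)
```

With this example data every row contains a zero, so the single quadruple
column consists of zeros only.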



