[R] How to build combinations of columns of a data frame

Juergen Rose rose at rz.uni-potsdam.de
Thu Apr 16 16:14:26 CEST 2009


Hi,

as an R newcomer I would like to create some new data frames from a given
data frame. The first new data frame should contain all pairs of the
columns of the original data frame. The second new data frame should
contain all triples of the columns of the original data frame, and the
last the quadruple of columns. The values in the new data frames should
be the product of two, three or four original single field values. For
pairs and triples I could accomplish that task with the following R
script:

Lines <- "a    b    c    d
    13     0    15   16
    23    24    25    0   
    33    34     0   36
     0    44    45   46
    53    54     0   55"

DF <- read.table(textConnection(Lines), header = TRUE)

nrow <- nrow(DF)        # number of rows
cnames <- colnames(DF)  # original column names
nc <- ncol(DF)          # number of columns

nc.pairs <- nc*(nc-1)/2
#  initialize vector
cnames.new <- c(rep("",nc.pairs))
ind <- 1
print(sprintf("nc=%d",nc))
for (i in 1:(nc-1)) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
      ind <- ind+1
    }
  }
}

ind <- 1
#  initialize data.frame
pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
for (i in 1:nc) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      t <- DF[,i] * DF[,j]
      pairs[[ind]] <- t
      ind <- ind+1
    }
  }
}
colnames(pairs) <- cnames.new
print("pairs=");   print(pairs)

nc.tripels <- nc*(nc-1)*(nc-2)/6
#  initialize vector
cnames.new <- c(rep("",nc.tripels))
ind <- 1
print(sprintf("nc=%d",nc))
for (i in 1:nc) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      if (j+1 <= nc ) {
        for (k in (j+1):nc) {
          cnames.new[ind] <- paste(cnames[i],cnames[j],cnames[k],sep="")
          ind <- ind+1
        }
      }
    }
  }
}

ind <- 1
#  initialize data.frame
tripels <- data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
for (i in 1:(nc-1)) {
  if (i+1 <= nc ) {
    for (j in (i+1):nc) {
      if (j+1 <= nc ) {
        for (k in (j+1):nc) {
          t <- DF[,i] * DF[,j] * DF[,k]
          tripels[[ind]] <- t
          ind <- ind+1
        }
      }
    }
  }
}
colnames(tripels) <-  cnames.new
print("tripels=");   print(tripels)

I suppose that there is a much shorter way to get the same results. Any
hint is very much appreciated.
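
One possible direction, given only as a rough sketch: base R's combn() can
enumerate the column subsets, so a single function might cover pairs,
triples and the quadruple. The helper name col_products and its arguments
are made up for illustration and are not part of the script above; the
sketch assumes all columns are numeric.

```r
# Same example data as in the script above
Lines <- "a    b    c    d
    13     0    15   16
    23    24    25    0
    33    34     0   36
     0    44    45   46
    53    54     0   55"
DF <- read.table(text = Lines, header = TRUE)

# Hypothetical helper: products of all m-column subsets of df
col_products <- function(df, m) {
  # list of index vectors, one per m-element subset of the columns
  idx_list <- combn(ncol(df), m, simplify = FALSE)
  # row-wise product of the selected columns for each subset
  prods <- lapply(idx_list, function(idx) Reduce(`*`, df[idx]))
  # new column names: the original names pasted together, e.g. "ab"
  names(prods) <- vapply(idx_list,
                         function(idx) paste(names(df)[idx], collapse = ""),
                         character(1))
  as.data.frame(prods)
}

print(col_products(DF, 2))  # pairs
print(col_products(DF, 3))  # triples
print(col_products(DF, 4))  # the single quadruple
```

combn() returns the subsets in the same order as the nested loops (ab, ac,
ad, bc, bd, cd for pairs), so the results should line up column by column
with the loop-based data frames.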

Regards



