[R] How to build combinations of columns of a data frame

Juergen Rose rose at rz.uni-potsdam.de
Thu Apr 16 19:04:31 CEST 2009


On Thursday, 16.04.2009, at 17:41 +0100, baptiste auguie wrote:
> Perhaps,
> 
> apply(combn(letters[1:4],2), 2, paste,collapse="")
> 
> Hope this helps,
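
As a side note to the suggestion above, combn() can also apply a function to each combination directly through its FUN argument, which avoids the extra apply() call. The following is a minimal sketch, assuming the column names a, b, c, d from the example data in this thread; the variable names pair_names and triple_names are only illustrative.

```r
# Column names taken from the example data in this thread.
cnames <- c("a", "b", "c", "d")

# combn() calls FUN once per combination; additional arguments
# (here collapse = "") are passed through to FUN.
pair_names   <- combn(cnames, 2, FUN = paste, collapse = "")
triple_names <- combn(cnames, 3, FUN = paste, collapse = "")

print(pair_names)
print(triple_names)
```

The combinations come out in the same order as the columns of combn(cnames, 2), so the names line up with products computed from that matrix.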

Thanks Baptiste,

I now use:

Lines <- "a    b    c    d
    13     0    15   16
    23    24    25    0   
    33    34     0   36
     0    44    45   46
    53    54     0   55"

DF <- read.table(textConnection(Lines), header = TRUE)
cnames <- colnames(DF)

# Names of the pairwise product columns: "ab", "ac", ...
cnames.new <- apply(combn(cnames, 2), 2, paste, collapse = "")
# One column per pair, holding the row-wise product of the two columns
pairs <- apply(combn(cnames, 2), 2,
               function(x) DF[, x[1]] * DF[, x[2]])
colnames(pairs) <- cnames.new
print("pairs="); print(pairs)

# The same for all triples of columns
cnames.new <- apply(combn(cnames, 3), 2, paste, collapse = "")
tripels <- apply(combn(cnames, 3), 2,
                 function(x) DF[, x[1]] * DF[, x[2]] * DF[, x[3]])
colnames(tripels) <- cnames.new
print("tripels="); print(tripels)

and I am very satisfied.
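
The pairs and triples above follow the same pattern, so they could be folded into one helper that also covers the quadruples asked about in the original question. This is only a sketch under stated assumptions: the function name col_products is made up for illustration, all columns are assumed to be numeric, and the data frame is assumed to have more than one row.

```r
# Example data from this thread.
DF <- read.table(text = "a b c d
13  0 15 16
23 24 25  0
33 34  0 36
 0 44 45 46
53 54  0 55", header = TRUE)

# Hypothetical helper: row-wise products of every k-column combination.
col_products <- function(df, k) {
  # All k-element sets of column names, as a list of character vectors
  combos <- combn(colnames(df), k, simplify = FALSE)
  # Multiply the selected columns element-wise; one result column per set
  out <- vapply(combos,
                function(cols) Reduce(`*`, df[cols]),
                FUN.VALUE = numeric(nrow(df)))
  # Name each result column by pasting the source column names together
  colnames(out) <- vapply(combos, paste, character(1), collapse = "")
  out
}

pair_products   <- col_products(DF, 2)
triple_products <- col_products(DF, 3)
quad_products   <- col_products(DF, 4)
print(quad_products)
```

With this example data every row contains a zero, so the single quadruple column abcd is zero throughout.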

Juergen

> baptiste
> On 16 Apr 2009, at 17:33, Juergen Rose wrote:
> 
> > On Thursday, 16.04.2009, at 10:59 -0400, David Winsemius wrote:
> >
> > Thanks David,
> >
> > is there also a shorter way to get the column names of the new data
> > frames?
> >
> > Juergen
> >
> >> On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:
> >>
> >>> Hi,
> >>>
> >>> as an R newcomer I would like to create some new data frames from a
> >>> given data frame. The first new data frame should contain all pairs
> >>> of the columns of the original data frame. The second should contain
> >>> all triples of the columns of the original data frame, and the last
> >>> the quadruples of columns. The values in the new data frames should
> >>> be the product of two, three, or four original single field values.
> >>> For pairs and triples I could accomplish that task with the following
> >>> R script:
> >>>
> >>> Lines <- "a    b    c    d
> >>>   13     0    15   16
> >>>   23    24    25    0
> >>>   33    34     0   36
> >>>    0    44    45   46
> >>>   53    54     0   55"
> >>>
> >>> DF <- read.table(textConnection(Lines), header = TRUE)
> >>>
> >>> nrow <-length(rownames(DF))
> >>> cnames <- colnames(DF)
> >>> nc <-length(DF)
> >>>
> >>> nc.pairs <- nc*(nc-1)/2
> >>> #  initialize vector
> >>> cnames.new <- c(rep("",nc.pairs))
> >>> ind <- 1
> >>> print(sprintf("nc=%d",nc))
> >>> for (i in 1:(nc-1)) {
> >>> if (i+1 <= nc ) {
> >>>   for (j in (i+1):nc) {
> >>>     cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
> >>>     ind <- ind+1
> >>>   }
> >>> }
> >>> }
> >>>
> >>> ind <- 1
> >>> #  initialize data.frame
> >>> pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
> >>> for (i in 1:nc) {
> >>> if (i+1 <= nc ) {
> >>>   for (j in (i+1):nc) {
> >>>     t <- DF[,i] * DF[,j]
> >>>     pairs[[ind]] <- t
> >>>     ind <- ind+1
> >>>   }
> >>> }
> >>> }
> >>> colnames(pairs) <- cnames.new
> >>> print("pairs=");   print(pairs)
> >>
> >> apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
> >>      [,1] [,2] [,3] [,4] [,5] [,6]
> >> [1,]    0  195  208    0    0  240
> >> [2,]  552  575    0  600    0    0
> >> [3,] 1122    0 1188    0 1224    0
> >> [4,]    0    0    0 1980 2024 2070
> >> [5,] 2862    0 2915    0 2970    0
> >>>
> >>>
> >>> nc.tripels <- nc*(nc-1)*(nc-2)/6
> >>> #  initialize vector
> >>> cnames.new <- c(rep("",nc.tripels))
> >>> ind <- 1
> >>> print(sprintf("nc=%d",nc))
> >>> for (i in 1:nc) {
> >>> if (i+1 <= nc ) {
> >>>   for (j in (i+1):nc) {
> >>>     if (j+1 <= nc ) {
> >>>       for (k in (j+1):nc) {
> >>>         cnames.new[ind] <-
> >>> paste(cnames[i],cnames[j],cnames[k],sep="")
> >>>         ind <- ind+1
> >>>       }
> >>>     }
> >>>   }
> >>> }
> >>> }
> >>>
> >>> ind <- 1
> >>> #  initialize data.frame
> >>> tripels <-
> >>> data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
> >>> for (i in 1:(nc-1)) {
> >>> if (i+1 <= nc ) {
> >>>   for (j in (i+1):nc) {
> >>>     if (j+1 <= nc ) {
> >>>       for (k in (j+1):nc) {
> >>>         t <- DF[,i] * DF[,j] * DF[,k]
> >>>         tripels[[ind]] <- t
> >>>         ind <- ind+1
> >>>       }
> >>>     }
> >>>   }
> >>> }
> >>> }
> >>> colnames(tripels) <-  cnames.new
> >>> print("tripels=");   print(tripels)
> >>
> >> apply(combn(colnames(DF),3), 2, function(x) DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
> >>       [,1]   [,2] [,3]  [,4]
> >> [1,]     0      0 3120     0
> >> [2,] 13800      0    0     0
> >> [3,]     0  40392    0     0
> >> [4,]     0      0    0 91080
> >> [5,]     0 157410    0     0
> >>
> >>>
> >>>
> >>> I suppose that there is a much shorter way to get the same results.
> >>> Any hint is very much appreciated.
> >>
> >> David Winsemius, MD
> >> Heritage Laboratories
> >> West Hartford, CT
> >>
> >
> 
> _____________________________
> 
> Baptiste Auguié
> 
> School of Physics
> University of Exeter
> Stocker Road,
> Exeter, Devon,
> EX4 4QL, UK
> 
> Phone: +44 1392 264187
> 
> http://newton.ex.ac.uk/research/emag
> ______________________________
> 



