[R] expand.grid overflows?

Adrian Dusa dusa.adrian at gmail.com
Sun Nov 18 14:31:22 CET 2007


On Friday 16 November 2007, francogrex wrote:
> >cbn<-as.matrix(expand.grid( rep( list(0:1), 50)))
>
> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>   invalid 'times' value
> In addition: Warning message:
> In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
>   NAs introduced by coercion
>
> But I'm only interested in cbn matrix rows where:
> cbn<- cbn[rowSums(cbn)==5,]
>
> Is there a way to evaluate it row by row and only store where the sum is
> equal to 5, maybe it reduces cost of computation?

What you want is impossible: a matrix with all possible binary combinations of 
50 columns has 2^50 rows and 50 columns, i.e. 2^50 * 50 elements:

> 2^50*50
[1] 5.6295e+16

By comparison, a numeric matrix with 20 columns requires about 160MB of RAM 
and one with 21 columns approx. 330MB (see ?object.size); the size slightly 
more than doubles with every additional column. There is simply no way you 
will ever create a matrix with 50 columns.
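
The figures above can be checked with a little arithmetic. This is only a 
rough sketch: it assumes 8 bytes per element (a matrix of doubles) and ignores 
the small object header, and est_bytes() is a name made up for this 
illustration, not an existing function.

```r
# Estimated size of a numeric (double) matrix holding every binary
# combination of n columns: 2^n rows, n columns, 8 bytes per element.
est_bytes <- function(n) 2^n * n * 8

est_bytes(20) / 2^20   # 160, in MiB, for 20 columns
est_bytes(21) / 2^20   # 336, in MiB, for 21 columns
est_bytes(50) / 2^50   # 400, in PiB, for 50 columns
```

So the 50-column matrix would need on the order of hundreds of pebibytes.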

There is a function in package QCA called createMatrix() that creates a 
numerical matrix faster than expand.grid():

library(QCA)
cbn <- createMatrix(rep(2, 20))

# then what you want is 
cbn <- cbn[rowSums(cbn) == 5, ]
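
To see how few rows survive the filter, here is a small check in base R only 
(expand.grid() instead of createMatrix(), and 10 columns instead of 20, both 
chosen just to keep the example quick). The number of rows whose sum is 5 
should be choose(n, 5), which is tiny compared to 2^n.

```r
n <- 10
# expand.grid() is base R; as.matrix() turns the data frame into a matrix
cbn <- as.matrix(expand.grid(rep(list(0:1), n)))
dim(cbn)                      # 1024 rows, 10 columns

# keep only the rows containing exactly five 1s
cbn5 <- cbn[rowSums(cbn) == 5, ]
nrow(cbn5) == choose(n, 5)    # TRUE: 252 rows survive
```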



For more than 20 variables it _is_ possible to get what you want by 
sacrificing speed for low memory consumption, this way:

library(QCA)
nofcolumns <- 25
cbn.rownos <- seq(2^nofcolumns) # generate the row numbers
eq5 <- sapply(cbn.rownos, function(x) {
    return(sum(getRow(rep(2, nofcolumns), x)) == 5)
})

# this will be _very_ slow, as it checks for each row number (in its binary
# equivalent, see ?getRow) whether its sum is equal to 5

# then what you want is:

cbn <- getRow(rep(2, nofcolumns), cbn.rownos[eq5])
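
The same row-by-row idea can be sketched without QCA, to show what "binary 
equivalent of a row number" means. This is not how getRow() is implemented; 
row_bits() is a hypothetical helper written for this illustration, it only 
works for up to 31 columns, and its column order may differ from getRow()'s. 
A small n is used so that it runs quickly.

```r
# Hypothetical helper (not part of QCA): the binary digits of the 1-based
# row number x for n columns, least significant bit first. intToBits()
# works on 32-bit integers, so this sketch assumes n <= 31.
row_bits <- function(x, n) as.integer(intToBits(x - 1L))[seq_len(n)]

n <- 10
k <- 5
rownos <- seq_len(2^n)

# check one row number at a time; only a logical vector is kept in memory
keep <- vapply(rownos, function(x) sum(row_bits(x, n)) == k, logical(1))

# rebuild only the matching rows (one row per kept row number)
cbn <- t(vapply(rownos[keep], row_bits, integer(n), n = n))
dim(cbn)   # choose(10, 5) = 252 rows, 10 columns
```

The full matrix is never built, only the vector of row numbers and the 
logical vector, which is why memory stays low while the run time grows 
with 2^n.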


I hope it helps,
Adrian


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101


