[R] Autofilling a large matrix in R

Pieter Schoonees schoonees at ese.eur.nl
Sat Oct 13 01:56:32 CEST 2012


I think the issue is that with expand.grid and times >= 4 you are likely to run out of memory before you even get to subscripting (at least on my machine).

A simplification is to realize that you are looking for points in a lattice in the interior of a (p - 1)-dimensional simplex for p columns/factors/groups. 

As a start, the xsimplex() function in the combinat package generates all the lattice points in such a simplex, i.e. all vectors of non-negative integers that sum to a specific value (and nsimplex() calculates how many there are).

If you then still want to remove the instances on the edges of the simplex (where one of the percentages is 0), at least you have a more memory-efficient base within which to search.

For p = 4 you will then start with

> require(combinat)
> nsimplex(4,100)
[1] 176851

candidate points instead of 

> 100^4
[1] 1e+08

points.
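These counts follow from the usual stars-and-bars argument, so they can be checked in base R without loading anything. A minimal sketch (the helper names n_lattice and n_interior are my own, not part of combinat):

```r
# Number of non-negative integer vectors of length p that sum to n
# (stars and bars); this is the count nsimplex(p, n) is expected to return.
n_lattice <- function(p, n) choose(n + p - 1, p - 1)

# Same count when every entry must be at least 1 (no zeros allowed).
n_interior <- function(p, n) choose(n - 1, p - 1)

n_lattice(4, 100)   # 176851
n_interior(4, 100)  # 156849
```

The second number is what should be left once the points with a 0 are dropped.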

As an example, to generate all combinations for 4 factors excluding any 0's, you could do

> mat <- xsimplex(4,100)

> ncol(mat)
[1] 176851

> print(object.size(mat),unit="Mb")
5.4 Mb

> mat <- mat[,apply(mat,2,function(x)!any(x==0))]

> ncol(mat)
[1] 156849

Of course the curse of dimensionality will still get you as the number of factors increases. E.g.

> mat <- xsimplex(5,100)

> ncol(mat)
[1] 4598125

> print(object.size(mat),unit="Mb")
175.4 Mb

which is still manageable (but for p = 6 your lattice has nearly 100 million points).
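The sizes reported above are consistent with 8 bytes per matrix entry (doubles), so a rough estimate is possible before allocating anything. A sketch under that assumption; size_mb is a made-up helper name, and it ignores the small per-object header:

```r
# Approximate size in Mb of a p x choose(n + p - 1, p - 1) matrix,
# assuming each entry is stored as an 8-byte double.
size_mb <- function(p, n, bytes_per_entry = 8) {
  n_points <- choose(n + p - 1, p - 1)   # lattice points, zeros included
  p * n_points * bytes_per_entry / 2^20
}

round(size_mb(4, 100), 1)          # about 5.4
round(size_mb(5, 100), 1)          # about 175.4
round(size_mb(6, 100) / 1024, 1)   # in Gb this time
```

For p = 6 this comes out at several Gb, which is where it stops being comfortable on an ordinary machine.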

Perhaps you can modify the code of xsimplex to automatically discard zeros.
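Short of editing xsimplex itself, one option is to generate only the zero-free points directly. Each vector of p positive integers summing to n corresponds to a choice of p - 1 cut points among 1, ..., n - 1, which combn() in base R can enumerate. A rough sketch of that idea; positive_compositions is a hypothetical name of mine, not a combinat function:

```r
# All vectors of p positive integers summing to n, one per column.
positive_compositions <- function(p, n) {
  stopifnot(p >= 2, n >= p)
  # Choose p - 1 cut points among 1, ..., n - 1 (increasing within a column).
  cuts <- combn(n - 1, p - 1)
  # Pad with 0 and n, then take differences down each column:
  # the gaps between consecutive cut points are the p parts.
  diff(rbind(0, cuts, n, deparse.level = 0))
}

small <- positive_compositions(3, 5)
small                # 3 x 6 matrix; every column sums to 5, no zeros

mat <- positive_compositions(4, 100)
ncol(mat)            # choose(99, 3) = 156849
```

The same correspondence suggests that xsimplex(4, 96) + 1 should give the same set of columns (possibly in a different order), since adding 1 to each of p non-negative entries summing to n - p gives p positive entries summing to n. Note that combn() loops in R, so expect it to take a moment for p = 4, and the curse of dimensionality applies here just the same.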

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Rui Barradas
> Sent: Friday, October 12, 2012 18:04
> To: wwreith
> Cc: r-help at r-project.org
> Subject: Re: [R] Autofilling a large matrix in R
> 
> Hello,
> 
> Something like this?
> 
> g[rowSums(g) == 100, ]
> 
> Hope this helps,
> 
> Rui Barradas
> On 12-10-2012 15:30, wwreith wrote:
> > I wish to create a matrix of all possible percentages with two decimal
> > place precision. I then want each row to sum to 100%. I started with
> > the code below with the intent to then subset the data based on the
> > row sum. This works great for 2 or 3 columns, but if I try 4 or more
> > columns the number of rows becomes too large. I would like to find a way
> > to break it down into some kind of for loop, so that I can remove the
> > rows that don't sum to 100% inside the for loop rather than outside
> > it. My first thought was to take lists from 1:10, 11:20, etc., but that
> > does not get all of the points.
> >
> > g<-as.matrix(expand.grid(rep(list(1:100), times=3)))
> >
> > Any thoughts how to split this into pieces?
> >
> >
> >
> > --
> > View this message in context:
> > http://r.789695.n4.nabble.com/Autofilling-a-large-matrix-in-R-tp4645991.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



