[R] expand.grid

Nick Sabbe nick.sabbe at ugent.be
Wed Jan 19 11:38:29 CET 2011


<slaps self in forehead/>

I appear to have misinterpreted the help page of expand.grid: because it
explicitly mentions factors, I wrongly assumed that it would use the levels
of a factor automatically. My bad.

For completeness' sake, my final solution:

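getLevels<-function(vec, includeNA=FALSE, onlyOccurring=FALSE)
{
	if(onlyOccurring || !is.factor(vec))
	{
		# factor() drops unused levels; for a character vector (where
		# levels() would return NULL) it gives the sorted unique values
		rv<-levels(factor(vec))
	}
	else
	{
		# keep all declared levels of the factor, even unused ones
		rv<-levels(vec)
	}
	# optionally add NA as an extra "level" when vec has missing values
	if(includeNA && any(is.na(vec)))
	{
		rv<-c(rv,NA)
	}
	return(rv)
}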

expand.combs<-function(dfr, includeNA=FALSE, onlyOccurring=FALSE)
{
	# pass the flags by name so they cannot be matched to the wrong argument
	expand.grid(lapply(dfr, getLevels, includeNA=includeNA, onlyOccurring=onlyOccurring))
}
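
For comparison, the same two switches can be expressed with base R's
droplevels() (drop levels that do not occur) and addNA(x, ifany = TRUE)
(append NA as a level only when x actually contains missing values). This
is a sketch, not the code above: the name expand_combs2 is hypothetical,
and as.factor() is applied first so that character columns also work.

```r
# Hypothetical compact variant of expand.combs(); not the code from the post.
expand_combs2 <- function(dfr, includeNA = FALSE, onlyOccurring = FALSE)
{
	expand.grid(lapply(dfr, function(x) {
		x <- as.factor(x)                          # character columns become factors
		if (onlyOccurring) x <- droplevels(x)      # keep only levels that occur
		if (includeNA) x <- addNA(x, ifany = TRUE) # add an NA level if x has NAs
		levels(x)
	}))
}

# Example data from the thread; stringsAsFactors = TRUE gives factor columns
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

nrow(expand_combs2(dfr))                                                 # 20 = 2 * 2 * 5
nrow(expand_combs2(dfr[1:4, ]))                                          # 20: unused level "k" kept
nrow(expand_combs2(dfr[1:4, ], onlyOccurring = TRUE))                    # 16 = 2 * 2 * 4
nrow(expand_combs2(dfr[1:4, ], includeNA = TRUE, onlyOccurring = TRUE))  # 36 = 3 * 3 * 4
```

The design choice is simply to let base R do the level bookkeeping instead
of branching by hand; the results should match getLevels() for factor input.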

Thx.


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




-----Original Message-----
From: Berwin A Turlach [mailto:berwin at maths.uwa.edu.au] 
Sent: woensdag 19 januari 2011 11:04
To: Nick Sabbe
Cc: r-help at r-project.org
Subject: Re: [R] expand.grid

G'day Nick,

On Wed, 19 Jan 2011 09:43:56 +0100
"Nick Sabbe" <nick.sabbe at ugent.be> wrote:

> Given a dataframe
> 
> dfr<-data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e",
> "e"), c3=c("g", "h", "i", "j", "k"))
> 
> I would like to have a dataframe with all (unique) combinations of
> all the factors present.

Easy:

R> expand.grid(lapply(dfr, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
17  a  d  k
18  b  d  k
19  a  e  k
20  b  e  k
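
expand.grid() returns one row per element of the Cartesian product of the
level sets, so the row count is the product of the numbers of levels. A
small check, assuming factor columns: since R 4.0.0, data.frame() no longer
converts strings to factors by default, and levels() of a character column
is NULL, so the sketch passes stringsAsFactors = TRUE explicitly.

```r
# Example data from the thread, built with factor columns
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

grid <- expand.grid(lapply(dfr, levels))

# Number of levels per column: c1 = 2, c2 = 2, c3 = 5 (NA is not a level)
n_levels <- sapply(dfr, nlevels)

# 2 * 2 * 5 = 20 rows, matching the output above
stopifnot(nrow(grid) == prod(n_levels))
```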


> In fact, I would like a simple solution for these two cases: given
> the three factor columns above, I would like both all _possible_
> combinations of the factor levels, and all _present_ combinations of
> the factor levels (e.g. if I did this for the first 4 rows of
> dfr, it would contain no combinations with c3="k").

R> dfrpart <- lapply(dfr[1:4,], factor)
R> expand.grid(lapply(dfrpart, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
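
The answer above reads "present" as "built from the levels that occur in
the subset". If "present" is instead taken to mean only the combinations
that actually occur together in some row (an alternative reading, not what
either poster wrote), unique() on the rows gives them directly, and
complete.cases() can drop the ones involving NA. The names part, present
and present_complete are illustrative only.

```r
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

part <- dfr[1:4, ]

# Each distinct row is a combination that actually occurs in the data
present <- unique(part)

# Optionally drop combinations that contain a missing value
present_complete <- present[complete.cases(present), ]

nrow(present)           # 4
nrow(present_complete)  # 2: (a, d, g) and (a, e, j)
```

Note that this yields 4 (or 2) combinations, far fewer than the 16 that
expand.grid() builds from the occurring levels.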

> It would also be nice to be able to choose whether or not NAs are
> included.

R> expand.grid(lapply(dfrpart, function(x) c(levels(x),
+   if(any(is.na(x))) NA else NULL)))
     c1   c2 c3
1     a    d  g
2     b    d  g
3  <NA>    d  g
4     a    e  g
5     b    e  g
6  <NA>    e  g
7     a <NA>  g
8     b <NA>  g
9  <NA> <NA>  g
10    a    d  h
11    b    d  h
....

HTH.

Cheers,

	Berwin

========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin


