[R] Comparing elements for equality

Carlos J. Gil Bellosta cgb at datanalytics.com
Tue Jan 13 20:54:54 CET 2009


Hello,

You could build your output dataframe along the following lines:

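# TRUE when x holds a single distinct value, i.e. all elements are the same
foo <- function(x) length( unique(x) ) == 1

# One row per id; the ids end up as the row names of the result
results <- data.frame(
	freq = tapply( dat$id,   dat$id, length ),  # number of rows per id
	var1 = tapply( dat$var1, dat$id, foo ),     # are all var1 values equal within the id?
	var2 = tapply( dat$var2, dat$id, foo )      # are all var2 values equal within the id?
)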
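
As a quick check, here is a minimal sketch of the same idea run against the example data from the question, with the ids moved from the row names into an explicit id column so the result matches the requested layout. The helper name all_same and the use of sort(unique(...)) to build the id column are my own choices for illustration; they rely on tapply returning its groups in sorted order of the ids.

```r
# Example data from the original question
dat <- data.frame(id   = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c("foo", "foo", "foo", "foobar", "foo"))

# Hypothetical helper name: TRUE when x has exactly one distinct value
all_same <- function(x) length(unique(x)) == 1

# tapply groups by the sorted distinct ids, so sort(unique(dat$id)) lines up
# with its output; as.vector drops the array names tapply attaches
results <- data.frame(
  id   = sort(unique(dat$id)),
  freq = as.vector(tapply(dat$id,   dat$id, length)),
  var1 = as.vector(tapply(dat$var1, dat$id, all_same)),
  var2 = as.vector(tapply(dat$var2, dat$id, all_same))
)

print(results)
```

Note that unique() treats NA as a value of its own, so a group containing both NA and a non-missing value would be reported as FALSE; whether that is what you want depends on your data.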

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com


On Tue, 2009-01-13 at 14:17 -0500, Doran, Harold wrote:
> Suppose I have a dataframe as follows:
> 
> dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25),
>                   var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))
> 
> Now, if I were to subset by id, such as:
> 
> > subset(dat, id==1)
>   id var1 var2
> 1  1   10  foo
> 2  1   10  foo
> 
> I can see that the elements in var1 are exactly the same and the
> elements in var2 are exactly the same. However,
> 
> > subset(dat, id==2)
>   id var1   var2
> 3  2   20    foo
> 4  2   20 foobar
> 5  2   25    foo
> 
> shows that the elements are not the same for either variable in this
> instance. So, what I am looking to create is a data frame that would
> look like this:
> 
> id  freq  var1   var2
> 1   2     TRUE   TRUE
> 2   3     FALSE  FALSE
> 
> Here freq is the number of times the ID is repeated in the data frame. A
> TRUE appears in the cell if all elements in the column are the same for
> the ID, and FALSE otherwise. For my problem, it does not matter which
> values differ.
> 
> The way I am thinking about tackling this is to loop through the ID
> variable and compare the values in the various columns of the dataframe.
> The problem I am encountering is that I don't think all.equal or
> identical are the right functions in this case.
> 
> So, say I wanted to compare the elements of var1 for id == 1. I
> would have
> 
> x <- c(10,10)
> 
> Of course, the following works
> 
> > all.equal(x[1], x[2])
> [1] TRUE
> 
> As would a similar call to identical. However, what if I only have a
> vector of values (or if the column consists of names) that I want to
> assess for equality when I am trying to automate a process over
> thousands of cases? As in the example above, the vector may contain only
> two values or it may contain many more. The number of values in the
> vector differs by id.
> 
> Any thoughts?
> 
> Harold
> 



