[R] Testing if all elements are equal in a vector/matrix

Mon Jun 22 13:15:52 CEST 2009

Hi

jim holtman <jholtman at gmail.com> wrote on 19.06.2009 15:06:55:

> I have wondered about this way of testing for equality:
>  
> > x <- c(1,0,3,0)
> > x[1] * length(x) == sum(x)
> [1] TRUE
> > x <- rep(1,4)
> > x[1] * length(x) == sum(x)
> [1] TRUE
> This would seem to indicate that both vectors contain the same values,
> but that is not necessarily true.

My solution has some flaws; however, if the user can be reasonably sure that
such a condition is improbable, it could be used.

The problem was stated as this

> > I just want to check whether all the elements of a vector are the same. My
> > vector has one million elements and it is highly likely that there are
> > distinct elements in the first few itself. For example:
> >
> > > x = c(1,2,rep(1,100000))

and for that kind of vector it could be quite safe, although I would
prefer something like this:

> fff2<-function(x) length(unique(x))==1
> system.time(print(fff2(x)))
[1] FALSE
   user  system elapsed 
   0.39    0.08    0.47 
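
For the case described in the problem statement (a mismatch expected within
the first few elements), a chunked comparison can stop early without leaving
R. The following is only a sketch: all_same_chunked and its chunk argument
are hypothetical names, not anything from base R, and the code assumes that
x contains no NA values.

```r
# Hypothetical helper: compare against x[1] chunk by chunk and return as
# soon as one chunk contains a mismatch. Assumes x has no NA values.
all_same_chunked <- function(x, chunk = 1e4) {
  n <- length(x)
  if (n < 2) return(TRUE)
  first <- x[1]
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)
    # Only the current chunk is compared; later chunks are never touched
    # once a mismatch has been found.
    if (any(x[start:end] != first)) return(FALSE)
    start <- end + 1
  }
  TRUE
}

x <- c(1, 2, rep(1, 100000))
all_same_chunked(x)           # FALSE, decided within the first chunk
all_same_chunked(rep(1, 10))  # TRUE
```

When all elements really are equal, this does the same comparisons as
all(x == x[1]) plus some loop overhead, so it only pays off when mismatches
tend to appear early.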

Regards
Petr

> On Fri, Jun 19, 2009 at 8:18 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
> 
> utkarshsinghal <utkarsh.singhal at global-analytics.com> wrote on
> 17.06.2009 15:29:34:
> 
> > I will wait for the next version, 2.9.1, and am presently using Petr's
> > suggestion, i.e.,
> > (x[1]*length(x))==sum(x)
> > which significantly reduced the run time.
> >
> > The problem now is that there might be only small differences, say, of the
> > order of 10^-10, which I want to ignore.
> >
> > So I used:
> > isTRUE(all.equal((x[1]*length(x)),sum(x)))
> > as suggested in the documentation of all.equal.
> >
> > But this again increased the run time fivefold.
> >
> > 1) Is there any faster way of doing the same?
> 
> Maybe (not tested)
> 
> (x[1]*length(x))==round(sum(x),10)
> 
> Petr
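
If differences of the order of 10^-10 are to be ignored, one alternative to
the sum-based test is to compare the spread of the values against an absolute
tolerance. A minimal sketch, in which the helper name all_same_tol and the
default tol are assumptions chosen for illustration; NA values are assumed to
be absent:

```r
# Hypothetical helper: TRUE when all elements lie within `tol` of each other.
all_same_tol <- function(x, tol = 1e-10) {
  if (length(x) < 2) return(TRUE)
  r <- range(x)            # minimum and maximum; assumes no NA values
  (r[2] - r[1]) <= tol     # absolute spread compared with the tolerance
}

x <- c(1, 1 + 1e-12, 1 - 1e-12)
all(x == x[1])               # FALSE: exact comparison sees the tiny differences
all_same_tol(x)              # TRUE: the spread is about 2e-12, below tol
all_same_tol(c(1, 0, 3, 0))  # FALSE, unlike the sum-based test
```

Note that the tolerance here is absolute, whereas all.equal() compares
relative differences by default, so the two are not interchangeable for
values far from 1.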
> 
> > 2) Will the function "anyDuplicated" treat almost equal values as
> > duplicated or not? Actually I need both the options.
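
On question 2: as far as I know, duplicated(), unique() and anyDuplicated()
compare doubles exactly, so almost equal values are not treated as
duplicates. Rounding first gives the other behaviour. A small illustration
(the choice of 10 digits is an assumption, matching the 10^-10 tolerance
mentioned above):

```r
x <- c(0.3, 0.1 + 0.2)       # two doubles that differ only in the last bits

x[1] == x[2]                 # FALSE
anyDuplicated(x)             # 0: no exact duplicate found
anyDuplicated(round(x, 10))  # 2: after rounding, element 2 repeats element 1
```

Rounding is only an approximation of "almost equal": two values closer than
the tolerance can still round to different results if they straddle a
rounding boundary.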
> >
> >
> > Regards
> > Utkarsh
> >
> >
> >
> > Prof Brian Ripley wrote:
> > On Tue, 16 Jun 2009, Prof Brian Ripley wrote:
> 
> > On Tue, 16 Jun 2009, jim holtman wrote:
> 
> > I think the only way that you are going to get it to stop on the first
> > mismatch is to write your own function in C if you are concerned about
> > the time.  Matching on character vectors will be even more costly since
> > it is having to loop to check the equality of each character in each
> > element. This is one of the places it might pay to convert to factors and
> > then the comparison only uses the integer values assigned to the factors.
> >
> > Not so in a recent R: comparison of character vectors is now done by
> > comparing pointers in the first instance so (at least on a 32-bit
> > platform) is as fast as comparing integers.  And on x86_64 Linux:
> 
> > x <- as.character(c(1,2,rep(1,10000000)))
> > system.time(print(all(x[1] == x)))
> > [1] FALSE
> >   user  system elapsed
> >  0.123   0.019   0.142
> 
> > system.time(xx <- as.factor(x))
> >   user  system elapsed
> >  9.874   0.284  10.159
> > system.time(print(all(xx[1] == xx)))
> > [1] FALSE
> >   user  system elapsed
> >  0.511   0.145   0.656
> >
> > Recent pre-release versions of R (e.g. 2.9.1 beta) allow
> 
> > system.time(anyDuplicated(x))
> >   user  system elapsed
> >  0.034   0.078   0.113
> > system.time(anyDuplicated(xx))
> >   user  system elapsed
> >  0.037   0.076   0.113
> >
> > I'm sorry, a line got reverted here: I had edited this to say
> >
> > 'which is a C-level speedup of the sort the original poster seemed to be
> > looking for'
> 
> >
> >
> > On Tue, Jun 16, 2009 at 8:31 AM, utkarshsinghal <
> > utkarsh.singhal at global-analytics.com> wrote:
> 
> > Hi Jim,
> >
> > What you are saying is correct. However, my computer might not have the
> > same speed, and I am getting the following for 10M entries:
> >
> >    user  system elapsed
> >   0.559   0.038   0.607
> >
> > Moreover, in the case of character vectors, the time more than doubles.
> >
> > In my modeling, which is already highly time consuming, I need to check
> > this for a few thousand vectors, and each vector can easily have 10M
> > entries. So I am just looking for any possibility of saving time. I am
> > pretty sure that whenever the elements are not all equal, this can be
> > concluded from any few entries (most of the time). It would be worthwhile
> > if I could find a way that stops checking further the moment it finds two
> > distinct elements.
> >
> > Regards
> > Utkarsh
> >
> >
> >
> > jim holtman wrote:
> >
> > Just check that the first (or any other element) is equal to all the rest:
> 
> > x = c(1,2,rep(1,10000000)) # 10,000,000
> > system.time(print(all(x[1] == x)))
> > [1] FALSE
> >    user  system elapsed
> >    0.18    0.00    0.19
> 
> >
> > This was for 10M entries.
> >
> > On Tue, Jun 16, 2009 at 7:42 AM, utkarshsinghal <
> > utkarsh.singhal at global-analytics.com> wrote:
> 
> >
> > Hi All,
> >
> > There are several replies to the question below, but I think there must
> > exist a better way of doing so.
> > I just want to check whether all the elements of a vector are the same. My
> > vector has one million elements and it is highly likely that there are
> > distinct elements in the first few itself. For example:
> >
> > > x = c(1,2,rep(1,100000))
> >
> > I want the answer as FALSE, which is clear from the first two
> > observations itself and we don't need to check for the rest.
> >
> > Does anybody know the most efficient way of doing this?
> >
> > Regards
> > Utkarsh
> >
> >
> >
> > From: Francisco J. Zagmutt <gerifalte28_at_hotmail.com>
> >
> > Date: Tue 30 Aug 2005 - 06:05:20 EST
> >
> >
> > Hi Doran
> >
> > The documentation for isTRUE reads 'isTRUE(x)' is an abbreviation of
> > 'identical(TRUE,x)' so actually Vincent's solution is "cleaner" than
> > using identical :)
> >
> > Cheers
> >
> > Francisco
> >
> > >From: "Doran, Harold" <HDoran at air.org>
> > >To: <vincent.goulet at act.ulaval.ca>, <r-help at stat.math.ethz.ch>
> > >Subject: Re: [R] Testing if all elements are equal in a vector/matrix
> > >Date: Mon, 29 Aug 2005 15:49:20 -0400
> > >
> > >See ?identical
> > >
> > >-----Original Message-----
> > >From: r-help-bounces at stat.math.ethz.ch
> > >[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Vincent Goulet
> > >Sent: Monday, August 29, 2005 3:35 PM
> > >To: r-help at stat.math.ethz.ch
> > >Subject: [R] Testing if all elements are equal in a vector/matrix
> > >
> > >
> > >Is there a canonical way to check if all elements of a vector or matrix are
> > >the same? Solutions below work, but look hackish to me.
> > >
> > > > x <- rep(1, 10)
> > > > all(x == x[1]) # == operator does not provide for small differences
> > >[1] TRUE
> > > > isTRUE(all.equal(x, rep(x[1], length(x)))) # ugly
> > >[1] TRUE
> > >
> > >Best,
> > >
> > >Vincent
> > >--
> > > Vincent Goulet, Associate Professor
> > > École d'actuariat
> > > Université Laval, Québec
> > > Vincent.Goulet_at_act.ulaval.ca
> > > http://vgoulet.act.ulaval.ca
> > >
> > >______________________________________________
> > >R-help at stat.math.ethz.ch mailing list
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide!
> > >http://www.R-project.org/posting-guide.html
> > >
> >
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?
> >
> >
> 
> >
> 
> >
> > --
> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
> 
