[R] Testing if all elements are equal in a vector/matrix

Petr PIKAL petr.pikal at precheza.cz
Fri Jun 19 14:18:51 CEST 2009


Hi

utkarshsinghal <utkarsh.singhal at global-analytics.com> wrote on 
17.06.2009 15:29:34:

> I will wait for the next version (2.9.1); for now I am using Petr's
> suggestion, i.e.,
> (x[1]*length(x))==sum(x)
> which significantly reduced the run time.
> 
> The problem now is that there might be only small differences, say of the
> order of 10^-10, which I want to ignore.
> 
> So I used:
> isTRUE(all.equal((x[1]*length(x)),sum(x)))
> as suggested in the documentation of all.equal.
> 
> But this again increased the run time fivefold.
> 
> 1) Is there any faster way of doing the same?

Maybe (not tested)

(x[1]*length(x))==round(sum(x),10)

Petr
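
One caveat with the sum-based test: it can return TRUE for vectors that are not constant (for c(2,1,3), x[1]*length(x) and sum(x) are both 6). A minimal sketch of a tolerance check that avoids this, assuming x is numeric with no NA values, compares the minimum and maximum instead. The name all_equal_tol and the default tolerance are made up for illustration, and I have not timed it against the versions above:

```r
# Hypothetical helper (not from the thread): TRUE when every element of x
# lies within `tol` of every other element. Assumes numeric x without NAs.
all_equal_tol <- function(x, tol = 1e-8) {
  if (length(x) < 2L) return(TRUE)   # empty or single-element vectors
  r <- range(x)                      # min and max, computed in C
  (r[2] - r[1]) <= tol
}

x <- c(1, 1 + 1e-10, 1 - 1e-10)
all_equal_tol(x)                     # differences of order 1e-10 are ignored
all_equal_tol(c(1, 2, rep(1, 10)))   # a genuinely different element is caught
all_equal_tol(c(2, 1, 3))            # not fooled by 2 * 3 == sum(c(2, 1, 3))
```

Unlike all.equal, which by default compares a relative difference, this uses an absolute tolerance, so tol has to be chosen on the scale of the data.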

> 2) Will the function "anyDuplicated" treat almost equal values as duplicated
> or not? Actually, I need both options.
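
On question 2: anyDuplicated compares values exactly, so almost equal doubles are not treated as duplicates. One way to get the other behaviour, sketched here under the assumption that rounding to a fixed number of decimals is an acceptable notion of "almost equal", is to round before calling it. The choice of 9 digits is arbitrary:

```r
x <- c(0.5, 0.5 + 1e-12, 0.75)

anyDuplicated(x)             # exact comparison: the first two values differ
anyDuplicated(round(x, 9))   # after rounding to 9 decimals they coincide
```

Two values that are very close but fall on opposite sides of a rounding boundary will still be seen as distinct, so this is only an approximation of a true tolerance test.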
> 
> 
> Regards
> Utkarsh
> 
> 
> 
> Prof Brian Ripley wrote: 
> On Tue, 16 Jun 2009, Prof Brian Ripley wrote: 

> On Tue, 16 Jun 2009, jim holtman wrote: 

> I think the only way that you are going to get it to stop on the first 
> mismatch is to write your own function in C if you are concerned about the 
> time.  Matching on character vectors will be even more costly since it is 
> having to loop to check the equality of each character in each element. 
> This is one of the places it might pay to convert to factors and then the 
> comparison only uses the integer values assigned to the factors. 
> 
> Not so in a recent R: comparison of character vectors is now done by comparing
> pointers in the first instance so (at least on a 32-bit platform) is as fast 
> as comparing integers.  And on x86_64 Linux: 

> x <- as.character(c(1,2,rep(1,10000000))) 
> system.time(print(all(x[1] == x))) 
> [1] FALSE 
>   user  system elapsed 
>  0.123   0.019   0.142 

> system.time(xx <- as.factor(x)) 
>   user  system elapsed 
>  9.874   0.284  10.159 
> system.time(print(all(xx[1] == xx))) 
> [1] FALSE 
>   user  system elapsed 
>  0.511   0.145   0.656 
> 
> Recent pre-release versions of R (e.g. 2.9.1 beta) allow 

> system.time(anyDuplicated(x)) 
>   user  system elapsed 
>  0.034   0.078   0.113 
> system.time(anyDuplicated(xx)) 
>   user  system elapsed 
>  0.037   0.076   0.113 
> 
> I'm sorry, a line got reverted here: I had edited this to say 
> 
> 'which is a C-level speedup of the sort the original poster seemed to be 
> looking for' 

> 
> 
> On Tue, Jun 16, 2009 at 8:31 AM, utkarshsinghal < 
> utkarsh.singhal at global-analytics.com> wrote: 

> Hi Jim, 
> 
> What you are saying is correct. However, my computer might not have the same 
> speed, and I am getting the following for 10M entries: 
> 
>    user  system elapsed 
>   0.559   0.038   0.607 
> 
> Moreover, in the case of character vectors, it more than doubles. 
> 
> In my modeling, which is already highly time consuming, I need to check 
> this for a few thousand vectors, and each vector can easily have 10M 
> entries. So I am just looking for any possibilities of time saving. I am 
> pretty sure that whenever the elements are not all equal, it can be concluded 
> from just a few entries (most of the time). It would be worthwhile if I could 
> find a way that stops checking further the moment it finds two distinct 
> elements. 
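
Short of writing C, the early-stopping idea can be approximated in plain R by comparing in blocks and returning at the first block that contains a mismatch. This is only a sketch: the function name all_same_chunked and the block size are made up for illustration, it assumes x has no NA values, and whether it beats all(x[1] == x) depends on how early the first mismatch occurs, so it should be timed on real data.

```r
# Hypothetical helper (not from the thread): compare x[1] against x block by
# block and stop at the first block that contains a different value.
# Assumes x has no NA values.
all_same_chunked <- function(x, chunk = 1e5) {
  n <- length(x)
  if (n < 2L) return(TRUE)
  first <- x[1]
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)          # last index of this block
    if (!all(x[start:end] == first)) return(FALSE)
    start <- end + 1
  }
  TRUE
}

x <- c(1, 2, rep(1, 1e6))
all_same_chunked(x)             # returns after inspecting the first block only
all_same_chunked(rep(1, 1e6))   # has to scan every block
```

When the vector really is constant this does the same amount of comparison work as the one-line version plus some loop overhead, so the gain is limited to the case where distinct elements show up early.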
> 
> Regards 
> Utkarsh 
> 
> 
> 
> jim holtman wrote: 
> 
> Just check that the first (or any other element) is equal to all the rest: 

> x = c(1,2,rep(1,10000000)) # 10,000,000 
> system.time(print(all(x[1] == x))) 
> [1] FALSE 
>    user  system elapsed 
>    0.18    0.00    0.19 

> 
> This was for 10M entries. 
> 
> On Tue, Jun 16, 2009 at 7:42 AM, utkarshsinghal < 
> utkarsh.singhal at global-analytics.com> wrote: 

> 
> Hi All, 
> 
> There are several replies to the question below, but I think there must 
> exist a better way of doing so. 
> I just want to check whether all the elements of a vector are the same. My 
> vector has one million elements and it is highly likely that there are 
> distinct elements among the first few. For example: 
> 
> > x = c(1,2,rep(1,100000)) 
> 
> I want the answer to be FALSE, which is clear from the first two 
> observations alone, so we don't need to check the rest. 
> 
> Does anybody know the most efficient way of doing this? 
> 
> Regards 
> Utkarsh 
> 
> 
> 
> From: Francisco J. Zagmutt <gerifalte28_at_hotmail.com> 
> 
> Date: Tue 30 Aug 2005 - 06:05:20 EST 
> 
> 
> Hi Doran 
> 
> The documentation for isTRUE reads 'isTRUE(x)' is an abbreviation of 
> 'identical(TRUE,x)' so actually Vincent's solution is "cleaner" than 
> using identical :) 
> 
> Cheers 
> 
> Francisco 
> 
> >From: "Doran, Harold" <HDoran at air.org> 
> >To: <vincent.goulet at act.ulaval.ca>, <r-help at stat.math.ethz.ch> 
> >Subject: Re: [R] Testing if all elements are equal in a vector/matrix 
> >Date: Mon, 29 Aug 2005 15:49:20 -0400 
> > 
> >See ?identical 
> > 
> >-----Original Message----- 
> >From: r-help-bounces at stat.math.ethz.ch 
> >[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Vincent Goulet 
> >Sent: Monday, August 29, 2005 3:35 PM 
> >To: r-help at stat.math.ethz.ch 
> >Subject: [R] Testing if all elements are equal in a vector/matrix 
> > 
> > 
> >Is there a canonical way to check if all elements of a vector or matrix are 
> >the same? Solutions below work, but look hackish to me. 
> > 
> > > x <- rep(1, 10) 
> > > all(x == x[1]) # == operator does not provide for small differences 
> >[1] TRUE 
> > > isTRUE(all.equal(x, rep(x[1], length(x)))) # ugly 
> >[1] TRUE 
> > 
> >Best, 
> > 
> >Vincent 
> >-- 
> > Vincent Goulet, Associate Professor 
> > École d'actuariat 
> > Université Laval, Québec 
> > Vincent.Goulet_at_act.ulaval.ca 
> > http://vgoulet.act.ulaval.ca 
> > 
> >______________________________________________ 
> >R-help at stat.math.ethz.ch mailing list 
> >https://stat.ethz.ch/mailman/listinfo/r-help 
> >PLEASE do read the posting guide! 
> >http://www.R-project.org/posting-guide.html 
> 
>        [[alternative HTML version deleted]] 
> 
> 
> ______________________________________________ 
> R-help at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html 
> and provide commented, minimal, self-contained, reproducible code. 
> 

> 
> 
> -- 
> Jim Holtman 
> Cincinnati, OH 
> +1 513 646 9390 
> 
> What is the problem that you are trying to solve? 
> 
> 

> 

> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk 
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/ 
> University of Oxford,             Tel:  +44 1865 272861 (self) 
> 1 South Parks Road,                     +44 1865 272866 (PA) 
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595 
> 



