[R] odd behaviour of identical

Sun Nov 2 09:11:00 CET 2008

Berwin A Turlach wrote:
> On Sat, 01 Nov 2008 22:57:38 +0100
> Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>
>   
>> is.integer(1) # FALSE
>> is.integer(1:1) # TRUE
>>
>> is not particularly appealing as a design, though it can be defended
>> along the line that : uses 1 as the increase step, thus if it starts
>> at an integer, the vector contains integers (just like it says in
>> help(":")).  the problem here is that 1:1 is *obviously* an integer
>> vector, while 1 is *obviously* a number vector, but *obviously* not an
>> integer vector.  do a poll on how obvious it is to r's users, i'd bet
>> you lose.
>>     
>
> Probably you are right, but the set of useRs would be the wrong
> reference base.  Most useRs are using R for the purpose it was
> designed; namely, statistical analyses.  And I cannot remember any
> statistical analyses that I have ever done where it mattered whether a
> number was stored as an integer or not; or whether is.integer()
> returned FALSE or TRUE.
>   

possibly.  i haven't had the problem myself, i learned about it here
because a confused user reported it, and it struck me as irrational.
the case with identical, where, e.g., identical(1:1, { x<-1:1; x[2]<-2;
x[1] }) is FALSE struck me as even more irrational.
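
a minimal sketch of what happens inside that call, step by step.  the
variable names (type_before, same_object, etc.) are illustrative only:

```r
x <- 1:1                      # ':' yields an integer vector
type_before <- typeof(x)      # "integer"
x[2] <- 2                     # 2 is a double, so x is coerced to double
type_after <- typeof(x)       # "double"

same_object <- identical(1:1, x[1])           # FALSE: integer vs. double
same_value  <- isTRUE(all.equal(1:1, x[1]))   # TRUE: equal as numbers
```

so the assignment x[2] <- 2 silently changes the storage type of x[1] as
well, and identical compares the storage type along with the value.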

quite possibly statisticians do not care about the internal
representations, but then they should be protected from accidentally
getting confused, complaining, and receiving a response of the type 'you
need to cast 1 to the integer type to have it as an integer'.

but i do know a number of statisticians who gave r a try, and never came
back.  i talked to them and know why, and my comments here are related
to those issues.  it is only that i do care about discussing them.

> I can see a use of is.integer() only when you use R for programming.
> Personally, the only use I can see for is.integer() is for checking
> that a user has not by accident passed a vector stored in integer mode
> to a routine in which I pass this vector down to C or FORTRAN code that
> expects double or DOUBLE PRECISION, respectively.  But in such
> situation, rather than testing with is.integer(), I typically just use
> storage.mode() to ensure the proper storage mode.
>   

and storage.mode, while it exposes the underlying representation, is at
least named so as not to cause confusion.  if is.integer(1) == FALSE
were good design (which it is not, i think) then having the function
named 'is.integer' is bad design, precisely because a user (a
statistician), as you say, does not care about integers as
representations, but integers as numbers.  again, this problem was
reported by a user who, presumably, uses r for statistics, and not me,
who uses r to find examples of bad design.
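
to make the distinction concrete, a small sketch contrasting the
representation check with a check on the value.  is_whole_number is a
hypothetical helper written for this illustration, not part of base r,
and the tolerance is an assumption:

```r
is.integer(1)                 # FALSE: 1 is stored as a double
is.integer(1:1)               # TRUE
is.integer(as.integer(1))     # TRUE: the explicit cast
storage.mode(1)               # "double"

# hypothetical helper: tests the value, not the storage type
is_whole_number <- function(x, tol = sqrt(.Machine$double.eps)) {
  is.numeric(x) && all(abs(x - round(x)) < tol)
}
is_whole_number(1)            # TRUE
is_whole_number(1:1)          # TRUE
is_whole_number(1.5)          # FALSE
```

a check of the second kind is presumably what a user expects from a
function named 'is.integer'.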

> In summary, if one uses R as a programming language then, as with
> any other programming language, one should become familiar with the
> language and its idiosyncrasies; and perhaps also with some features of
> binary computing and the constraints imposed by these features.  In my
> experience, every language has idiosyncrasies that one has to get used
> to (and which one may or may not consider design flaws).  As the saying
> goes, a good handyman does not blame her/his tools.
>
>   

sure, but a good handyman uses the best tools affordable.  r certainly
gains a lot because it is open source, but this alone does not make it a
good tool. 

vQ