[Rd] factor() on a double vector

Hervé Pagès hpages at fhcrc.org
Wed Feb 23 21:17:25 CET 2011


On 02/23/2011 12:09 PM, Simon Urbanek wrote:
> Herve,
>
> the answer is simple: it's as.character(). It has nothing to do with factor or table.
>
>> as.character(x)
> [1] "3.66666666666667" "3.66666666666667" "3.66666666666666" "3.66666666666667"
>
> That's what you are passing to factor, so you get the corresponding results.

I see. Thanks, Simon.

I missed this:

   levels: an optional vector of the values that ‘x’ might have taken.
           The default is the unique set of values taken by
           ‘as.character(x)’, ...
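
The documented rule can be sketched in a few lines of R. `default_levels_sketch()` is a hypothetical helper written for illustration, not the actual source of factor(). It takes the distinct doubles, orders them numerically, converts them with as.character() and keeps the unique strings, so doubles that print to the same string end up sharing one level. The exact strings depend on how as.character() formats doubles, which may differ between R versions.

```r
# Hypothetical helper approximating the documented default for 'levels':
# "the unique set of values taken by as.character(x)".
default_levels_sketch <- function(x) {
  y <- unique(x)            # distinct doubles, compared exactly (like duplicated())
  y <- y[order(y)]          # order numerically before converting to strings
  unique(as.character(y))   # doubles that print identically collapse into one level
}

x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

default_levels_sketch(x)
levels(factor(x))           # should match the sketch above
```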

Cheers,
H.

>
> Cheers,
> Simon
>
>
>
> On Feb 23, 2011, at 2:55 PM, Hervé Pagès wrote:
>
>> Hi,
>>
>> When 'x' is a vector of doubles, it's not clear how 'factor(x)'
>> compares its values in order to determine the levels. For example,
>> here all the values in 'x' are "conceptually" the same:
>>
>>   x <- c(11/3,
>>          2/3 + 4/3 + 5/3,
>>          50 + 11/3 - 50,
>>          7.00001 - 1000003/300000)
>>
>> However, due to machine rounding errors, they are not strictly equal:
>>
>>   > duplicated(x)
>>   [1] FALSE FALSE FALSE FALSE
>>   > unique(x)
>>   [1] 3.666667 3.666667 3.666667 3.666667
>>
>> but they are nearly equal:
>>
>>   > all.equal(x, rep(11/3, 4))
>>   [1] TRUE
>>
>> Now factor(), and therefore table() (which seems to be using factor()
>> internally), have a different opinion:
>>
>>   > factor(x)
>>   [1] 3.66666666666667 3.66666666666667 3.66666666666666 3.66666666666667
>>   Levels: 3.66666666666666 3.66666666666667
>>
>>   > table(x)
>>   x
>>   3.66666666666666 3.66666666666667
>>                  1                3
>>
>> So factor() doesn't seem to be using "strict equality" or "near
>> equality" to determine the levels. What does it use? Sorry if I
>> missed it, but I couldn't find any information about this in its
>> man page.
>>
>> Wouldn't it be better if factor() was consistent with either
>> duplicated() or all.equal() instead of introducing its own way
>> of comparing doubles that lies somewhere in between?
>>
>> Cheers,
>> H.
>>
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>> [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=en_US.utf8       LC_NAME=C
>> [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.12.0
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
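
The disagreement between duplicated() and factor() can be reproduced by formatting the four values at two precisions. This sketch assumes that `sprintf("%.15g", x)` is a fair stand-in for the 15-significant-digit strings that as.character() produced in the R 2.12 session above; `sprintf("%.17g", x)` uses enough digits to give every distinct double its own string.

```r
x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

# 17 significant digits: distinct doubles always get distinct strings,
# so this grouping agrees with duplicated(x) (all FALSE).
s17 <- sprintf("%.17g", x)

# 15 significant digits: mimics the as.character() output quoted above,
# where three of the four doubles print identically.
s15 <- sprintf("%.15g", x)

length(unique(s17))  # 4
length(unique(s15))  # 2
table(s15)
```

Neither precision matches all.equal(), which compares against a relative tolerance rather than a printed form.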

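If the goal is a factor that agrees with duplicated() (exact equality), or one that groups nearly equal values, one option is to choose the grouping explicitly before calling factor(). This is an illustration, not a documented factor() feature: exact grouping uses match(x, unique(x)), and the tolerance variant swaps in signif() rounding to 10 significant digits, an arbitrary choice. Rounding is not the same test as all.equal(): two values just either side of a rounding boundary can still land in different levels.

```r
x <- c(11/3,
       2/3 + 4/3 + 5/3,
       50 + 11/3 - 50,
       7.00001 - 1000003/300000)

# Exact grouping, consistent with duplicated(): one code per distinct double,
# in order of first appearance. Labels use 17 significant digits so that
# distinct doubles get distinct labels.
u <- unique(x)
f_exact <- factor(match(x, u), labels = sprintf("%.17g", u))

# Tolerance-style grouping: round to 10 significant digits first
# (10 is an illustrative choice, not a rule taken from R).
f_rounded <- factor(signif(x, 10))

nlevels(f_exact)    # 4
nlevels(f_rounded)  # 1
```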



