[R] Strange result when subsetting a data frame based on a character variable

Bert Gunter bgunter.4567 at gmail.com
Tue Nov 17 20:37:21 CET 2015


> 2 == "2"
[1] TRUE

?"==" says:

"If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw."
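A small sketch of that precedence rule in action (not part of the original post; the variable names are only illustrative). Whichever operand ranks lower in the list is converted to the type of the other before the comparison is made:

```r
# numeric vs. character: the number is converted to a string first,
# so this is the same as as.character(2) == "2"
num_vs_chr <- 2 == "2"

# logical vs. character: TRUE becomes the string "TRUE", not "1"
lgl_vs_chr <- c(TRUE == "TRUE", TRUE == "1")

# logical vs. numeric: TRUE becomes the number 1
lgl_vs_num <- TRUE == 1

print(num_vs_chr)
print(lgl_vs_chr)
print(lgl_vs_num)
```

The comparison is therefore only as reliable as the string that the conversion happens to produce.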

> as.character(99999)
[1] "99999"
> as.character(100000)
[1] "1e+05"
> as.character(100000) == "100000"
[1] FALSE
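For completeness, a hedged sketch of a few ways to sidestep the problem, using the data frame from the question below. The object names by_number, by_integer and by_string are made up for illustration, and the sketch assumes the default scipen option:

```r
# Same data as in the question; group holds numbers stored as character
Data <- data.frame(value = 1:6,
                   group = rep(c("20000", "99999", "100000"), each = 2),
                   stringsAsFactors = FALSE)

# The comparison from the question: 100000 is coerced to "1e+05",
# so no row matches
n_naive <- nrow(subset(Data, group == 100000))

# Option 1: convert the column to numeric and compare numbers with numbers
by_number <- subset(Data, as.numeric(group) == 100000)

# Option 2: compare with an integer; as.character(100000L) is "100000"
by_integer <- subset(Data, group == 100000L)

# Option 3: build the string explicitly, without scientific notation
by_string <- subset(Data, group == format(100000, scientific = FALSE))

print(n_naive)
print(by_number)
```

Option 1 is probably the most robust if the column really holds numbers; the other two still rely on an exact string match.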


Cheers,
Bert




Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Nov 17, 2015 at 11:14 AM, Karl Schilling
<karl.schilling at uni-bonn.de> wrote:
> Dear all,
>
> I have one observation that I do not quite understand. Maybe someone
> can clarify this issue for me.
>
> I have a data frame which I want to subset based on a grouping variable, say
> "group". Actually, "group" is a numeric value, but it is saved as a
> character. I give some code to generate an exemplary data frame below.
>
> Now, if I use
>
> MySubset <- subset(Data, Data$group == "..")
>
> everything works fine, as expected. ".." stands here for the value of group
> given as a character string.
>
> Surprisingly, I also get a correct subsetting if I simply give the plain
> numeric value of group (like MySubset <- subset(Data, Data$group == ..)), AS
> LONG AS this numeric value is less than 100000.
>
> If the numeric value is 100000 or larger, I get an empty subset.
>
> OK, I know how to avoid this situation, but I wonder what the explanation
> for this behavior, which seems rather strange to me, might be.
>
> Thank you so much for your suggestions.
>
>
> Karl Schilling
>
>
> #####
> Exemplary code for reproducing the above described problem:
>
> options(stringsAsFactors = FALSE)
>
> # set up some data frame
> value <- c(1:6)
> group <- rep(c("20000", "99999", "100000"), each = 2)
> Data <- data.frame(value = value, group = group)
> str(Data)
>
> # subset data frame based on the value of the variable "group",
> # treating this value once as a character, and once as a number:
>
> Data20 <- subset(Data, Data$group =="20000")
> str(Data20)
> Data20N <- subset(Data, Data$group ==20000)
> str(Data20N)
>
>
> Data99 <- subset(Data, Data$group =="99999")
> str(Data99)
> Data99N <- subset(Data, Data$group ==99999)
> str(Data99N)
> Data100 <- subset(Data, Data$group =="100000")
> str(Data100)
> Data100N <- subset(Data, Data$group ==100000)
> str(Data100N)
>
> --
> Karl Schilling
