[R] [EXTERNAL] Re: unexpected behavior in apply

Derickson, Ryan, VHA NCOD <Ryan.Derickson using va.gov>
Fri Oct 8 20:24:17 CEST 2021


This is interesting and does seem suboptimal, especially because it behaves as expected when the data start out as all-character columns, i.e. effectively a character matrix from the beginning:

> d<-data.frame(d1 = letters[1:3],
+               d2 = c("1","2","3"),
+               d3 = c(NA,NA,"6"))
> 
> str(d)
'data.frame':	3 obs. of  3 variables:
 $ d1: chr  "a" "b" "c"
 $ d2: chr  "1" "2" "3"
 $ d3: chr  NA NA "6"
> 
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3 
FALSE  TRUE FALSE
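In both versions `apply()` ends up comparing character strings with `3` (the `3` is coerced to `"3"`), so the result follows string collation rather than numeric order. A small sketch of the comparisons involved, run in the `"C"` collation locale so the results do not depend on the system's sorting rules (some locales ignore spaces when sorting). The `"10"` case is a hypothetical extra value, not from the thread:

```r
# Switch collation to "C" (plain byte order) so the results are reproducible,
# remembering the current setting so it can be restored afterwards.
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))

r1 <- "6" <= 3                # FALSE: 3 becomes "3", and "6" sorts after "3"
r2 <- " 6" <= 3               # TRUE: a leading space sorts before any digit
r3 <- "10" <= 3               # TRUE: "1" sorts before "3", although 10 > 3
r4 <- as.numeric(" 6") <= 3   # FALSE: numeric comparison; as.numeric() accepts the leading space

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old))
print(c(r1, r2, r3, r4))
```

Converting with `as.numeric()` before comparing sidesteps both the padding problem and the multi-digit problem, since the comparison is then numeric.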




-----Original Message-----
From: Jiefei Wang <szwjf08 using gmail.com> 
Sent: Friday, October 8, 2021 2:22 PM
To: Derickson, Ryan, VHA NCOD <Ryan.Derickson using va.gov>
Cc: r-help using r-project.org
Subject: [EXTERNAL] Re: [R] unexpected behavior in apply

OK, it turns out that this behavior is documented, even though it looks surprising.

First of all, `apply` coerces any object with a non-null `dim` (such as a data frame) to a matrix (my intuition agrees with you that there should be no conversion), so the first step inside `apply` is effectively:

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
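One way to confirm the conversion is to compare the class of each column as `apply()` sees it with the class `sapply()` sees when it iterates over the data frame as a list of columns. A short sketch using the numeric `d` from the original question:

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# apply() runs as.matrix() on the data frame first, so every column it
# hands to FUN is character.
seen_by_apply <- apply(d, 2, class)

# sapply() (like lapply()) walks the data frame as a list of columns,
# so each column keeps its original type.
seen_by_sapply <- sapply(d, class)

print(seen_by_apply)   # "character" for d1, d2 and d3
print(seen_by_sapply)  # "character", "numeric", "numeric"
```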

Because `d` mixes character and non-character columns, the result has to be a character matrix, so each non-character column is converted to character using `format`. The problem is that the NA values are formatted too, and `format` pads every element to a common width, here the width of "NA":

> format(c(NA, 6))
[1] "NA" " 6"
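The per-column step can be reproduced by hand. This is a simplified sketch of the format-then-restore-NA idea described here, not the exact source of `as.matrix.data.frame`:

```r
xj <- c(NA, NA, 6)     # the numeric column d3
miss <- is.na(xj)      # remember which entries were missing
xj <- format(xj)       # "NA" "NA" " 6": padded to a common width of 2
is.na(xj) <- miss      # NA NA " 6": the NAs come back, the padding stays
print(xj)
```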

That's where the space comes from; it is there purely to make the printed result line up. `as.matrix.data.frame` sets the missing entries back to NA afterwards, but the padding on " 6" is not stripped. I would say this is not a good design, and it might be worth leaving the NA values out of the `format` call. For now, I suggest using `lapply`, which iterates over the columns directly without converting them to a matrix:

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE
$d2
[1] TRUE
$d3
[1] FALSE

Everything should work as you expect.
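If a named logical vector (the shape `apply()` returned) is more convenient than a list, `vapply()` performs the same column-wise iteration. A minimal sketch reusing the test function from the original question:

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# vapply() iterates over the columns as list elements, so d3 stays numeric.
# FUN.VALUE = logical(1) requires each call to return exactly one logical.
res <- vapply(d, function(x) all(x[!is.na(x)] <= 3), logical(1))
print(res)  # d1 = FALSE, d2 = TRUE, d3 = FALSE
```

`FUN.VALUE = logical(1)` makes `vapply()` stop with an error if the function ever returns something other than a single logical value, a check that `sapply()` does not perform.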

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwjf08 using gmail.com> wrote:
>
> Hi,
>
> I guess this shows what happens behind the scenes:
>
>
> > d<-data.frame(d1 = letters[1:3],
> +               d2 = c(1,2,3),
> +               d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note the extra leading space in the character value " 6"; that's why
> your comparison fails: " 6" <= 3 is a string comparison, and in your
> locale the space sorts before "3". I do not understand why the space
> is added, but this might be a bug in R.
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help 
> <r-help using r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior from apply() compared to a for loop when a character column is part of the data passed to apply(). Below, I check whether all non-missing values are <= 3. If I include a character column, apply incorrectly returns TRUE for d3. If I pass only the numeric columns to apply, d3 is correct. If I use a for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(NA,NA,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1 NA
> > 2  b  2 NA
> > 3  c  3  6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE  TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d2    d3
> >  TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > +   print(all(d[!is.na(d[,i]),i] <= 3)) }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> >
> >
> > Finally, if I remove the NA values from d3 and include the character column in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(4,5,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1  4
> > 2  b  2  5
> > 3  c  3  6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE FALSE
> >
> >
> > Can someone help me understand what's happening?
> >


