[R] Odd behavior of a function within apply

@vi@e@gross m@iii@g oii gm@ii@com
Tue Aug 9 18:55:22 CEST 2022


Yes, David, the function described seems to insist that its argument be of type integer or character; if the type is double or anything else, it fails because y is never initialized.

The goal seems to be to count how many "missing" values a vector contains: NA for a numeric type, or an empty string for character.

But some form of NA can appear in all kinds of object types, including character, as in this construct:

> x <- c("a", NA, "", "b", "NA)")
> x
[1] "a"   NA    ""    "b"   "NA)"

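If both NA and "" are considered empty, the above has two useless elements, or three if you also count the stray string "NA)", which only looks like a missing value. So, logically, the rule could be to count NA values and, IF the vector is of type character, also count "" entries.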
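One detail matters when counting "" in a character vector that also holds NA: comparing NA with "" gives NA, so a plain sum() returns NA. That is the same mechanism that turns the integer columns' counts into NA once apply() has coerced them to character. A minimal sketch using the vector x from above:

```r
x <- c("a", NA, "", "b", "NA)")

sum(x == "")                 # NA, because NA == "" is itself NA
sum(x == "", na.rm = TRUE)   # 1: drop the NA comparison before summing
sum(x %in% "")               # 1: %in% never returns NA

# true NA values plus empty strings
sum(is.na(x)) + sum(x == "", na.rm = TRUE)   # 2
```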

So rather than play games testing not just is.integer() and is.double() (or just is.numeric()) but also is.logical() and is.raw(), you can apply is.na() to all of these first to add up how many NA values they contain. If the vector is of type character, you can then add any blank strings.

So the algorithm would initialize y to sum(is.na(vec)) and then, if vec is character, add the number of empty strings.
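A minimal sketch of that algorithm; count_missing is a hypothetical name, and na.rm = TRUE is an added assumption so that NA entries (already counted by is.na()) do not turn the empty-string count into NA:

```r
# Count "missing" entries in a vector of any atomic type:
# NA everywhere, plus empty strings when the vector is character.
count_missing <- function(vec) {
  y <- sum(is.na(vec))                     # defined for integer, double, logical, character, ...
  if (is.character(vec)) {
    y <- y + sum(vec == "", na.rm = TRUE)  # NA already counted above, so skip it here
  }
  y
}

count_missing(c("a", NA, "", "b", "NA)"))  # 2
count_missing(c(48160L, NA, NA))           # 2
count_missing(1:5/3)                       # 0: double input no longer errors
count_missing(c(TRUE, NA))                 # 1
```

Because y is assigned unconditionally on the first line, there is no code path that leaves it undefined, whatever the column type.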

Alternatively, the function should decide what to do when any other type is encountered. You can internally convert many things to integer or character and then operate on them, or you can return zero or signal an error when given something else.
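The "signal an error" option could look like the sketch below. count1b is a hypothetical name; it keeps count1a's rules (NA for integer, "" for character, with na.rm = TRUE added as an assumption) and uses switch() so that every other type reaches an explicit default branch:

```r
count1b <- function(x) {
  switch(typeof(x),
    integer   = sum(is.na(x)),
    character = sum(x == "", na.rm = TRUE),
    # the unnamed last argument is the default for every other type
    stop("count1b: unsupported type '", typeof(x), "'", call. = FALSE)
  )
}

count1b(c(1L, NA))    # 1
count1b(c("a", ""))   # 1
try(count1b(1:5/3))   # a clear error naming the type, instead of "object 'y' not found"
```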

In this case, simply setting y to zero before it is used would make it defined and avoid the error, albeit at the cost of reporting nothing found for a double or logical vector even if it does contain NA.
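A sketch of that one-line fix applied to count1a as posted in this thread (only the y <- 0L line is new):

```r
count1a <- function(x) {
  y <- 0L                                   # default, so y exists for any input type
  if (typeof(x) == "integer")   y <- sum(is.na(x))
  if (typeof(x) == "character") y <- sum(x == "")
  y
}

count1a(1:5/3)        # 0 instead of "object 'y' not found"
count1a(c(1.5, NA))   # also 0, although the vector does contain an NA
```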


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of David Carlson via R-help
Sent: Tuesday, August 9, 2022 11:33 AM
To: Erin Hodgess <erinm.hodgess using gmail.com>
Cc: r-help using r-project.org
Subject: Re: [R] Odd behavior of a function within apply

Could you have columns that are not character or integer so that y is never defined in the function?

count1a(1:5/3)
Error in count1a(1:5/3) : object 'y' not found
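For a wide data frame such as the 236x390 test1.df, one way to answer this is to list each column's type and pick out those that are neither integer nor character. The data frame below is a hypothetical stand-in; its double and all-NA logical columns are assumptions about what such a table might contain, not something reported in this thread:

```r
df <- data.frame(
  zip   = c(48160L, NA),   # integer
  month = c("June", ""),   # character
  ratio = c(0.5, NA),      # double: count1a never assigns y here
  flag  = c(NA, NA),       # logical, e.g. a column that is entirely NA
  stringsAsFactors = FALSE
)

types <- vapply(df, typeof, character(1))
types

# columns that count1a cannot handle
names(types)[!types %in% c("integer", "character")]   # "ratio" "flag"
```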

David Carlson


On Mon, Aug 8, 2022 at 1:35 PM Erin Hodgess <erinm.hodgess using gmail.com> wrote:

> OK.  I'm back again.
>
> So my test1.df is 236x390
>
> If I put in the following:
>  lapply(test1.df,count1a)
> Error in FUN(X[[i]], ...) : object 'y' not found
> > lapply(test1.df,count1a)
> Error in FUN(X[[i]], ...) : object 'y' not found
> > sapply(test1.df,count1a)
> Error in FUN(X[[i]], ...) : object 'y' not found
> >
> What am I doing wrong, please?
> Thanks,
> Erin
>
>
> Erin Hodgess, PhD
> mailto: erinm.hodgess using gmail.com
>
>
> On Mon, Aug 8, 2022 at 1:41 PM Erin Hodgess <erinm.hodgess using gmail.com> wrote:
>
> > Awesome, thanks so much!!
> >
> > Erin Hodgess, PhD
> > mailto: erinm.hodgess using gmail.com
> >
> >
> > On Mon, Aug 8, 2022 at 1:38 PM John Fox <jfox using mcmaster.ca> wrote:
> >
> >> Dear Erin,
> >>
> >> The problem is that the data frame gets coerced to a character
> >> matrix, and the only column with "" entries is the 9th (the second
> >> one you supplied):
> >>
> >> as.matrix(test1.df)
> >>     X1_1_HZP1 X1_1_HBM1_mon X1_1_HBM1_yr
> >> 1  "48160"   "December"    "2014"
> >> 2  "48198"   "June"        "2018"
> >> 3  "80027"   "August"      "2016"
> >> 4  "48161"   ""            NA
> >> 5  NA        ""            NA
> >> 6  "48911"   "August"      "1985"
> >> 7  NA        "April"       "2019"
> >> 8  "48197"   "February"    "1993"
> >> 9  "48021"   ""            NA
> >> 10 "11355"   "December"    "1990"
> >>
> >> (Here, test1.df only contains the three columns you provided.)
> >>
> >> A solution is to use sapply:
> >>
> >>  > sapply(test1.df, count1a)
> >>      X1_1_HZP1 X1_1_HBM1_mon  X1_1_HBM1_yr
> >>              2             3             3
> >>
> >>
> >> I hope this helps,
> >>   John
> >>
> >>
> >> On 2022-08-08 1:22 p.m., Erin Hodgess wrote:
> >> > Hello!
> >> >
> >> > I have the following data.frame
> >> >   dput(test1.df[1:10,8:10])
> >> > structure(list(X1_1_HZP1 = c(48160L, 48198L, 80027L, 48161L, NA, 
> >> > 48911L, NA, 48197L, 48021L, 11355L), X1_1_HBM1_mon = 
> >> > c("December", "June", "August", "", "", "August", "April", 
> >> > "February", "", "December"), X1_1_HBM1_yr = c(2014L, 2018L, 
> >> > 2016L, NA, NA, 1985L, 2019L, 1993L, NA, 1990L)), row.names = 
> >> > c(NA, 10L), class = "data.frame")
> >> >
> >> > And the following function:
> >> >> dput(count1a)
> >> > function (x)
> >> > {
> >> >      if (typeof(x) == "integer")
> >> >          y <- sum(is.na(x))
> >> >      if (typeof(x) == "character")
> >> >          y <- sum(x == "")
> >> >      return(y)
> >> > }
> >> > When I use the apply function with count1a, I get the following:
> >> >   apply(test1.df[1:10,8:10],2,count1a)
> >> >      X1_1_HZP1 X1_1_HBM1_mon  X1_1_HBM1_yr
> >> >             NA             3            NA
> >> > However, when I do use columns 8 and 10, I get the correct response:
> >> >   apply(test1.df[1:10,c(8,10)],2,count1a)
> >> >     X1_1_HZP1 X1_1_HBM1_yr
> >> >             2            3
> >> >>
> >> > I am really baffled.  If I use count1a on a single column, it
> >> > works fine.
> >> >
> >> > Any suggestions much appreciated.
> >> > Thanks,
> >> > Sincerely,
> >> > Erin
> >> >
> >> >
> >> > Erin Hodgess, PhD
> >> > mailto: erinm.hodgess using gmail.com
> >> >
> >> >       [[alternative HTML version deleted]]
> >> >
> >> --
> >> John Fox, Professor Emeritus
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> web: https://socialsciences.mcmaster.ca/jfox/
> >>
> >>
>
>
>

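John Fox's explanation quoted above can be checked directly. A minimal sketch, rebuilding Erin's three sample columns from her dput() output and using count1a exactly as she posted it: sapply() hands count1a each column with its own type, while apply() first converts the data frame with as.matrix(), so every column arrives as character.

```r
test1.df <- data.frame(
  X1_1_HZP1     = c(48160L, 48198L, 80027L, 48161L, NA, 48911L, NA, 48197L, 48021L, 11355L),
  X1_1_HBM1_mon = c("December", "June", "August", "", "", "August", "April",
                    "February", "", "December"),
  X1_1_HBM1_yr  = c(2014L, 2018L, 2016L, NA, NA, 1985L, 2019L, 1993L, NA, 1990L),
  stringsAsFactors = FALSE
)

count1a <- function(x) {
  if (typeof(x) == "integer")   y <- sum(is.na(x))
  if (typeof(x) == "character") y <- sum(x == "")
  return(y)
}

sapply(test1.df, typeof)     # integer, character, integer
apply(test1.df, 2, typeof)   # "character" for all three columns

apply(test1.df, 2, count1a)  # NA, 3, NA: character branch, and NA == "" is NA
sapply(test1.df, count1a)    # 2, 3, 3
```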
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


