[R] Error in if (fraction <= 1) { : missing value where TRUE/FALSE needed

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Thu Jan 27 23:18:57 CET 2022


Timothy,
In reply to what you wrote about a benchmark suggesting some storage formats may make the code run slower, it is not a surprise, given what you chose to benchmark.

You are using a logical variable in a numeric context when you have code like:
    log(a2+0.01)
In order to do the calculation, depending on the internals, each element of a2 has to be converted to an integer or perhaps a floating point value, such as 1L or 1.0, before 0.01 is added to it.

You are doing the equivalent of:
    log(as.integer(a2)+0.01)

or perhaps:
    log(as.double(a2)+0.01)

The result is some extra work in THAT context. Note I am NOT saying R calls one of those primitive functions, just that the final code does such conversions perhaps at the assembler level or lower.
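
A minimal sketch of that conversion follows. The short vectors here are made-up stand-ins for the benchmark data, not the original values. It only shows that the logical vector ends up as a double before the arithmetic and that the result matches the explicit conversion; it says nothing about how R does this internally.

```r
# Illustrative stand-ins for the benchmark vectors (shorter, made up here)
a1 <- c(1, 1, 0, 1, 0)
a2 <- as.logical(a1)

typeof(a2)           # "logical"
typeof(a2 + 0.01)    # "double": a2 was coerced before the addition

# The implicit coercion gives the same answer as the explicit one
same_result <- identical(log(a2 + 0.01), log(as.double(a2) + 0.01))
same_result
```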
But consider the opposite context, such as in an if(...) statement as in:
    if(a2) {do_this} else {do_that}

If a2 is already a logical data form, it happens rapidly. If a2 is something more complex that can be evaluated in steps into a logical, it takes those steps. As I showed earlier, if a2 was 1 or 666 it would be evaluated to be non-zero and thus converted to TRUE and then the statement would choose do_this, else it would evaluate to FALSE and do do_that.
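
The behaviour just described can be checked with a small sketch. The helper name which_branch is hypothetical and exists only for this illustration; it wraps if() in tryCatch() so that the NA case, which is the error in the subject line of this thread, is reported instead of stopping the script.

```r
# Hypothetical helper for illustration: report which branch if() takes
which_branch <- function(x) {
  tryCatch(
    if (x) "do_this" else "do_that",
    error = function(e) paste("error:", conditionMessage(e))
  )
}

which_branch(TRUE)   # "do_this"
which_branch(666)    # "do_this": a non-zero number is treated as TRUE
which_branch(0)      # "do_that": zero is treated as FALSE
which_branch(NA)     # an error message, since NA is neither TRUE nor FALSE
```

Each call passes a single value, because if() expects a condition of length one.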
So the right storage format depends on how you want to use the data and how much storage space you are willing to use. Depending on the machine and architecture, a logical value may be stored in anything from a bit to a byte to multiple bytes, and at a lower level it may be expanded as needed to fit into a fixed-size register on the CPU. In some cases, the natural storage format will be the one that can be used immediately with no boxing or unboxing. But as always, there are tradeoffs to consider between how many cycles are used (execution time) and other resources such as memory in use. In a few cases, it may oddly pay to make two or more copies of a vector in different storage formats and then use the best one for each subsequent calculation. A logical version might turn out to be a great choice for indexing into a list or vector or matrix or data.frame, an integer version may be great for arithmetic such as multiplication with a structure that contains only integers, and a double version may suit calculations with floating point numbers; there could even be versions that are character or complex.
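
As a rough sketch of keeping several copies in different storage formats, the code below builds logical, integer and double versions of the same made-up flags (all names are illustrative), compares their memory use with object.size(), and uses the logical copy for indexing. In current R builds a logical vector takes the same space per element as an integer vector, so the memory saving is relative to double; exact sizes depend on the platform.

```r
n <- 1e4
flags_lgl <- rep(c(TRUE, FALSE), length.out = n)  # logical copy
flags_int <- as.integer(flags_lgl)                # integer copy
flags_dbl <- as.double(flags_lgl)                 # double copy

# Memory used by each storage format, in bytes
sizes <- c(
  logical = as.numeric(object.size(flags_lgl)),
  integer = as.numeric(object.size(flags_int)),
  double  = as.numeric(object.size(flags_dbl))
)
sizes

# The logical copy can be used directly as an index
values <- seq_len(n)
picked <- values[flags_lgl]

# The integer copy needs a comparison first to index the same way
picked_from_int <- values[flags_int == 1L]
identical(picked, picked_from_int)
```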
But from a novice perspective, performance is not usually a big concern, especially not for small amounts of data. The only reason this is being discussed is that the question about what went wrong might be hard to figure out without lots more info, while the simple work-around might either work fine or tell us more about what might be wrong.

-----Original Message-----
From: Ebert,Timothy Aaron <tebert using ufl.edu>
To: Bert Gunter <bgunter.4567 using gmail.com>
Cc: R-help <r-help using r-project.org>
Sent: Thu, Jan 27, 2022 2:27 pm
Subject: Re: [R] Error in if (fraction <= 1) { : missing value where TRUE/FALSE needed

You did not claim it is faster, but if one is writing large programs or has huge quantities of data then thinking about execution time could be useful.

if(!require(microbenchmark)){install.packages("microbenchmark")}
library(microbenchmark)
a1<-c(1,1,0,1,0,0,0,1,1,0,0,0,0,1,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,1)
a2=as.logical(a1)

microbenchmark(
  {
    log(a1+0.01)
  },
  {
    log(a2+0.01)
  },
  times=100000
)

On my system, running the code shows that there is an overhead cost if the logical has to be converted. In this simple code the cost was sometimes significant but always trivial, about 0.1 microsecond. I tried a few other bits of code, and the mean and minimum values were always smaller when performing numeric operations on a numeric variable. However, it looks like the range in values for a numeric operation on a numeric variable is greater. I don't understand why.

Tim

-----Original Message-----
From: Bert Gunter <bgunter.4567 using gmail.com> 
Sent: Thursday, January 27, 2022 1:17 PM
To: Ebert,Timothy Aaron <tebert using ufl.edu>
Cc: PIKAL Petr <petr.pikal using precheza.cz>; R-help <r-help using r-project.org>
Subject: Re: [R] Error in if (fraction <= 1) { : missing value where TRUE/FALSE needed

I did not claim it is faster -- and in fact I doubt that it makes any real difference. Just simpler, imo. I also think that a logical vector would serve equally well in most situations where arithmetic 0/1 coding is used -- even in arithmetic ops and comparisons:
> TRUE + 2
[1] 3
> TRUE > .5
[1] TRUE
(?'+' has details)
I would appreciate someone responding with a nontrivial counterexample to this claim if they have one, other than the sort of thing shown in the ?logical example involving conversion to character:

## logical interpretation of particular strings
charvec <- c("FALSE", "F", "False", "false",    "fAlse", "0",
            "TRUE",  "T", "True",  "true",    "tRue",  "1")
as.logical(charvec)

## factors are converted via their levels, so string conversion is used
as.logical(factor(charvec))
as.logical(factor(c(0,1)))  # "0" and "1" give NA

(I mean of course purely internal R code, not export of data to an external application).


Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Jan 27, 2022 at 8:12 AM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
> One could use the microbenchmark package to compare which approach is faster, assuming the dataset is large enough that the outcome will make a measurable difference.
>

