[R] if else statement for rain data to define zero for dry and one to wet

Sat Jun 6 22:55:44 CEST 2015

I'm sorry, but I have to take issue with this particular use case of
ifelse(). When the goal is to generate a 0/1 indicator from a logical
condition, ifelse() is very inefficient. It's better to apply the
logical condition directly to the object in question and multiply the
result by 1 (or 1L) to make it numeric (or integer) rather than logical.
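
Applied to the kind of data in the original post, that could look like
the sketch below. The data frame dt here is a hypothetical excerpt (two
columns, five rows) retyped from the post, and side_by_side is a name
chosen only for illustration.

```r
# Hypothetical excerpt of the rain data from the original post
dt <- data.frame(X1950 = c(0, 0, 25.3, 12.7, 0),
                 X1951 = c(0, 0, 6.7, 3.4, 0))

# Comparing a data frame with a number yields a logical matrix;
# multiplying by 1 turns TRUE/FALSE into 1/0
state <- 1 * (dt > 0.1)

# Rain data and wet/dry state side by side for one year
side_by_side <- data.frame(X1950 = dt$X1950, state = state[, "X1950"])
side_by_side
```

No loop over columns is needed, since the comparison is applied to
every cell at once.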

To illustrate the difference in speed, consider the following toy
example. The function f1 replicates the suggestion to apply ifelse()
columnwise (with the additional overhead of preallocating storage for
the result), whereas the function f2 applies the logical condition to
the matrix itself using vectorization, with the recognition that a
matrix is an atomic vector with a dim attribute.

set.seed(5290)

# 1000 x 1000 matrix
m <- matrix(sample(c(0, 0.05, 0.2), 1e6, replace = TRUE), ncol = 1000)

f1 <- function(mat)
  {
     newmat <- matrix(NA, ncol = ncol(mat), nrow = nrow(mat))
     for(i in seq_len(ncol(mat)))
         newmat[, i] <- ifelse(mat[, i] > 0.1, 1, 0)
     newmat
  }

f2 <- function(mat) 1 * (mat > 0.1)

On my system, I got

> system.time(m1 <- f1(m))
   user  system elapsed
   0.14    0.00    0.14

> system.time(m2 <- f2(m))
   user  system elapsed
   0.01    0.00    0.01

> identical(m1, m2)
[1] TRUE

The all too common practice of using ifelse(condition, 1, 0) on an
atomic vector is easily replaced by 1 * (condition), where the result
of condition is a logical atomic object that the multiplication
coerces to numeric.
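
A minimal sketch of that replacement on a plain vector follows. The
values in rain are made up for illustration, with one NA included to
show that missing values propagate the same way in both forms.

```r
# Made-up rainfall values (mm), including a missing observation
rain <- c(0, 0.05, 0.2, 14.3, NA)

wet_ifelse <- ifelse(rain > 0.1, 1, 0)   # the common idiom
wet_direct <- 1 * (rain > 0.1)           # logical coerced to numeric

# Both are double vectors with NA kept as NA
identical(wet_ifelse, wet_direct)
```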

To reduce memory use, it would be better to define f2 as

f2 <- function(mat) 1L * (mat > 0.1)

but doing so in this example no longer creates identical objects, since
the revised f2 returns an integer matrix whereas

> typeof(m1)
[1] "double"

Thus, f1 is not only inefficient in terms of execution time, it's also
inefficient in terms of storage.
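
The storage difference can be checked with a sketch along these lines.
It rebuilds the same toy matrix; m_dbl and m_int are names introduced
here for illustration only. In R a double takes 8 bytes per element
and an integer 4, so the integer result should need roughly half the
memory.

```r
set.seed(5290)
m <- matrix(sample(c(0, 0.05, 0.2), 1e6, replace = TRUE), ncol = 1000)

m_dbl <- 1  * (m > 0.1)   # double result
m_int <- 1L * (m > 0.1)   # integer result

typeof(m_dbl)
typeof(m_int)

# Compare memory footprints of the two results
object.size(m_dbl)
object.size(m_int)

# Same values, different storage type
all(m_dbl == m_int)
```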

Given several recent warnings in this forum about the inefficiency of
ifelse() and the dozens of times I've seen the idiom implemented in f1
as a solution over the last several years (to which I have likely
contributed in my distant past as an R-helper), I felt compelled to
say something about this practice, which BTW extends not just to 0/1
return values but to 0/x return values, where x is a nonzero real
number.
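
For the 0/x case, a sketch under the assumption that x is a finite
number could look like this (v and x are made-up values):

```r
v <- c(0, 0.05, 0.2, 0.2, 0)
x <- 2.5   # any finite nonzero value

via_ifelse <- ifelse(v > 0.1, x, 0)
via_direct <- x * (v > 0.1)   # TRUE -> x, FALSE -> 0

identical(via_ifelse, via_direct)
```

The finiteness assumption matters: if x were Inf or NA, x * FALSE
would give NaN or NA rather than 0, so the shortcut would no longer
match ifelse().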

Dennis

On Sat, Jun 6, 2015 at 12:50 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi rosalinazairimah,
> I think the problem is that you are using "if" instead of "ifelse". Try this:
>
> wet_dry<-function(x,thresh=0.1) {
>  for(column in 1:dim(x)[2]) x[,column]<-ifelse(x[,column]>=thresh,1,0)
>  return(x)
> }
> wet_dry(dt)
>
> and see what you get.
>
> Also, why can I read your message perfectly while everybody else can't?
>
> Jim
>
>>> -----Original Message-----
>>> From: roslinaump at gmail.com
>>> Sent: Fri, 5 Jun 2015 16:49:08 +0800
>>> To: r-help at r-project.org
>>> Subject: [R] if else statement for rain data to define zero for dry and
>>> one to wet
>>>
>>> Dear r-users,
>>>
>>> I have a set of rain data:
>>>
>>>   X1950 X1951 X1952 X1953 X1954 X1955 X1956 X1957 X1958 X1959 X1960 X1961
>>>   X1962
>>> 1   0.0   0.0  14.3   0.0  13.5  13.2   4.0     0   3.3     0     0   0.0
>>> 2   0.0   0.0  21.9   0.0  10.9   6.6   2.1     0   0.0     0     0   0.0
>>> 3  25.3   6.7  18.6   0.8   2.3   0.0   8.0     0   0.0     0     0  11.0
>>> 4  12.7   3.4  37.2   0.9   8.4   0.0   5.8     0   0.0     0     0   5.5
>>> 5   0.0   0.0  58.3   3.6  21.1   4.2   3.0     0   0.0     0     0  15.9
>>>
>>>
>>> I would like to go through each column and set each cell to 1 if its
>>> value is greater than 0.1 mm and to zero otherwise. I would then like to
>>> attach the rain data and the category side by side:
>>>
>>>
>>>    1950 state
>>> 1   0.0     0
>>> 2   0.0     0
>>> 3  25.3     1
>>> 4  12.7     1
>>> 5   0.0     0
>>> ...
>>>
>>>
>>> This is my code:
>>>
>>>
>>> wet_dry  <- function(dt)
>>> { cl   <- length(dt)
>>>   tresh  <- 0.1
>>>
>>>   for (i in 1:cl)
>>>   {  xi <- dt[,i]
>>>      if (xi < tresh ) 0 else 1
>>>   }
>>> dd <- cbind(dt,xi)
>>> dd
>>> }
>>>
>>> wet_dry(dt)
>>>
>>>
>>> Results:
>>>
>>>> wet_dry(dt)
>>>
>>>   X1950 X1951 X1952 X1953 X1954 X1955 X1956 X1957 X1958 X1959 X1960 X1961
>>> 1   0.0   0.0  14.3   0.0  13.5  13.2   4.0   0.0   3.3   0.0   0.0   0.0
>>> 2   0.0   0.0  21.9   0.0  10.9   6.6   2.1   0.0   0.0   0.0   0.0   0.0
>>> 3  25.3   6.7  18.6   0.8   2.3   0.0   8.0   0.0   0.0   0.0   0.0  11.0
>>> 4  12.7   3.4  37.2   0.9   8.4   0.0   5.8   0.0   0.0   0.0   0.0   5.5
>>>
>>>   X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971 X1972 X1973
>>> 1   4.2   0.0   2.2   0.0   4.4   5.1     0   7.2   0.0   0.0   0.0   5.1
>>> 2   8.4   0.0   4.0   0.0   4.9   0.7     0   0.0   0.0   0.0   0.0   5.4
>>> 3   4.2   0.0   2.0   0.0  14.2  17.1     0   0.0   0.0   0.0   0.0   2.1
>>> 4   0.0   0.0   5.4   0.0   6.4  14.9     0  10.1   2.9 143.4   0.0   6.1
>>>
>>>   X1974 X1975 X1976 X1977
>>> 1     0   0.0     0   0.3
>>> 2     0   3.3     0   0.3
>>> 3     0   1.7     0   4.4
>>> 4     0   0.0     0  33.5
>>>
>>>
>>> It does not work and gives me the original data.  Why is that?
>>>
>>>
>>> Thank you so much for your help.
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.