[R] the difference between "-" and "!" between base and data.table package

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sun Apr 16 09:51:28 CEST 2017

! is a logical operator... it means "not". When you write

lidx <- seq_along( mtcars[[ 1 ]] ) %in% train_indices

you end up with a vector of logical values for which ! makes sense. Since R supports logical indexing this can be a very convenient way to select one group or the other. 
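A minimal sketch of that logical-indexing idea, assuming train_indices is sampled the same way as in the quoted message below. The seed is an arbitrary choice of mine, there only to make the run reproducible.

```r
# Assumption: train_indices is drawn as in the original question;
# set.seed(42) is arbitrary and only fixes the random sample.
set.seed(42)
train_indices <- sample(nrow(mtcars), round(0.75 * nrow(mtcars)))

# TRUE for rows that belong to the training set, FALSE otherwise
lidx <- seq_along(mtcars[[1]]) %in% train_indices

train <- mtcars[lidx, ]    # rows where lidx is TRUE
test  <- mtcars[!lidx, ]   # rows where lidx is FALSE

nrow(train)  # 24
nrow(test)   # 8
```

Note that train here comes back in the original row order of mtcars, not in the sampled order of train_indices.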

If you give an integer to the ! operator, it is first coerced to logical: zero becomes FALSE and any non-zero value becomes TRUE. That can be useful sometimes, but not in this case: all of the train_indices are greater than zero, so !train_indices is FALSE everywhere and selects no rows. Look at what !train_indices actually is.
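To see this for yourself, something like the following should do. Again, the seed is my own arbitrary choice and train_indices is assumed to be built as in the original question.

```r
# Assumption: same sampling as in the original question; seed is arbitrary.
set.seed(42)
train_indices <- sample(nrow(mtcars), round(0.75 * nrow(mtcars)))

neg <- !train_indices    # integers are coerced to logical, then negated

class(neg)               # "logical"
any(neg)                 # FALSE: every index is non-zero, so every element is FALSE
length(neg)              # 24

# An all-FALSE logical index selects nothing
nrow(mtcars[neg, ])      # 0
```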

As the Introduction to R document says, integer indexing always starts at 1 instead of zero as in many other languages. This makes it feasible to let negative integer indexes mean "exclude these positions". Thus the following is TRUE:

identical( mtcars[ !lidx, ], mtcars[ -train_indices, ] )
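A small self-contained sketch of negative indexing, first on a plain vector and then on mtcars. The vector x and the seed are made up for illustration; train_indices is assumed to be sampled as in the original question.

```r
# Hypothetical toy vector, just to show the mechanics
x <- c(10, 20, 30, 40, 50)
x[c(2, 4)]     # 20 40      (keep positions 2 and 4)
x[-c(2, 4)]    # 10 30 50   (drop positions 2 and 4)

# Same idea on the data frame; seed is arbitrary
set.seed(42)
train_indices <- sample(nrow(mtcars), round(0.75 * nrow(mtcars)))
lidx <- seq_along(mtcars[[1]]) %in% train_indices

# Negated logical index and negative integer index pick the same rows
identical(mtcars[!lidx, ], mtcars[-train_indices, ])   # TRUE
```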

The Introduction to R document is really quite informative to re-read occasionally. For example, look up indexing with a matrix as the index.
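As a taste of what matrix indexing looks like, here is a short sketch. The matrix m and the index idx are invented for illustration only.

```r
# Hypothetical 3 x 4 matrix, filled column by column
m <- matrix(1:12, nrow = 3)

# A two-column index matrix: each row is one (row, column) pair
idx <- cbind(c(1, 3), c(2, 4))

m[idx]   # 4 12, i.e. m[1, 2] and m[3, 4]
```

The result is a plain vector with one element per row of idx, rather than a sub-matrix.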

On April 15, 2017 5:18:43 PM PDT, Carl Sutton via R-help <r-help at r-project.org> wrote:
>I normally use package data.table but today was doing some base R
>coding.  Had a problem for a bit which I finally resolved.  I was
>attempting to separate a data frame between train and test sets, and in
>base R was using the "!" to exclude training set indices from the data
>frame.  All I was getting was zero observations.  Changed to using "-"
>and it worked.  I recalled that in data.table the "!" function worked,
>so created this little bit of code.
>#  Base R Functions
>train_indices <- sample(nrow(mtcars), round(0.75*nrow(mtcars)))
>train <- mtcars[train_indices,]
>mode(train_indices); class(train_indices)
>test <- mtcars[!train_indices,]  #  the "!" function returning 0
>test_1 <- mtcars[-train_indices,]
>identical(test, test_1)
>#  Using data.table package
>library(data.table)
>dt1 <- data.table(mtcars)
>train_indices <- sample(nrow(dt1), round(0.75*nrow(dt1)))
>train <- dt1[train_indices,]
>mode(train_indices); class(train_indices)
>test <- dt1[!train_indices,]  #  the "!" function
>test_1 <- dt1[-train_indices,]
>identical(test, test_1)
>The documentation appears to me to accept "!" in base, so do I have
>some kind of ridiculous error or ..??
>Carl Sutton