[R] Selecting rows and columns of a data frame using relational operators

Erich Subscriptions erich.subs at neuwirth.priv.at
Mon Feb 27 13:30:07 CET 2017


The answer is simple.

data[,4] == 1 produces a logical vector of length nrow(data),
and the subsetting mechanism for data frames in R expects a logical row index
with as many elements as the data frame has rows.

data[1:20,4] == 1
produces a logical vector of length 20, which is not the number of rows of data.
So R applies its standard procedure (recycling): it repeats this vector as often
as needed to get a vector of length nrow(data).


The following code illustrates what is happening:

data <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100),a=rep(c(1,2,1,2),c(2,48,2,48)))

vec1 <- data[,4]==1       # logical vector of length 100, one element per row
vec2 <- data[1:20,4]==1   # logical vector of length 20 only
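
Continuing from vec1 and vec2, here is a minimal sketch of the recycling and of one possible answer to Question 2. The names recycled, res_short, res_recycled, first20 and res_wanted, and the call to set.seed(), are introduced here for illustration only and are not part of the original code.

```r
# Assumption: a fixed seed, only so that the example is reproducible.
set.seed(1)
data <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100),
                   a = rep(c(1, 2, 1, 2), c(2, 48, 2, 48)))

vec1 <- data[, 4] == 1      # logical vector, one element per row
vec2 <- data[1:20, 4] == 1  # logical vector with only 20 elements

length(vec1)  # 100
length(vec2)  # 20

# Indexing with the short vector behaves as if it had been recycled
# to nrow(data) elements.
recycled <- rep(vec2, length.out = nrow(data))
res_short <- data[vec2, c(1, 2, 4)]
res_recycled <- data[recycled, c(1, 2, 4)]
identical(rownames(res_short), rownames(res_recycled))  # TRUE

# Rows beyond the first 20 are selected too, including rows where a is 2.
rownames(res_short)  # "1" "2" "21" "22" "41" "42" "61" "62" "81" "82"

# One possible answer to Question 2: restrict to the first 20 rows first,
# then apply the condition to that smaller data frame.
first20 <- data[1:20, ]
res_wanted <- first20[first20[, 4] == 1, c(1, 2, 4)]
rownames(res_wanted)  # "1" "2"
```

Subsetting the rows first keeps the logical index and the data frame it is applied to at the same length, so no recycling takes place.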


> On 27 Feb 2017, at 13:07, Tunga Kantarcı <tungakantarci at gmail.com> wrote:
> 
> Consider a data frame named data. data contains 4 columns and 1000
> rows. Say the aim is to bring together columns 1, 2, and 4, if the
> value in column 4 is equal to 1. We could use the syntax
> 
> data[data[,4] == 1, c(1, 2, 4)]
> 
> for this purpose. Suppose now that the aim is to bring together
> columns 1, 2, and 4, if the value in column 4 is equal to 1, but only
> for the first 20 rows. We could use the syntax
> 
> data[data[1:20,4] == 1, c(1, 2, 4)]
> 
> for this purpose. However, this does not produce the desired result.
> This is surprising at least for someone coming from MATLAB because
> MATLAB produces what is desired.
> 
> Question 1: The code makes sense but why does it not produce what we
> expect it to produce?
> 
> Question 2: What code is instead suitable?