[R] correct way to subset a vector

Marc Schwartz marc_schwartz at me.com
Thu Jul 9 18:17:48 CEST 2009


On Jul 9, 2009, at 10:40 AM, Juliet Hannah wrote:

> Hi,
>
> #make example data
> dat <- data.frame(matrix(rnorm(15),ncol=5))
> colnames(dat) <- c("ab","cd","ef","gh","ij")
>
> If I want to get a subset of the data for the middle 3 columns, and I
> know the names of the start column and the end column, I can do this:
>
> mysub <- subset(dat,select=c(cd:gh))
>
> If I wanted to do this just on the column names, without subsetting
> the data, how could I do this?
>
> mynames <- colnames(dat);
>
> #mynames
> #[1] "ab" "cd" "ef" "gh" "ij"
>
> Is there an easy way to create the vector c("cd","ef","gh") as I did
> above using something similar to cd:gh?
>
> Thanks,
>
> Juliet



Using the same presumption that the desired values are consecutive in  
the vector:

# Use which() to get the indices for the start and end of the subset
 > mynames[which(mynames == "cd"):which(mynames == "gh")]
[1] "cd" "ef" "gh"
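A variant of the same one-liner, swapping in match() for which() (a substitution of mine, not what the post above uses). match() returns only the first position of each name, or NA if the name is absent. If a name happened to appear twice, which() would return two positions, and ':' would then silently use only the first one with a warning.

```r
# Example data from the question
mynames <- c("ab", "cd", "ef", "gh", "ij")

# match() gives the position of the first exact match (NA if not found)
from <- match("cd", mynames)
to   <- match("gh", mynames)

# Build the consecutive index range and subset the character vector
mynames[from:to]
```

This returns the same c("cd", "ef", "gh") as the which() version for this data.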


You can encapsulate that in a function:

subset.vector <- function(x, start, end)
{
   x[which(x == start):which(x == end)]
}

 > subset.vector(mynames, "cd", "gh")
[1] "cd" "ef" "gh"
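If one of the names is missing, the function above fails with an uninformative error from ':', because which() returns a zero-length index. A slightly more defensive sketch follows. subset_between() is a hypothetical name used only for illustration, and it assumes you want a clear error when a name is not found:

```r
# Hypothetical helper (not from the original post): same idea as
# subset.vector(), but stops with a readable message if a name is absent
subset_between <- function(x, start, end) {
  i <- match(start, x)  # first position of 'start', NA if absent
  j <- match(end, x)    # first position of 'end', NA if absent
  if (is.na(i)) stop("'", start, "' not found in x")
  if (is.na(j)) stop("'", end, "' not found in x")
  x[i:j]                # consecutive elements from start to end
}

mynames <- c("ab", "cd", "ef", "gh", "ij")
subset_between(mynames, "cd", "gh")
```

Like the original, it returns the elements in reverse order if 'end' comes before 'start' (e.g. "gh", "ef", "cd"), since i:j then counts downward.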



Note that you can also do this:

 > names(subset(dat, select = cd:gh))
[1] "cd" "ef" "gh"

but that subsets the data frame first, which can add considerable  
overhead and memory use if the data frame is large. It also presumes  
that the desired values are column names of an existing data frame.


To use the same sequence-based approach as subset.data.frame(), you  
can replicate what that function does internally:

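subset.vector <- function(x, select)
{
   # Map each element of x to its position: list(ab = 1L, cd = 2L, ...)
   # (seq_along() also behaves correctly for a zero-length x)
   nl <- as.list(seq_along(x))
   names(nl) <- x
   # Evaluate the unevaluated 'select' expression (e.g. cd:gh) with those
   # names bound to their positions, giving an index vector (e.g. 2:4)
   vars <- eval(substitute(select), nl)
   x[vars]
}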


 > subset.vector(mynames, select = cd:gh)
[1] "cd" "ef" "gh"
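To see what happens inside that function, here are the same steps run at top level. quote() stands in for substitute(select), which inside the function captures the unevaluated argument expression:

```r
mynames <- c("ab", "cd", "ef", "gh", "ij")

# Step 1: a named list mapping each name to its position
nl <- as.list(seq_along(mynames))
names(nl) <- mynames

# Step 2: evaluate the expression with the list as its environment,
# so the symbols cd and gh are looked up as 2L and 4L
idx <- eval(quote(cd:gh), nl)
idx            # 2 3 4

# Step 3: index the original vector with those positions
mynames[idx]

# Other index expressions work too, e.g. excluding a range
mynames[eval(quote(-(cd:gh)), nl)]   # "ab" "ij"
```

Because 'select' becomes an ordinary index vector, anything that is valid inside x[...], such as c(ab, ef) or negative ranges, works the same way.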



BTW, well done on recognizing that you can use the sequence of column  
names for the 'select' argument. A lot of folks, even experienced  
useRs, don't realize that you can do that...  :-)

HTH,

Marc Schwartz



