[R] a function more appropriate than 'sapply'?

Sat Jan 26 21:20:43 CET 2013

Hi,

In my tests, grep() turned out to be a bit faster than the gsub() approach:

system.time({
  indx     <- grep(".*_.*_", wells[, 1])  # rows with two or more underscores
  wells2   <- wells[-indx, ]              # all remaining rows
  wellsNew <- wells[indx, ]
})
#  user  system elapsed
# 0.024   0.000   0.023

system.time({
  w.sub <- gsub("[^_]+", "", wells[, 1])
  u.2   <- which(w.sub == "__")
  u.1   <- which(w.sub == "_")
  w.u1  <- wells[u.1, ]
  w.u2  <- wells[u.2, ]
})
#  user  system elapsed
# 0.048   0.000   0.047
identical(wells2, w.u1)
# [1] TRUE
identical(wellsNew, w.u2)
# [1] TRUE
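Two caveats about the grep() version, illustrated on a small invented data frame rather than wells.txt. First, the pattern ".*_.*_" matches names with two or more underscores, so wells[-indx, ] holds every name with zero or one; the identical() checks pass only because, evidently, every name in wells.txt has exactly one or two underscores. Second, negative indexing with an empty grep() result selects no rows at all. Counting the underscores and indexing with logicals sidesteps both (a sketch, not a drop-in replacement):

```r
# Invented stand-in for wells.txt (not the real data)
wells <- data.frame(name = c("AB_1", "CD_2_x", "EF_3", "GH", "IJ_4_y_z"),
                    plc_hldr = 0, stringsAsFactors = FALSE)

# ".*_.*_" means "at least two underscores": row 5 (three) is caught too
two_plus <- grep(".*_.*_", wells$name)
two_plus
# [1] 2 5

# Negative indexing trap: grep() finding nothing gives integer(0),
# and wells[-integer(0), ] returns zero rows, not all of them
no_hit <- grep("zzz", wells$name)
nrow(wells[-no_hit, ])
# [1] 0
nrow(wells[!grepl("zzz", wells$name), ])  # logical negation keeps all rows
# [1] 5

# Exact counts: delete every non-underscore character, count what is left
n_us <- nchar(gsub("[^_]", "", wells$name))
n_us
# [1] 1 2 1 0 3

# One data frame per underscore count, named "0", "1", "2", "3"
by_count <- split(wells, n_us)
by_count[["1"]]$name
# [1] "AB_1" "EF_3"
```

With split(), the single- and double-underscore subsets are simply by_count[["1"]] and by_count[["2"]].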

A.K.

----- Original Message -----
From: Berend Hasselman <bhh at xs4all.nl>
To: emorway <emorway at usgs.gov>
Cc: r-help at r-project.org
Sent: Saturday, January 26, 2013 2:46 PM
Subject: Re: [R] a function more appropriate than 'sapply'?

On 26-01-2013, at 19:43, emorway <emorway at usgs.gov> wrote:

> I'm wondering if I need to use a function other than sapply, as the following
> line of code runs indefinitely (more than 30 min so far) and uses up all 16 GB
> of memory on my machine for what seems like a very small dataset (data attached
> as a text file, wells.txt:
> <http://r.789695.n4.nabble.com/file/n4656723/wells.txt>). The R code is:
> 
> wells <- read.table("c:/temp/wells.txt", col.names = c("name", "plc_hldr"))
> wells2 <- wells[sapply(wells[, 1],
>                        function(x) length(strsplit(as.character(x), "_")[[1]]) == 2), ]
> 
> The 2nd line of R code above gets bogged down and takes all my RAM with it:
> <http://r.789695.n4.nabble.com/file/n4656723/memory_loss.png> 
> 
> I'm simply trying to extract all of the lines of data that have a single "_"
> in the first column and place them into a dataset called "wells2".  If that
> were to work, I then want to extract the lines of data that have two "_" and
> put them into a separate dataset, say "wells3".  Is there a better way to do
> this than the one-liner above?
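For comparison, the test inside the one-liner can be vectorized without sapply(): strsplit() already accepts a whole character vector, and lengths() (available since R 3.2.0) returns the number of pieces per element. A minimal sketch on invented names; note that strsplit() drops a trailing empty piece, so a name ending in "_" is miscounted, which the underscore-counting approach in the reply does not suffer from.

```r
# Invented sample standing in for wells[, 1]
wnames <- c("AB_1", "CD_2_x", "EF_3", "KL_")

# One strsplit() call over the whole vector instead of one call per element;
# as.character() guards against a factor column
pieces <- lengths(strsplit(as.character(wnames), "_", fixed = TRUE))
pieces
# [1] 2 3 2 1

# Original intent: exactly one underscore <=> two pieces ("KL_" is missed)
wnames[pieces == 2]
# [1] "AB_1" "EF_3"
```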

Read your file with

    # stringsAsFactors = FALSE keeps the names as character rather than factor
    wells <- read.table("wells.txt", col.names = c("name", "plc_hldr"),
                        stringsAsFactors = FALSE)

Remove every character that is not an underscore with

    w.sub <- gsub("[^_]+", "", wells[, 1])
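To see what this step produces, here is the same gsub() call on a few invented names: every run of non-underscore characters is deleted, so each element collapses to a string made only of its underscores.

```r
# Invented sample names
wnames <- c("AB_1", "CD_2_x", "GH", "IJ_4_y_z")

# Each element keeps only its underscores
w.sub <- gsub("[^_]+", "", wnames)
# w.sub is c("_", "__", "", "___")
```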

then find the elements of w.sub that consist of exactly two underscores and of exactly one underscore with

    u.2 <- which(w.sub == "__")
    u.1 <- which(w.sub == "_")

and use u.1 and u.2 to select the appropriate rows of wells.
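Put together, the steps above might look like the following sketch. It reads an invented inline sample through read.table(text = ...) in place of wells.txt and names the results wells2 and wells3 as in the question. Because u.1 and u.2 are positive indices, a count with no matches simply yields an empty data frame.

```r
# Invented sample standing in for wells.txt: a name and a placeholder column
txt <- "AB_1 0
CD_2_x 0
EF_3 0
GH_4_y 0"

wells <- read.table(text = txt, col.names = c("name", "plc_hldr"),
                    stringsAsFactors = FALSE)

# Keep only the underscores of each name
w.sub <- gsub("[^_]+", "", wells[, 1])

# Row indices with exactly one / exactly two underscores
u.1 <- which(w.sub == "_")
u.2 <- which(w.sub == "__")

wells2 <- wells[u.1, ]  # single underscore
wells3 <- wells[u.2, ]  # two underscores
wells2$name
# [1] "AB_1" "EF_3"
wells3$name
# [1] "CD_2_x" "GH_4_y"
```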

I tried to select rows containing one or two underscores with grep() regular expressions, but that turned out to be more difficult than I had expected.
The method above is quick.
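For what it is worth, exact counts can be matched with a regular expression, provided the pattern is anchored with ^ and $ and the stretches between underscores are limited to [^_]* (any run of non-underscore characters). A minimal sketch on invented names, using grepl() so the result can be used for logical indexing:

```r
# Invented sample names: 1, 2, 0, 3 and 1 underscores
wnames <- c("AB_1", "CD_2_x", "GH", "IJ_4_y_z", "KL_")

# ^...$ forces the whole name to match; [^_]* allows anything except "_"
one_us <- grepl("^[^_]*_[^_]*$", wnames)
two_us <- grepl("^[^_]*_[^_]*_[^_]*$", wnames)

wnames[one_us]
# [1] "AB_1" "KL_"
wnames[two_us]
# [1] "CD_2_x"
```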

Berend

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.