[R] Removing values containing a specific character

arun smartpink111 at yahoo.com
Sun Jan 27 20:16:40 CET 2013


Hi, 
I tried it with a bigger dataset.

set.seed(25)
names <- sample(c("bob", "joe", "craig@gmail.com", "emily", "jane@yahoo.com"),5e6,replace=TRUE)
set.seed(1651)
emails <- sample(c("bobj@cup.com", "joesmith@gmail.com", "craig@gmail.com",
 "emily2@yahoo.com", "jane@yahoo.com"),5e6,replace=TRUE)

 df <- data.frame(names, emails) 
 dim(df)
#[1] 5000000       2
 df[]<-lapply(df,as.character)
 system.time(df[,1][grep("@",df$names)]<- "" )
#   user  system elapsed 
#  1.732   0.108   1.844 
 system.time(dfNew1<-df[grep("\\w+",df$names),])
#   user  system elapsed 
#  0.896   0.024   0.923 
 system.time(dfNew2<- df[df$names!="",])
#   user  system elapsed 
#  0.460   0.028   0.490 
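
A variation on the timings above, offered only as a sketch: since "@" is a literal character rather than a pattern, grepl() with fixed = TRUE can build one logical index that serves both results, and it is often quicker than a regex match. The names n, has_at, dfBlank, dfKeep and no_match are made up for this illustration, n is kept far below 5e6 so the example runs quickly, and the sample addresses are assumed to contain a literal "@".

```r
# Illustrative size only; the timings above used 5e6 rows.
n <- 1e5

set.seed(25)
names <- sample(c("bob", "joe", "craig@gmail.com", "emily", "jane@yahoo.com"),
                n, replace = TRUE)
set.seed(1651)
emails <- sample(c("bobj@cup.com", "joesmith@gmail.com", "craig@gmail.com",
                   "emily2@yahoo.com", "jane@yahoo.com"),
                 n, replace = TRUE)
df <- data.frame(names, emails, stringsAsFactors = FALSE)

# TRUE where the name contains a literal "@"; fixed = TRUE treats the
# pattern as a plain string instead of a regular expression.
has_at <- grepl("@", df$names, fixed = TRUE)

# First result: blank out the offending names but keep every row.
dfBlank <- df
dfBlank$names[has_at] <- ""

# Second result: drop the offending rows entirely.
dfKeep <- df[!has_at, ]

# Edge case: when nothing matches, negative integer indexing returns zero
# rows, while logical indexing keeps all of them.
no_match <- data.frame(names = c("bob", "joe"), stringsAsFactors = FALSE)
n_neg <- nrow(no_match[-grep("@", no_match$names), , drop = FALSE])
n_lgl <- nrow(no_match[!grepl("@", no_match$names, fixed = TRUE), , drop = FALSE])
```

Here n_neg is 0 and n_lgl is 2, which is why the logical form may be the safer choice when a column might contain no "@" at all.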
A.K.







________________________________
From: Yasha Podeswa <ypodeswa at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: R help <r-help at r-project.org>; Uwe Ligges <ligges at statistik.tu-dortmund.de> 
Sent: Sunday, January 27, 2013 2:05 PM
Subject: Re: [R] Removing values containing a specific character


You two were 100% right, it was just a memory issue.  This was part of a bigger project where I had a number of data frames loaded, all with 1-5 million rows. I cleaned up my code to have fewer data frames loaded at once, and everything is working great.  Thanks for the help!
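
For readers hitting the same memory limit, one way to keep fewer data frames loaded at once might look like the sketch below. The objects big1, big2 and result are hypothetical stand-ins, not the data from this thread, and the sizes are deliberately small.

```r
# Hypothetical stand-ins for large intermediate data frames.
big1 <- data.frame(x = seq_len(1e5))
big2 <- data.frame(x = seq_len(1e5))

# Check how much memory an object occupies before deciding what to drop.
size_big1 <- format(object.size(big1), units = "Mb")

# Keep only what the next step needs ...
result <- big1[big1$x %% 2 == 0, , drop = FALSE]

# ... then remove the intermediates and let R reclaim the memory.
rm(big1, big2)
invisible(gc())
```

rm() only removes the names; the memory is actually released at the next garbage collection, which gc() triggers explicitly here.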
On Jan 27, 2013 9:46 AM, "arun" <smartpink111 at yahoo.com> wrote:

>Hi Yasha,
>
> I guess you got Uwe's response.
>
> I created `df2` with the intention of getting the two results from the original dataset.
>For example, after you get the first result
>df[,1][grep("@",df$names)]<- ""
>#you can get the second result by:
>df[df$names!="",]
>#  names             emails
>#1   bob       bobj@cup.com
>#2   joe joesmith@gmail.com
>#4 emily   emily2@yahoo.com
>
>#or
>df[grep("\\w+",df$names),]
>#  names             emails
>#1   bob       bobj@cup.com
>#2   joe joesmith@gmail.com
>#4 emily   emily2@yahoo.com
>
>But I am not sure how this will work over 5.5 million rows.
>A.K.
>
>
>
>
>----- Original Message -----
>From: ypodeswa <ypodeswa at gmail.com>
>To: r-help at r-project.org
>Cc:
>Sent: Sunday, January 27, 2013 1:11 AM
>Subject: Re: [R] Removing values containing a specific character
>
>Actually, it worked perfectly for my sample data, but my actual data has
>5.5 million rows, and grep doesn't seem to work with over a million rows.
>Any idea on a workaround?
>
>
>On Sat, Jan 26, 2013 at 9:37 PM, Yasha Podeswa <ypodeswa at gmail.com> wrote:
>
>> Awesome, thanks Arun, that's exactly what I was looking for!
>>
>>
>> On Sat, Jan 26, 2013 at 9:21 PM, arun kirshna [via R] <
>> ml-node+s789695n4656749h63 at n4.nabble.com> wrote:
>>
>>> Hi,
>>> Try this:
>>> df[]<-lapply(df,as.character)
>>> df2<-df
>>> df[,1][grep("@",df$names)]<- ""
>>> df
>>> #  names             emails
>>> #1   bob       bobj@cup.com
>>> #2   joe joesmith@gmail.com
>>> #3          craig@gmail.com
>>> #4 emily   emily2@yahoo.com
>>> #5           jane@yahoo.com
>>>
>>> #2nd part:
>>>
>>>  df2[-grep("@",df2$names),]
>>> #  names             emails
>>> #1   bob       bobj@cup.com
>>> #2   joe joesmith@gmail.com
>>> #4 emily   emily2@yahoo.com
>>> A.K.
>>>
>>>
>>
>>
>
>
>
>
>
>
>


