[R] how to extract strings in any column and in any row that start with

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Sat May 16 00:12:22 CEST 2020


Hello,

I tried several options and, on large data frames, this one was the
fastest of those I tested.


s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))


Then call unlist(s1) to flatten the result into a single vector.
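
To make this concrete, here is a minimal sketch on a toy data frame. The contents of tot below are made up for illustration (the real tot is 502536 x 1093), and use.names = FALSE and unique() are additions that were not part of the timings reported here.

```r
# Hypothetical toy data standing in for the real `tot`
tot <- data.frame(
  a = c("E102", "X1", "E109"),
  b = c("AE10", "E101", NA),
  stringsAsFactors = FALSE
)

# One grep() call per column; value = TRUE returns the matching strings
# rather than their positions. "^" anchors the match at the start.
s1 <- sapply(tot, function(x) grep("^E10", x, value = TRUE))

# Flatten the per-column results into one character vector,
# dropping the column-derived names
res <- unlist(s1, use.names = FALSE)

# Optional: keep each matching string once
res_unique <- unique(res)
```

Note that sapply() returns a list here only because the columns yield different numbers of matches. If every column returned the same number, it would simplify to a vector or matrix; lapply() always returns a list, so it may be the safer choice before unlist().
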
A close second (15% slower) was


s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]
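
Why this returns a vector and not a data frame may not be obvious, so here is a small sketch, again on made-up data. sapply() with grepl() builds a logical matrix with the same shape as tot, and indexing a data frame with a logical matrix extracts the matching cells as a vector, in column order.

```r
# Hypothetical toy data standing in for the real `tot`
tot <- data.frame(
  a = c("E102", "X1", "E109"),
  b = c("AE10", "E101", NA),
  stringsAsFactors = FALSE
)

# grepl() gives one TRUE/FALSE per cell (FALSE for NA);
# equal-length columns let sapply() simplify to a logical matrix
m <- sapply(tot, function(x) grepl("^E10", x))

# Indexing a data frame by a logical matrix returns the selected
# cells as a vector, column by column
s2 <- tot[m]
```
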


grep/unlist was 3.7 times slower:


grep("^E10", unlist(tot), value = TRUE)
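
If you want to repeat the comparison on your own machine, something along these lines should work. It is only a sketch: the synthetic data, its size, and the seed are arbitrary assumptions, and the timings will differ from the ratios quoted above depending on the data and hardware.

```r
# Hypothetical synthetic data; sizes and codes are arbitrary
set.seed(1)
codes <- c(paste0("E10", 1:9), "A01", "B22", "AE10")
tot <- as.data.frame(
  replicate(50, sample(codes, 2000, replace = TRUE)),
  stringsAsFactors = FALSE
)

# 1. grep() per column, then flatten
t1 <- system.time(
  r1 <- unlist(sapply(tot, function(x) grep("^E10", x, value = TRUE)),
               use.names = FALSE)
)

# 2. logical matrix from grepl(), used to index the data frame
t2 <- system.time(
  r2 <- tot[sapply(tot, function(x) grepl("^E10", x))]
)

# 3. flatten first, then a single grep()
t3 <- system.time(
  r3 <- grep("^E10", unlist(tot), value = TRUE)
)

# Elapsed seconds for each approach
timings <- c(sapply_grep  = t1[["elapsed"]],
             grepl_index  = t2[["elapsed"]],
             unlist_grep  = t3[["elapsed"]])
print(timings)

# All three should agree once the names kept by approach 3 are dropped
same <- identical(r1, r2) && identical(r1, unname(r3))
```

Part of the cost of the third approach may come from unlist() building a name for every element; unlist(tot, use.names = FALSE) might be worth trying, though I have not timed it.
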


Hope this helps,

Rui Barradas

At 20:24 on 15/05/20, Ana Marija wrote:
> Hello,
> 
> this command was running for more than 2 hours
> grep("E10",tot,value=T)
> and no output
> 
> and this command
> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
> 
> gave me a subset (a data frame) of tot where ^E10
> 
> what I need is just a vector of all values in tot which start with E10.
> 
> Thanks
> Ana
> 
> On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller
> <jdnewmil using dcn.davis.ca.us> wrote:
>>
>> Read about regular expressions... they are extremely useful.
>>
>> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>>
>> It is bad form not to put spaces around the <- assignment.
>>
>>
>> On May 15, 2020 10:00:04 AM PDT, Ana Marija <sokovic.anamarija using gmail.com> wrote:
>>> Hello,
>>>
>>> I have a data frame:
>>>
>>>> dim(tot)
>>> [1] 502536   1093
>>>
>>> How would I extract from it all strings that start with E10?
>>>
>>> I know how to extract all rows that contain E10
>>> df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
>>>> dim(df0)
>>> [1] 5105 1093
>>>
>>> but I just need a vector of strings that start with E10...
>>> it would look something like this:
>>>
>>> [1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
>>>
>>> Thanks
>>> Ana
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Sent from my phone. Please excuse my brevity.