[R] how to extract strings in any column and in any row that start with

cpoiw@rt m@iii@g oii chemo@org@uk
Fri May 15 22:43:49 CEST 2020


This is almost certainly not the most efficient way:

tot <- data.frame(v1 = paste0(LETTERS[1:5], 1:10),
                  v2 = paste0(LETTERS[1:5], 101:110),
                  v3 = paste0(LETTERS[1:5], 111:120),
                  v4 = paste0(LETTERS[1:5], 121:130),
                  v5 = paste0(LETTERS[1:5], 131:140),
                  v6 = paste0(LETTERS[1:5], 101:110))

# set a variable to hold the result
myResult <- NULL

# iterate through each column, keeping the values that start with E10
for (v in seq_along(tot)) {
   thisResult <- as.character(tot[grepl('^E10', tot[, v]), v])
   myResult <- c(myResult, thisResult)
}

myResult <- unique( myResult )
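
The same column-by-column idea can probably be written without growing a vector inside a loop. What follows is only a sketch: it rebuilds a smaller stand-in for tot so that it runs on its own, and the names hits and startsE10 are made up for the example.

```r
# Small stand-in for the real data frame (same construction as above, fewer columns)
tot <- data.frame(v1 = paste0(LETTERS[1:5], 1:10),
                  v2 = paste0(LETTERS[1:5], 101:110),
                  v3 = paste0(LETTERS[1:5], 111:120))

# For each column, keep the values that start with "E10";
# as.character() covers the case where the columns are factors
hits <- lapply(tot, function(column) {
  grep("^E10", as.character(column), value = TRUE)
})

# Flatten the per-column results into one vector and drop duplicates
startsE10 <- unique(unlist(hits, use.names = FALSE))
print(startsE10)
```

Working one column at a time may also be kinder to memory on a very wide data frame than flattening everything at once, though that has not been measured here.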


===

Indeed as I wrote this Jeff has popped along with unlist!

Using my example above:

unique(as.character(unlist(tot))[grepl('^E10', as.character(unlist(tot)))])

does what you wanted (you may not need the as.character() calls if you are on 
R 4.0, where data.frame() no longer converts strings to factors by default, 
or if your data frame holds character columns rather than factors).
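
A slightly tidier variant of that one-liner, offered as a sketch rather than something timed on a data frame of the original size: flatten once, store the result, and let grep() return the matching values directly. The names allValues, e10 and e10Alt are invented for the example, and the tiny tot is only a stand-in.

```r
# Stand-in data; stringsAsFactors = TRUE mimics the pre-4.0 default
tot <- data.frame(v1 = paste0(LETTERS[1:5], 1:10),
                  v2 = paste0(LETTERS[1:5], 101:110),
                  stringsAsFactors = TRUE)

# Flatten the data frame once; as.character() turns factor values back into strings
allValues <- as.character(unlist(tot, use.names = FALSE))

# value = TRUE makes grep() return the matching strings rather than their positions
e10 <- unique(grep("^E10", allValues, value = TRUE))

# startsWith() is a base R alternative for a fixed prefix (no regular expression)
e10Alt <- unique(allValues[startsWith(allValues, "E10")])

print(e10)
```

Storing the flattened vector avoids calling unlist() twice, which should matter more as the data frame gets larger.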

On 2020-05-15 21:34, Jeff Newmiller wrote:
> If you want to treat your data frame as if it were a vector, then
> convert it to a vector before you give it to grep.
> 
> unlist(tot)
> 
> On May 15, 2020 12:24:17 PM PDT, Ana Marija 
> <sokovic.anamarija using gmail.com> wrote:
>> Hello,
>> 
>> this command was running for more than 2 hours
>> grep("E10",tot,value=T)
>> and no output
>> 
>> and this command
>> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>> 
>> gave me a subset (a data frame) of tot with the rows that match ^E10
>> 
>> what I need is just a vector of all values in tot which start with 
>> E10.
>> 
>> Thanks
>> Ana
>> 
>> On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller
>> <jdnewmil using dcn.davis.ca.us> wrote:
>>> 
>>> Read about regular expressions... they are extremely useful.
>>> 
>>> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>>> 
>>> It is bad form not to put spaces around the <- assignment.
>>> 
>>> 
>>> On May 15, 2020 10:00:04 AM PDT, Ana Marija
>> <sokovic.anamarija using gmail.com> wrote:
>>> >Hello,
>>> >
>>> >I have a data frame:
>>> >
>>> >> dim(tot)
>>> >[1] 502536   1093
>>> >
>>> >How would I extract from it all strings that start with E10?
>>> >
>>> >I know how to extract all rows that contain E10
>>> >df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
>>> >> dim(df0)
>>> >[1] 5105 1093
>>> >
>>> >but I just need a vector of strings that start with E10...
>>> >it would look something like this:
>>> >
>>> >[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
>>> >
>>> >Thanks
>>> >Ana
>>> >
>>> 
>>> --
>>> Sent from my phone. Please excuse my brevity.


