[R] applying strsplit to a whole column

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Wed Aug 4 21:40:13 CEST 2010


Thanks a lot, David.
It works perfectly. Of course, lapply is also a loop!

So, your method is:
z<-data.frame(nam1=c("bbb..aba","ccc..abb","ddd..abc","eee..abd"),stringsAsFactors=FALSE)
z$nam2<-unlist(lapply( strsplit(z[[1]],split="\\.."), "[", 1))
z$nam3<-unlist(lapply( strsplit(z[[1]],split="\\.."), "[", 2))
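A note on the split pattern: split is a regular expression, so "\\.." means a literal period followed by any single character, not two literal periods. It happens to work on these data. The small sketch below uses a made-up vector s to show where it differs from a literal split with fixed = TRUE:

```r
# Made-up input: the second element contains ".x" but no ".."
s <- c("bbb..aba", "ccc.xabb")

# Regular expression "\\..": a literal "." followed by any one character
regex_parts <- strsplit(s, split = "\\..")

# fixed = TRUE: split only on the literal two-character string ".."
fixed_parts <- strsplit(s, split = "..", fixed = TRUE)

regex_parts[[2]]   # "ccc" "abb"   (".x" was treated as a separator)
fixed_parts[[2]]   # "ccc.xabb"    (no ".." present, so no split)
```

Writing the pattern as "\\.\\." gives the same literal behaviour while keeping it a regular expression.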

And using the new package "stringr" (thank you for sharing!):
y<-data.frame(nam1=c("aaa..aba","bbb..abb","ccc..abc","ddd..abd"),
stringsAsFactors=FALSE)
library(stringr)
parts<-str_split_fixed(y$nam1, "\\..", 2)  # character matrix, one column per piece
y$nam2<-parts[, 1]
y$nam3<-parts[, 2]
(y)
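str_split_fixed() returns a character matrix with one column per piece, so its columns can be indexed directly. If adding a package is undesirable, a rough base-R analogue is to stack the strsplit() pieces row-wise. This sketch assumes every string contains exactly one ".." and therefore splits into exactly two pieces:

```r
y <- data.frame(nam1 = c("aaa..aba", "bbb..abb", "ccc..abc", "ddd..abd"),
                stringsAsFactors = FALSE)

# strsplit() gives a list of length-2 character vectors;
# do.call(rbind, ...) stacks them into a 4 x 2 character matrix
parts <- do.call(rbind, strsplit(y$nam1, split = "..", fixed = TRUE))

y$nam2 <- parts[, 1]
y$nam3 <- parts[, 2]
y
```

If some strings split into a different number of pieces, rbind() recycles the shorter vectors (sometimes with only a warning). In that situation str_split_fixed() pads the missing pieces with empty strings instead.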

One question - what exactly does the square bracket in your lapply
code mean? Looks like a shortcut - I've not seen it before.
lapply( strsplit(z[[1]],split="\\.."), "[", 1)
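In R the bracket operator "[" is itself an ordinary function. That means its name can be passed to lapply() like any other function, and lapply() hands the extra argument (here 1) on to it. A small illustrative sketch:

```r
pieces <- strsplit(c("bbb..aba", "ccc..abb"), split = "..", fixed = TRUE)

v <- pieces[[1]]           # c("bbb", "aba")
`[`(v, 2)                  # the same call as v[2], returns "aba"

# lapply() calls `[`(element, 1) on each list element
first_a <- lapply(pieces, "[", 1)

# The same thing written with an anonymous function
first_b <- lapply(pieces, function(el) el[1])

identical(first_a, first_b)  # TRUE
```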

Thank you!
Dimitri

On Wed, Aug 4, 2010 at 3:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Aug 4, 2010, at 3:03 PM, Dimitri Liakhovitski wrote:
>
>> I am sorry, someone said that strsplit automatically works on a
>> column. How exactly does it work?
>> For example, if I want to grab just the first (or the second) part of
>> the string in nam1 that should be split based on ".."
>> x<-data.frame(nam1=c("bbb..aba","ccc..abb","ddd..abc","eee..abd"),
>> stringsAsFactors=FALSE)
>> str(x)
>> strsplit(x[[1]],split="\\..")
>> str(strsplit(x[[1]],split="\\.."))
>>
>> I am getting a list, so it looks like I have to use a loop...?
>>
>> lapply( strsplit(x[[1]],split="\\.."), "[", 1)
> [[1]]
> [1] "bbb"
>
> [[2]]
> [1] "ccc"
>
> [[3]]
> [1] "ddd"
>
> [[4]]
> [1] "eee"
>
>> lapply( strsplit(x[[1]],split="\\.."), "[", 2)
> [[1]]
> [1] "aba"
>
> [[2]]
> [1] "abb"
>
> [[3]]
> [1] "abc"
>
> [[4]]
> [1] "abd"
>
>> unlist(lapply( strsplit(x[[1]],split="\\.."), "[", 2) )
> [1] "aba" "abb" "abc" "abd"
>> unlist(lapply( strsplit(x[[1]],split="\\.."), "[", 1) )
> [1] "bbb" "ccc" "ddd" "eee"
>>
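lapply() still iterates over the list. Another option, which is a sketch and not what was proposed in this thread, avoids building the list at all: sub() is vectorized over its input, so each half can be cut out with a single regular expression. This assumes each string contains ".." exactly once:

```r
x <- data.frame(nam1 = c("bbb..aba", "ccc..abb", "ddd..abc", "eee..abd"),
                stringsAsFactors = FALSE)

# Delete the first ".." and everything after it: the part before ".."
before <- sub("\\.\\..*$", "", x$nam1)

# Delete everything up to and including the first "..": the part after it
# (perl = TRUE for the non-greedy ".*?")
after <- sub("^.*?\\.\\.", "", x$nam1, perl = TRUE)

before   # "bbb" "ccc" "ddd" "eee"
after    # "aba" "abb" "abc" "abd"
```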
>
>
>> Thank you!
>> Dimitri
>>
>>
>> On Wed, Aug 4, 2010 at 2:39 PM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>>
>>> Thank you very much, everyone!
>>> Dimitri
>>>
>>> On Wed, Aug 4, 2010 at 2:10 PM, David Winsemius <dwinsemius at comcast.net>
>>> wrote:
>>>>
>>>> On Aug 4, 2010, at 1:42 PM, Dimitri Liakhovitski wrote:
>>>>
>>>>> I am sorry, I'd like to split my column ("names") so that the
>>>>> beginning of each string ("X..") is removed and only the rest of the
>>>>> text is left.
>>>>
>>>> I could not tell whether it was the string "X.." or the pattern "X.."
>>>> that was your goal for matching and removal.
>>>>>
>>>>> x<-data.frame(names=c("X..aba","X..abb","X..abc","X..abd"))
>>>>> x$names<-as.character(x$names)
>>>>
>>>> a) Instead of "names", which is a heavily used function name, use
>>>> something more specific. Otherwise you get:
>>>>>
>>>>> names(x)
>>>>
>>>> "names"  # and thereby avoid list comments about canines.
>>>>
>>>> b) Instead of letting data.frame() turn the character vector into a
>>>> factor and then coercing it back to character, use
>>>> stringsAsFactors = FALSE.
>>>>
>>>>> x<-data.frame(nam1=c("X..aba","X..abb","X..abc","X..abd"),
>>>>> stringsAsFactors=FALSE)
>>>>
>>>> # This is the pattern version:
>>>>
>>>>> x$nam1 <- gsub("X..",'', x$nam1)
>>>>> x
>>>>
>>>>  nam1
>>>> 1   aba
>>>> 2   abb
>>>> 3   abc
>>>> 4   abd
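In the pattern version each "." in "X.." matches any character, and gsub() looks for the pattern anywhere in the string. The sketch below uses a made-up input vector to compare it with two stricter alternatives. One anchors a literal match at the start with "^", and the other matches the literal text with fixed = TRUE:

```r
s <- c("X..aba", "X.y.abb", "abX12")

wild     <- gsub("X..", "", s)               # wildcard dots, any position
anchored <- sub("^X\\.\\.", "", s)           # literal "X.." at the start only
literal  <- sub("X..", "", s, fixed = TRUE)  # literal "X.." anywhere

wild       # "aba" ".abb" "ab"
anchored   # "aba" "X.y.abb" "abX12"
literal    # "aba" "X.y.abb" "abX12"
```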
>>>>
>>>> This is the string version:
>>>>>
>>>>> x<-data.frame(nam1=c("X......aba","X.y.abb","X..abc","X..abd"),
>>>>> stringsAsFactors=FALSE)
>>>>>  x$nam1 <- gsub("X\\.+",'', x$nam1)
>>>>> x
>>>>
>>>>  nam1
>>>> 1   aba
>>>> 2 y.abb
>>>> 3   abc
>>>> 4   abd
>>>>
>>>>
>>>>> (x)
>>>>> str(x)
>>>>>
>>>>> Can't figure out how to apply strsplit in this situation - without
>>>>> using a loop. I hope it's possible to do it without a loop - is it?
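For completeness, the strsplit() route asked about here could look like the sketch below. It assumes every value starts with the literal prefix "X.." and contains no other "..". Split on the literal separator, then keep the second piece of each element:

```r
x <- data.frame(nam1 = c("X..aba", "X..abb", "X..abc", "X..abd"),
                stringsAsFactors = FALSE)

# Each element splits into c("X", "aba"), c("X", "abb"), ...
pieces <- strsplit(x$nam1, split = "..", fixed = TRUE)

# Keep the second piece; vapply() insists on exactly one character value each
x$nam1 <- vapply(pieces, "[", character(1), 2)
x
```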
>>>>
>>>> --
>>>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>> Ninah Consulting
>>> www.ninah.com
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>
> David Winsemius, MD
> West Hartford, CT
>
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


