[R] dataframe: string operations on columns

Ivan Calandra ivan.calandra at uni-hamburg.de
Wed Jan 19 09:57:24 CET 2011


Well, my solution with the loop might be slower (even though I don't see 
any difference on my system, at least with up to 100 rows and 3 
strings to separate), but it works whatever the number of strings (as long 
as every entry splits into as many strings as the first one).
But I should have renamed the columns outside of the loop:
names(df)[2:3] <- paste("a", 1:2, sep="")  ## or a more general solution for the indexes
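A minimal sketch of the loop with the renaming moved out of it. It assumes, as the original code does, that every entry splits into as many pieces as the first one. Computing the indexes as ncol(df) + seq_len(n_parts) (one possible "more general solution"), the helper names n_parts and new_cols, and vapply() in place of unlist(lapply()) are illustrative choices, not code from the thread:

```r
# Sample data from the original question
df <- data.frame(a = c("A B", "C D", "A C", "A D", "B D"),
                 stringsAsFactors = FALSE)
df_split <- strsplit(df$a, split = " ")

# Assumption: every entry splits into as many pieces as the first one;
# shorter entries would give NA, longer ones would be truncated
n_parts <- length(df_split[[1]])
# General indexes for the new columns: right after the existing ones
new_cols <- ncol(df) + seq_len(n_parts)

for (i in seq_len(n_parts)) {
  # i-th piece of every row (NA if that row has fewer pieces)
  df[new_cols[i]] <- vapply(df_split, function(x) x[i], character(1))
}

# Rename all new columns once, outside the loop
names(df)[new_cols] <- paste("a", seq_len(n_parts), sep = "")
```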

Ivan


On 1/19/2011 01:42, Niels Richard Hansen wrote:
>> On 2011-01-18 08:14, Ivan Calandra wrote:
>>> Hi,
>>>
>>> I guess it's not the nicest way to do it, but it should work for you:
>>>
>>> #create some sample data
>>> df <- data.frame(a=c("A B", "C D", "A C", "A D", "B D"),
>>>                  stringsAsFactors=FALSE)
>>> #split the column by space
>>> df_split <- strsplit(df$a, split=" ")
>>>
>>> #place the first element into column a1 and the second into a2
>>> for (i in 1:length(df_split[[1]])){
>>>    df[i+1] <- unlist(lapply(df_split, FUN=function(x) x[i]))
>>>    names(df)[i+1] <- paste("a",i,sep="")
>>> }
>>>
>>> I hope people will give you more compact solutions.
>>> HTH,
>>> Ivan
>>>
>> You can replace the loop with
>>
>>  df <- transform(df, a1 = sapply(df_split, "[[", 1),
>>                      a2 = sapply(df_split, "[[", 2))
>
> df <- cbind(df, do.call(rbind, df_split))
>
> seems to do the same (up to column names) but faster. However,
> all the solutions rely on there being exactly two strings when
> you split. The different solutions behave differently if this
> assumption is violated and none of them really checks this. You
> can, for instance, check this with all(sapply(df_split, length) == 2)
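A sketch combining the two suggestions above: check the two-pieces assumption first, then bind the matrix from do.call(rbind, ...) with explicit column names. The names a1/a2, the use of data.frame() instead of cbind() (so stringsAsFactors can be set explicitly on older R versions), and the hypothetical ragged input are illustrative assumptions, not from the thread:

```r
df <- data.frame(a = c("A B", "C D", "A C", "A D", "B D"),
                 stringsAsFactors = FALSE)
df_split <- strsplit(df$a, split = " ")

# Stop early if any entry does not split into exactly two pieces
stopifnot(all(sapply(df_split, length) == 2))

# One row per entry, one column per piece, with explicit column names
parts <- do.call(rbind, df_split)
colnames(parts) <- paste("a", seq_len(ncol(parts)), sep = "")
df <- data.frame(df, parts, stringsAsFactors = FALSE)

# Hypothetical ragged input: how the approaches differ when the check fails
ragged <- strsplit(c("A B", "C D E", "F"), split = " ")
all(sapply(ragged, length) == 2)            # FALSE
sapply(ragged, "[", 2)                      # "B" "D" NA: short entry padded with NA
res <- try(sapply(ragged, "[[", 2), silent = TRUE)
inherits(res, "try-error")                  # TRUE: "[[" fails on "F" (subscript out of bounds)
suppressWarnings(do.call(rbind, ragged))    # 3 columns; shorter entries are recycled
```

Of these, only "[[" fails loudly; "[" and do.call(rbind, ...) return results that can look plausible, which is why the explicit length check is worth keeping.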
>
> Best, Niels R. Hansen
>
>>
>> Peter Ehlers
>>
>>>
>>>
>>> On 1/18/2011 16:30, boris pezzatti wrote:
>>>>
>>>> Dear all,
>>>> how can I perform a string operation like strsplit(x, " ") on a column
>>>> of a data frame, and put the first or the second item of the split into
>>>> a new data frame column?
>>>> (so that each row gets the pieces of its own string)
>>>>
>>>> Thanks
>>>> Boris
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
