[R] split a character variable into several character variables by a character

Adrian Dusa dusa.adrian at gmail.com
Fri Apr 10 17:05:05 CEST 2009


Good observation, Bill!
Adrian

On Friday 10 April 2009, William Dunlap wrote:
> strsplit() is the way to do it, but if your putative
> character strings come from a data.frame you need to make
> sure they are really character strings and not factors
> (at least in R 2.8.1).
>
>    > d<-data.frame(name=c("Bill Dunlap", "First Last"), num=1:2)
>    > d
>             name num
>    1 Bill Dunlap   1
>    2  First Last   2
>
>    > sapply(d,class)
>         name       num
>    "factor" "integer"
>
>    > strsplit(d$name, " ")
>    Error in strsplit(d$name, " ") : non-character argument
>
>    > strsplit(as.character(d$name), " ")
>    [[1]]
>    [1] "Bill"   "Dunlap"
>
>    [[2]]
>    [1] "First" "Last"
>
>    > d1<-data.frame(stringsAsFactors=FALSE,name=c("Bill Dunlap", "First Last"), num=1:2)
>    > sapply(d1,class)
>           name         num
>    "character"   "integer"
>
>    > strsplit(d1$name, " ")
>    [[1]]
>    [1] "Bill"   "Dunlap"
>
>    [[2]]
>    [1] "First" "Last"
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
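A small helper can do the factor check once for any column. This is a minimal sketch: split_column() is a hypothetical name, not a base R function, and fixed = TRUE is used so the separator is matched literally rather than as a regular expression. strsplit() still rejects factors in current R, but since R 4.0.0 data.frame() and read.table() default to stringsAsFactors = FALSE, so character columns are the usual case there; the example asks for factors explicitly to reproduce Bill's situation.

```r
# Hypothetical helper (not part of base R): split a vector on a literal
# separator, converting a factor to character first because strsplit()
# signals "non-character argument" for factors.
split_column <- function(x, split) {
  if (is.factor(x)) x <- as.character(x)
  strsplit(x, split, fixed = TRUE)
}

# Bill's example, with factors requested explicitly (the R 2.8.1 default).
d <- data.frame(name = c("Bill Dunlap", "First Last"), num = 1:2,
                stringsAsFactors = TRUE)
parts <- split_column(d$name, " ")
parts[[1]]  # "Bill" "Dunlap"
parts[[2]]  # "First" "Last"
```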
> -------------------------------------------------------------------------
> [R] split a character variable into several character variables by a character
>
> Adrian Dusa dusa.adrian at gmail.com
> Fri Apr 10 15:48:53 CEST 2009
>
> Dear Mao Jianfeng,
>
> "r-help-owner" is not the place to ask for help; the list address is
> r-help at r-project.org
> (CC-ed here).
>
> In any case, strsplit() does the job, for example:
> > unlist(strsplit("BCPy01-01", "-"))
> [1] "BCPy01" "01"
>
> You can work with the whole variable, like:
> splitpop <- strsplit(df1$popcode, "-")
>
> then access the first part with
>
> > unlist(lapply(splitpop, "[", 1))
>  [1] "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01"
>  [9] "BCPy01" "BCPy01"
>
> and the second part with
>
> > unlist(lapply(splitpop, "[", 2))
>  [1] "01" "01" "01" "02" "02" "02" "02" "02" "02" "03"
>
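If you want a type-checked version of the same extraction, vapply() can replace unlist(lapply(...)). This is a sketch on three sample codes in Mao's format: vapply() always returns a character vector, even for an empty list (where unlist(lapply(...)) gives NULL), and a code with no "-" simply yields NA as its second part.

```r
# Three codes in the same "XXXXXX-NN" format as popcode (sample values).
splitpop <- strsplit(c("BCPy01-01", "BCPy01-02", "BCPy01-03"), "-", fixed = TRUE)

# vapply() calls `[`(piece, i) on each element and requires a single
# character string back, so the result type is guaranteed.
first  <- vapply(splitpop, `[`, character(1), 1)
second <- vapply(splitpop, `[`, character(1), 2)
first   # "BCPy01" "BCPy01" "BCPy01"
second  # "01" "02" "03"
```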
> hth,
> Adrian
>
> On Friday 10 April 2009, Mao Jianfeng wrote:
> > Dear R-listers,
> >
> > I have a data frame like the following, and I want to split a character
> > variable ("popcode" or "codetot") into several new variables. For
> > example, split "BCPy01-01" (popcode[1]) into "BCPy01" and "01". How can
> > I do that? I have tried the strsplit() and substring() functions, but I
> > still cannot perform the splitting.
>
> It always helps to see exactly what you tried
> and a description of how the results differ from
> what you wanted to get.
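Since the codes in df1 all appear to share one fixed layout, substring() also works. A minimal sketch, assuming every popcode is six characters, a hyphen, then two digits; codes of varying width need strsplit() instead.

```r
popcode <- c("BCPy01-01", "BCPy01-02", "BCPy01-03")

# Positions are counted from 1: characters 1-6 are the population code,
# character 7 is the hyphen, and character 8 onward is the number.
pop_part <- substring(popcode, 1, 6)
num_part <- substring(popcode, 8)
pop_part  # "BCPy01" "BCPy01" "BCPy01"
num_part  # "01" "02" "03"
```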
>
> > Any advice? Thanks in advance.
> >
> > df1:
> > popcode     codetot   p3need
> > BCPy01-01 BCPy01-01-1 100.0000
> > BCPy01-01 BCPy01-01-2 100.0000
> > BCPy01-01 BCPy01-01-3 100.0000
> > BCPy01-02 BCPy01-02-1  92.5926
> > BCPy01-02 BCPy01-02-1 100.0000
> > BCPy01-02 BCPy01-02-2  92.5926
> > BCPy01-02 BCPy01-02-2 100.0000
> > BCPy01-02 BCPy01-02-3  92.5926
> > BCPy01-02 BCPy01-02-3 100.0000
> > BCPy01-03 BCPy01-03-1 100.0000
> >
> > Regards,
> >
> > Mao Jian-feng
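To turn the pieces into new columns of df1, the list returned by strsplit() can be bound row-wise into a character matrix. This sketch types in four of Mao's rows by hand and assumes every codetot has exactly three "-"-separated parts (otherwise rbind() recycles the shorter pieces); the column names pop, subpop and replicate are made up for illustration. The first two parts of codetot are the two halves of popcode, so one split covers both variables.

```r
# Four rows of Mao's df1, typed in by hand; stringsAsFactors = FALSE keeps
# the codes as character strings so strsplit() accepts them.
df1 <- data.frame(
  popcode = c("BCPy01-01", "BCPy01-01", "BCPy01-02", "BCPy01-03"),
  codetot = c("BCPy01-01-1", "BCPy01-01-2", "BCPy01-02-1", "BCPy01-03-1"),
  p3need  = c(100, 100, 92.5926, 100),
  stringsAsFactors = FALSE
)

# One character vector per row -> a 4 x 3 character matrix.
pieces <- do.call(rbind, strsplit(df1$codetot, "-", fixed = TRUE))

# Hypothetical names for the new variables.
df1$pop       <- pieces[, 1]
df1$subpop    <- pieces[, 2]
df1$replicate <- pieces[, 3]
df1
```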


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
     +40 21 3120210 / int.101
Fax: +40 21 3158391



