[R] recode according to specific sequence of characters within a string variable

Denis Kazakiewicz d.kazakiewicz at gmail.com
Fri Feb 4 14:26:17 CET 2011


Dear R people,

Could you please help? I have a similar but opposite question:
how can I reshape the data from DF.new back to DF, using the example
Marc kindly provided?

Thank you
Denis

On Fri, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:
> On Feb 4, 2011, at 6:32 AM, D. Alain wrote:
> 
> > Dear R-List, 
> > 
> > I have a dataframe with one column "name.of.report" containing character values, e.g.
> > 
> > 
> >> df$name.of.report
> > 
> > "jeff_2001_teamx"
> > "teamy_jeff_2002"
> > "robert_2002_teamz"
> > "mary_2002_teamz"
> > "2003_mary_teamy"
> > ...
> > (i.e. the bit of interest is not always at the same position)
> > 
> > Now I want to recode the column "name.of.report" into the variables "person", "year","team", like this
> > 
> >> new.df
> > 
> > "person"  "year"  "team"
> > jeff      2001    x
> > jeff      2002    y
> > robert    2002    z
> > mary      2002    z
> > 
> > I tried with grep()
> > 
> > df$person<-grep("jeff",df$name.of.report)
> > 
> > but of course it didn't exactly result in what I wanted to do. Could not find any solution via RSeek. Excuse me if it is a very silly question, but can anyone help me find a way out of this?
> > 
> > Thanks a lot
> > 
> > Alain
> 
> 
> There will be several approaches, all largely involving the use of ?regex. Here is one:
> 
> 
> DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002", 
>                                     "robert_2002_teamz", "mary_2002_teamz", 
>                                     "2003_mary_teamy"))
> 
> > DF
>      name.of.report
> 1   jeff_2001_teamx
> 2   teamy_jeff_2002
> 3 robert_2002_teamz
> 4   mary_2002_teamz
> 5   2003_mary_teamy
> 
> 
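> # person: strip underscores, digits, and "team" plus its one-character suffix
> # year:   keep only the captured four-digit group
> # team:   keep only the single character that follows "team"
> DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
>                      year = gsub(".*([0-9]{4}).*","\\1", DF$name.of.report),
>                      team = gsub(".*team(.).*","\\1", DF$name.of.report))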
> 
> 
> > DF.new
>   person year team
> 1   jeff 2001    x
> 2   jeff 2002    y
> 3 robert 2002    z
> 4   mary 2002    z
> 5   mary 2003    y
> 
> 
> 
> HTH,
> 
> Marc Schwartz
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
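
For the opposite direction asked about at the top of this message (DF.new back to a single name.of.report column), one possible sketch is to paste the three columns together. Note that DF.new no longer records where each piece sat in the original string, so the original mixed ordering cannot be recovered. The sketch below assumes one fixed order, person_year_team<letter>, and the name DF.back is only illustrative. DF.new is rebuilt by hand here so the snippet runs on its own.

```r
# Rebuild DF.new as printed in Marc's reply (character columns, not factors)
DF.new <- data.frame(person = c("jeff", "jeff", "robert", "mary", "mary"),
                     year   = c("2001", "2002", "2002", "2002", "2003"),
                     team   = c("x", "y", "z", "z", "y"),
                     stringsAsFactors = FALSE)

# Assumption: every report name is written as person_year_team<letter>.
# The original position of each piece is not stored in DF.new.
DF.back <- data.frame(
  name.of.report = paste(DF.new$person,
                         DF.new$year,
                         paste0("team", DF.new$team),
                         sep = "_"),
  stringsAsFactors = FALSE)

DF.back
```

If the original ordering matters, it would have to be kept in an extra column when DF.new is created, since it cannot be derived from person, year and team alone.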


