[R] recode according to specific sequence of characters within a string variable

David Winsemius dwinsemius at comcast.net
Fri Feb 4 16:41:21 CET 2011


On Feb 4, 2011, at 8:26 AM, Denis Kazakiewicz wrote:

> Dear R people
> Could you please help?
> I have a similar but opposite question:
> how can I reshape the data from DF.new back to DF in the example Marc
> kindly provided?

Well, I don't think you want a random order, right? If what you are
asking for is a single character element per row of the data frame,
then try this (note that R is case-sensitive, so the object is DF.new):

apply(DF.new, 1, paste, collapse="_")
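
A minimal sketch of that round trip, assuming DF.new holds the three
columns from Marc's example. The stringsAsFactors = FALSE argument, the
column-wise variant and the names by.row, by.col and DF.back are
additions for illustration, not part of the original thread. The
original token order ("teamy_jeff_2002" versus "jeff_2001_teamx") is
not stored in DF.new, so it cannot be recovered; a fixed order is used
instead.

```r
# Rebuild DF.new as printed in Marc's example; plain character columns
# are an assumption made here to keep the result easy to inspect.
DF.new <- data.frame(person = c("jeff", "jeff", "robert", "mary", "mary"),
                     year   = c("2001", "2002", "2002", "2002", "2003"),
                     team   = c("x", "y", "z", "z", "y"),
                     stringsAsFactors = FALSE)

# Row-wise paste: apply() coerces the data frame to a character matrix
# and collapses each row into one string, e.g. "jeff_2001_x".
by.row <- apply(DF.new, 1, paste, collapse = "_")

# Column-wise paste: no matrix coercion, and the "team" prefix that the
# regex stripped can be put back, e.g. "jeff_2001_teamx".
by.col <- with(DF.new, paste(person, year, paste0("team", team), sep = "_"))

# Back to a one-column data frame shaped like the original DF.
DF.back <- data.frame(name.of.report = by.col, stringsAsFactors = FALSE)
```

The column-wise form may be preferable when the columns have mixed
types, since apply() formats numeric columns during the coercion to a
matrix.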

-- 
David.
>
> Thank you
> Denis
>
> On Fri, 2011-02-04 at 07:09 -0600, Marc Schwartz wrote:
>> On Feb 4, 2011, at 6:32 AM, D. Alain wrote:
>>
>>> Dear R-List,
>>>
>>> I have a dataframe with one column "name.of.report" containing  
>>> character values, e.g.
>>>
>>>
>>>> df$name.of.report
>>>
>>> "jeff_2001_teamx"
>>> "teamy_jeff_2002"
>>> "robert_2002_teamz"
>>> "mary_2002_teamz"
>>> "2003_mary_teamy"
>>> ...
>>> (i.e. the bit of interest is not always at same position)
>>>
>>> Now I want to recode the column "name.of.report" into the  
>>> variables "person", "year","team", like this
>>>
>>>> new.df
>>>
>>> "person"  "year"  "team"
>>> jeff      2001    x
>>> jeff      2002    y
>>> robert    2002    z
>>> mary      2002    z
>>>
>>> I tried with grep()
>>>
>>> df$person<-grep("jeff",df$name.of.report)
>>>
>>> but of course it didn't exactly result in what I wanted to do.  
>>> Could not find any solution via RSeek. Excuse me if it is a very  
>>> silly question, but can anyone help me find a way out of this?
>>>
>>> Thanks a lot
>>>
>>> Alain
>>
>>
>> There will be several approaches, all largely involving the use of
>> ?regex. Here is one:
>>
>>
>> DF <- data.frame(name.of.report = c("jeff_2001_teamx",
>>                                     "teamy_jeff_2002",
>>                                     "robert_2002_teamz",
>>                                     "mary_2002_teamz",
>>                                     "2003_mary_teamy"))
>>
>> > DF
>>     name.of.report
>> 1   jeff_2001_teamx
>> 2   teamy_jeff_2002
>> 3 robert_2002_teamz
>> 4   mary_2002_teamz
>> 5   2003_mary_teamy
>>
>>
>> DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
>>                      year = gsub(".*([0-9]{4}).*", "\\1", DF$name.of.report),
>>                      team = gsub(".*team(.).*", "\\1", DF$name.of.report))
>>
>>
>> > DF.new
>>  person year team
>> 1   jeff 2001    x
>> 2   jeff 2002    y
>> 3 robert 2002    z
>> 4   mary 2002    z
>> 5   mary 2003    y
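
For comparison, a token-based sketch of the same recoding: split each
string on "_" and classify the pieces by pattern instead of deleting
what is not wanted. This is an alternative, not Marc's method. The
helper parse.report and the names reports and DF.tok are hypothetical,
and the sketch assumes every string has exactly one four-digit year,
one "team" token with a single trailing character, and one remaining
token for the person.

```r
reports <- c("jeff_2001_teamx", "teamy_jeff_2002", "robert_2002_teamz",
             "mary_2002_teamz", "2003_mary_teamy")

# Hypothetical helper: split one report name and sort its tokens into
# person, year and team by matching each token against a pattern.
parse.report <- function(x) {
  tok <- strsplit(x, "_", fixed = TRUE)[[1]]
  is.year <- grepl("^[0-9]{4}$", tok)   # exactly four digits
  is.team <- grepl("^team.$", tok)      # "team" plus one character
  c(person = tok[!is.year & !is.team],
    year   = tok[is.year],
    team   = sub("^team", "", tok[is.team]))
}

# sapply() returns a 3 x 5 character matrix; transpose it so that each
# report becomes one row, then convert to a data frame.
DF.tok <- as.data.frame(t(sapply(reports, parse.report, USE.NAMES = FALSE)),
                        stringsAsFactors = FALSE)
```

Classifying whole tokens avoids stripping digits or the letters "team"
out of a person's name, at the cost of requiring the "_" separator to
be used consistently.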
>>
>>
>>
>> HTH,
>>
>> Marc Schwartz
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


