[R] Reformatting text inside a data frame

David Winsemius dwinsemius at comcast.net
Mon Sep 7 23:25:59 CEST 2015


> On Sep 7, 2015, at 1:20 PM, Jon BR <jonsleepy at gmail.com> wrote:
> 
> Hi John,
>     Thanks for the reply; I'm pasting here the output from dput, with a
> 'df <-' added in front:
> 
> df <- structure(list(rowNum = c(1, 2, 3), first = structure(c(NA, 1L,
> 2L), .Label = c("AD=2;BA=8", "AD=9;BA=1"), class = "factor"),
>    second = structure(c(2L, 1L, NA), .Label = c("AD=1;BA=2",
>    "AD=13;BA=49"), class = "factor")), .Names = c("rowNum",
> "first", "second"), row.names = c(NA, -3L), class = "data.frame")
> 
> 
> 
> 
> To add more specifics, about what I would like; each value to be adjusted
> has the following general format:
> 
> "AD=X;BA=Y"
> 
> I would like to extract the values of X and Y and format them as a string
> as such:
> 
> "X_X-Y"
> 
> 
> Here's how I would handle a specific instance using awk in a shell script:
> 
> echo  "AD=X;BA=Y" | awk '{split($1,a,"AD="); split(a[2],b,";");
> split(b[2],c,"BA="); print b[1]"_"b[1]"-"c[2]}'
> X_X-Y
> 
> I'd like this to apply for all the entries that aren't NA to the right of
> column 1.
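
For a single string, the awk steps quoted above can be mirrored fairly directly in R. This is only a rough sketch: reformat_one is a hypothetical helper name (not something from the thread or from any package), and it assumes each value has exactly the "AD=X;BA=Y" shape.

```r
# Hypothetical helper (not from the thread): reformat one "AD=X;BA=Y" string
# as "X_X-Y" by splitting on ";" and then dropping the "key=" prefixes,
# roughly following the awk split() calls.
reformat_one <- function(x) {
  if (is.na(x)) return(NA_character_)            # leave missing values alone
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]  # c("AD=X", "BA=Y")
  vals <- sub("^[^=]*=", "", fields)             # strip everything up to "="
  paste0(vals[1], "_", vals[1], "-", vals[2])
}

reformat_one("AD=13;BA=49")
reformat_one(NA)
```

This works one value at a time, so for a large data frame a vectorised sub() call is likely the better fit.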

df[2:3] <- lapply(df[2:3], sub, pattern="(AD\\=)(.+)(;BA\\=)(.+)",
                                replacement="\\2_\\2-\\4" )

> df
  rowNum first   second
1      1  <NA> 13_13-49
2      2 2_2-8    1_1-2
3      3 9_9-1     <NA>
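
If the real data frame has more than two such columns, the same idea might be generalised along these lines. This is a sketch under assumptions: it rebuilds the example data as character columns, selects everything to the right of column 1 with df[-1], and uses an anchored pattern (the name pat is just for illustration) that assumes X never contains a ";".

```r
# Rebuild the example data from the dput() output, as character columns.
df <- data.frame(
  rowNum = c(1, 2, 3),
  first  = c(NA, "AD=2;BA=8", "AD=9;BA=1"),
  second = c("AD=13;BA=49", "AD=1;BA=2", NA),
  stringsAsFactors = FALSE
)

# Anchored pattern: group 1 is X (everything up to ";"), group 2 is Y.
pat <- "^AD=([^;]+);BA=(.+)$"

# df[-1] selects every column except the first, however many there are.
# sub() passes NA through unchanged, so missing entries stay NA.
df[-1] <- lapply(df[-1], sub, pattern = pat, replacement = "\\1_\\1-\\2")

df
```

Values that do not match the pattern are returned unchanged by sub(), so malformed entries would be left as they are rather than dropped.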

> 
> Hoping this adds clarity for any others who also didn't follow my example.
> 
> Thanks in advance for any tips-
> 
> Best,
> Jonathan
> 
> On Mon, Sep 7, 2015 at 3:48 PM, John Kane <jrkrideau at inbox.com> wrote:
> 
>> I can't make much sense of the data; it looks like you want more recodes
>> than you have mentioned. In any case, you might want to look at the
>> recode function in the car package. It "should" do what you want,
>> though there may be faster ways to do it.
>> 
>> BTW, for supplying sample data have a look at ?dput . Using dput() means
>> that we see exactly the same data as you do.
>> 
>> Sorry not to be of more help
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: jonsleepy at gmail.com
>>> Sent: Mon, 7 Sep 2015 15:27:05 -0400
>>> To: r-help at r-project.org
>>> Subject: [R] Reformatting text inside a data frame
>>> 
>>> Hi all,
>>>    I've read in a large data frame that has formatting similar to the
>>> one in the small example below:
>>> 
>>> df <-
>>> data.frame(c(1,2,3),c(NA,"AD=2;BA=8","AD=9;BA=1"),c("AD=13;BA=49","AD=1;BA=2",NA));
>>> names(df) <- c("rowNum","first","second")
>>> 
>>>> df
>>>  rowNum     first      second
>>> 1      1      <NA> AD=13;BA=49
>>> 2      2 AD=2;BA=8   AD=1;BA=2
>>> 3      3 AD=9;BA=1        <NA>
>>> 
>>> 
>>> I'd like to reformat all of the non-NA entries in df from "first" and
>>> "second" and so-on such that "AD=13;BA=49" will be replaced by the
>>> following string: "13_13-49".
>>> 
>>> So applied to df, the output would be the following:
>>> 
>>>  rowNum first   second
>>> 1      1  <NA> 13_13-49
>>> 2      2 2_2-8    1_1-2
>>> 3      3 9_9-1     <NA>
>>> 
>>> 
>>> I'm generally a big proponent of shell scripting with awk, but I'd prefer
>>> an all-R solution if one exists (and also to learn how to do this more
>>> generally).
>>> 
>>> Could someone point out an appropriate paradigm or otherwise point me in
>>> the right direction?
>>> 
>>> Best,
>>> Jonathan
>>> 
>>>      [[alternative HTML version deleted]]
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 


