[R] Reformatting text inside a data frame

Mon Sep 7 22:20:51 CEST 2015

Hi John,
     Thanks for the reply; I'm pasting here the output from dput, with a
'df <-' added in front:

df <- structure(list(rowNum = c(1, 2, 3), first = structure(c(NA, 1L,
2L), .Label = c("AD=2;BA=8", "AD=9;BA=1"), class = "factor"),
    second = structure(c(2L, 1L, NA), .Label = c("AD=1;BA=2",
    "AD=13;BA=49"), class = "factor")), .Names = c("rowNum",
"first", "second"), row.names = c(NA, -3L), class = "data.frame")

To be more specific about what I would like: each value to be adjusted
has the following general format:

"AD=X;BA=Y"

I would like to extract the values of X and Y and format them as a string
like this:

"X_X-Y"

Here's how I would handle a specific instance using awk in a shell script:

echo  "AD=X;BA=Y" | awk '{split($1,a,"AD="); split(a[2],b,";");
split(b[2],c,"BA="); print b[1]"_"b[1]"-"c[2]}'
X_X-Y
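
For comparison, here is a rough R sketch of the same single-value transformation using a regular expression with backreferences. It assumes every non-NA value matches the "AD=X;BA=Y" pattern exactly; the names x and reformatted are only illustrative and are not part of my real data.

```r
# Illustrative input: two values in the "AD=X;BA=Y" format plus an NA
x <- c("AD=13;BA=49", "AD=2;BA=8", NA)

# Capture X (everything after "AD=" up to ";") and Y (everything after "BA="),
# then rebuild the string as "X_X-Y". sub() leaves NA elements as NA.
reformatted <- sub("^AD=([^;]*);BA=(.*)$", "\\1_\\1-\\2", x)

print(reformatted)
```

Values that do not match the pattern would be returned unchanged by sub(), so malformed entries would need a separate check.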

I'd like this applied to every non-NA entry in the columns to the right of
column 1.
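
One possible way to do that in R, offered only as a sketch: loop over every column except the first with lapply() and rewrite each one with sub(). This assumes the first column is the only one to leave alone and that the columns may be factors (as in the dput output above), so each is converted to character first. The data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the small example; stringsAsFactors = TRUE mimics the factor
# columns shown in the dput() output
df <- data.frame(rowNum = c(1, 2, 3),
                 first  = c(NA, "AD=2;BA=8", "AD=9;BA=1"),
                 second = c("AD=13;BA=49", "AD=1;BA=2", NA),
                 stringsAsFactors = TRUE)

# Rewrite every column to the right of column 1; NA entries stay NA
df[-1] <- lapply(df[-1], function(col) {
  sub("^AD=([^;]*);BA=(.*)$", "\\1_\\1-\\2", as.character(col))
})

print(df)
```

After this the reformatted columns are character vectors rather than factors; wrapping the result in factor() would restore factors if they are needed.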

Hoping this adds clarity for any others who also didn't follow my example.

Thanks in advance for any tips.

Best,
Jonathan

On Mon, Sep 7, 2015 at 3:48 PM, John Kane <jrkrideau at inbox.com> wrote:

> I'm not making a lot of sense of the data; it looks like you want more
> recodes than you have mentioned, but in any case you might want to look at
> the recode function in the car package. It "should" do what you want,
> though there may be faster ways to do it.
>
> BTW, for supplying sample data have a look at ?dput . Using dput() means
> that we see exactly the same data as you do.
>
> Sorry not to be of more help
> John Kane
> Kingston ON Canada
>
>
> > -----Original Message-----
> > From: jonsleepy at gmail.com
> > Sent: Mon, 7 Sep 2015 15:27:05 -0400
> > To: r-help at r-project.org
> > Subject: [R] Reformatting text inside a data frame
> >
> > Hi all,
> >     I've read in a large data frame that has formatting similar to the
> > one in the small example below:
> >
> > df <- data.frame(c(1,2,3),c(NA,"AD=2;BA=8","AD=9;BA=1"),c("AD=13;BA=49","AD=1;BA=2",NA))
> > names(df) <- c("rowNum","first","second")
> >
> >> df
> >   rowNum     first      second
> > 1      1      <NA> AD=13;BA=49
> > 2      2 AD=2;BA=8   AD=1;BA=2
> > 3      3 AD=9;BA=1        <NA>
> >
> >
> > I'd like to reformat all of the non-NA entries in df in "first",
> > "second", and so on, such that "AD=13;BA=49" will be replaced by the
> > following string: "13_13-49".
> >
> > So applied to df, the output would be the following:
> >
> >   rowNum first   second
> > 1      1  <NA> 13_13-49
> > 2      2 2_2-8    1_1-2
> > 3      3 9_9-1     <NA>
> >
> >
> > I'm generally a big proponent of shell scripting with awk, but I'd prefer
> > an all-R solution if one exists (and also to learn how to do this more
> > generally).
> >
> > Could someone point out an appropriate paradigm or otherwise point me in
> > the right direction?
> >
> > Best,
> > Jonathan