[R] How to create a new variable based on parts of another character variable.

Jim Lemon jim at bitwrit.com.au
Mon Oct 24 12:22:53 CEST 2011


On 10/24/2011 12:35 AM, Philipp Fischer wrote:
> Hello,
> I am just starting with R and I am having a (most probably) stupid problem creating a new variable in a data.frame based on part of another character variable.
>
> I have a data frame like this one:
>
>
> A			B 		C
> AWI-test1	1		i
> AWI-test5	2		r
> AWI-tes75	56		z
> UFT-2		5		I
> UFT56		f		t
> UFT356		9j		t
> etc. etc.		89		t
>
>
> I now want to check whether the string AWI is present in the variable A and, if so, create a variable D containing "Arctic". However, if the string UFT occurs in the variable A, then the variable D should be "Boreal", etc.
>
> The resulting data.frame should look like this:
> A			B 		C	D
> AWI-test1	1		i	Arctic	
> AWI-test5	2		r	Arctic
> AWI-tes75	56		z	Arctic
> UFT-2		5		I	Boreal
> UFT56		f		t	Boreal
> UFT356		9j		t	Boreal
> etc. etc.		89		t
>
>
Hi Philipp,
Since you mentioned that you were just starting with R, it might be a 
little optimistic to throw you into the regular expression cage and 
expect you to emerge unscathed. You can do this by constructing a 
two-column matrix or data frame of replacement values:

replacements<-matrix(c("AWI","UFT","Arctic","Boreal"),ncol=2)
replacements
     [,1]  [,2]
[1,] "AWI" "Arctic"
[2,] "UFT" "Boreal"

Then write a function that uses grep to look up the replacement for each value:

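# x: a single label; y: two-column lookup (pattern, replacement)
swapLabels<-function(x,y) {
  # return the replacement for the first pattern found in x
  for(swaprow in seq_len(nrow(y)))
   if(length(grep(y[swaprow,1],x))) return(y[swaprow,2])
  # no pattern matched
  return(NA)
}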
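
A quick way to see what the function does is to call it on single strings. This is only a sketch: the label "XYZ-9" is made up for illustration, and an equivalent copy of the lookup matrix and function is included so that the snippet runs on its own.

```r
# Lookup matrix and an equivalent copy of the function, so this runs standalone
replacements <- matrix(c("AWI","UFT","Arctic","Boreal"), ncol=2)
swapLabels <- function(x, y) {
  for(swaprow in seq_len(nrow(y)))
    if(length(grep(y[swaprow,1], x))) return(y[swaprow,2])
  return(NA)
}

swapLabels("AWI-test1", replacements)  # pattern in row 1 is found
swapLabels("UFT356", replacements)     # pattern in row 2 is found
swapLabels("XYZ-9", replacements)      # hypothetical label, nothing found, gives NA
```

Note that grep looks for the pattern anywhere in the string, not only at the start, and treats it as a regular expression. For plain codes like AWI and UFT that makes no difference.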

Finally, apply the function to the first column of the data frame:

pf.df$D<-unlist(lapply(pf.df[,1],swapLabels,replacements))
pf.df$D
[1] "Arctic" "Arctic" "Arctic" "Boreal" "Boreal" "Boreal"
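
To try this end to end, the sketch below rebuilds the example data frame from the six complete rows of the original question and fills D without a loop. This is a different route from the grep approach: it assumes the code is always the first three characters of A, which holds for the sample rows but may not hold for real data. stringsAsFactors=FALSE is my choice here so that A is a plain character vector.

```r
# Example data typed in from the original question (six complete rows)
pf.df <- data.frame(
  A = c("AWI-test1","AWI-test5","AWI-tes75","UFT-2","UFT56","UFT356"),
  B = c("1","2","56","5","f","9j"),
  C = c("i","r","z","I","t","t"),
  stringsAsFactors = FALSE
)
replacements <- matrix(c("AWI","UFT","Arctic","Boreal"), ncol=2)

# Assumption: the code is always the first three characters of A
codes <- substr(pf.df$A, 1, 3)
# match() gives the lookup row for each code, or NA if the code is absent
pf.df$D <- replacements[match(codes, replacements[,1]), 2]
pf.df$D
```

Any label whose first three characters are not in the lookup ends up as NA in D.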

Jim


