[R] Trim trailing space from data.frame factor variables

Marc Schwartz marc_schwartz at comcast.net
Thu Aug 16 18:08:42 CEST 2007


On Thu, 2007-08-16 at 17:54 +0300, Lauri Nikkinen wrote:
> Hi folks,
> 
> I would like to trim the trailing spaces in my factor variables using lapply
> (described in this post by Marc Schwartz:
> http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22826.html) but the code is
> not functioning (in this example there is only one factor with trailing
> spaces):

Ayep....as I noted in that post, it was untested....my error.

The problem is that by using ifelse() as I did, the test for the column
being a factor, is.factor(x), returns a single TRUE or FALSE, not one
result per element. Since ifelse() returns a value with the same length
as its test, each column is reduced to its first element (which d[] <-
then recycles down the column), rather than the code being vectorized
over the entire column.
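To see the difference in isolation, here is a small sketch using a made-up three-element factor (the names f, res_ifelse and res_if are just for illustration, not from the original code). ifelse() hands back a single value because its test has length 1, while if/else simply picks a branch and sub() then runs over the whole vector:

```r
# Hypothetical factor whose labels carry trailing spaces
f <- factor(c("a  ", "b  ", "c  "))

# ifelse(): is.factor(f) is a single TRUE, so the result has length 1
res_ifelse <- ifelse(is.factor(f), sub(" +$", "", f), f)

# if/else: the condition selects a branch; sub() processes every element
res_if <- if (is.factor(f)) sub(" +$", "", f) else f

length(res_ifelse)  # 1
res_if              # "a" "b" "c"
```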

> y1 <- rnorm(20) + 6.8
> y2 <- rnorm(20) + (1:20*1.7 + 1)
> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> y <- c(y1,y2,y3)
> x <- gl(5,12)
> f <- gl(3,20, labels=paste("lev", 1:3, "   ", sep=""))
> d <- data.frame(x=x,y=y, f=f)
> str(d)
> 
> d[] <- lapply(d, function(x) ifelse(is.factor(x), sub(" +$", "", x), x))
> str(d)
> 
> How should I modify this?

Try this instead:

d[] <- lapply(d, function(x) if (is.factor(x)) sub(" +$", "", x) else x)

> str(d)
'data.frame':	60 obs. of  3 variables:
 $ x: chr  "1" "1" "1" "1" ...
 $ y: num  6.70 4.42 8.03 4.90 6.98 ...
 $ f: chr  "lev1" "lev1" "lev1" "lev1" ...
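On R 3.2.0 or later, base R's trimws() with which = "right" should do the same job. Note that it strips trailing tabs, carriage returns and newlines as well as spaces, so it is not strictly identical to sub(" +$", "", x). A minimal sketch on a made-up data frame (the column names are hypothetical):

```r
# Hypothetical data: one padded factor column and one integer column
d <- data.frame(f = gl(2, 2, labels = c("a  ", "b\t")), n = 1:4)

# trimws() (base R >= 3.2.0) removes trailing spaces, tabs, CR and LF;
# like sub(), it returns a character vector when given a factor
d[] <- lapply(d, function(x) if (is.factor(x)) trimws(x, which = "right") else x)

str(d)
```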

Note that by using sub(), the factors are coerced to character vectors
as expected. You would need to coerce them back to factors if you need
them as such.
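If the columns need to stay factors, one option is to wrap the sub() call in factor(), keeping in mind that factor() sorts the new levels alphabetically, which may not match the original level order. An alternative not discussed above is to trim only the level labels via levels<-, which leaves the underlying codes and level order untouched. A sketch of both on a small made-up data frame (d, d1 and d2 are hypothetical names):

```r
# Hypothetical data: a plain factor and a factor with padded labels
d <- data.frame(
  x = gl(2, 3),
  f = gl(3, 2, labels = paste("lev", 1:3, "   ", sep = ""))
)

# Option 1: trim, then rebuild the factor (levels are re-sorted alphabetically)
d1 <- d
d1[] <- lapply(d1, function(x) if (is.factor(x)) factor(sub(" +$", "", x)) else x)

# Option 2: trim only the level labels; codes and level order are unchanged
d2 <- d
d2[] <- lapply(d2, function(x) {
  if (is.factor(x)) levels(x) <- sub(" +$", "", levels(x))
  x
})

str(d2)
```

Option 2 also touches each distinct label only once, rather than every element of the column.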

HTH,

Marc Schwartz



More information about the R-help mailing list