[R] Function for trim blanks from a string(s)?

Marc Schwartz marc_schwartz at comcast.net
Mon Aug 6 22:42:22 CEST 2007


On Mon, 2007-08-06 at 21:23 +0100, Prof Brian Ripley wrote:
> I am sure Marc knows that ?sub has examples of trimming trailing space and 
> whitespace in various styles.

Indeed, though leading spaces are not covered there, so I thought I
would take a minute or two to provide both, plus the combination of the
two using gsub().
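
The patterns quoted below strip only the space character. As a minimal
sketch, the same three variants can be widened to all white space (tabs,
newlines, etc.) with the POSIX class [[:space:]]. The helper names
trim_leading(), trim_trailing() and trim_both() are made up here for
illustration and are not base R functions:

```r
# Hypothetical helpers (illustrative names, not part of base R).
# [[:space:]] matches spaces, tabs, newlines and other white space.
trim_leading  <- function(x) sub("^[[:space:]]+", "", x)
trim_trailing <- function(x) sub("[[:space:]]+$", "", x)
trim_both     <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

x <- c("  \tLeading", "Trailing \n", "  Both  ")

trim_both(x)
# [1] "Leading"  "Trailing" "Both"
```

All three are vectorized, so they take a whole character vector in one
call.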

> On Mon, 6 Aug 2007, Marc Schwartz wrote:
> 
> > On Mon, 2007-08-06 at 12:15 -0700, adiamond wrote:
> >> I feel like an idiot posting this because every language I've ever seen has a
> >> string function that trims blanks off strings (off the front or back or
> >> both).
> 
> Some very common languages do not, though.  It is an exercise in Kernighan 
> & Ritchie (the original C reference), and an FAQ entry for Perl.
> 
> >> Ideally, it would process whole data frames/matrices etc but I don't
> >> even see one that processes a single string.  But I've searched and I don't
> >> even see that.  There's a strtrim function but it does something completely
> >> different.
> >
> > If you want to do this while initially importing the data into R using
> > one of the read.table() family of functions, see the 'strip.white'
> > argument in ?read.table, which would do an entire data frame in one
> > call.
> >
> > Otherwise, the easiest way to do it would be to use sub() or gsub()
> > along the lines of the following:
> >
> > # Strip leading space
> > sub("^ +", "", YourTextVector)
> >
> >
> > # Strip trailing space
> > sub(" +$", "", YourTextVector)
> >
> >
> > # Strip both
> > gsub("(^ +)|( +$)", "", YourTextVector)
> >
> >
> >
> >
> > Examples of use:
> >
> >> sub("^ +", "", "   Leading Space")
> > [1] "Leading Space"
> >
> >
> >> sub(" +$", "", "Trailing Space    ")
> > [1] "Trailing Space"
> >
> >
> >> gsub("(^ +)|( +$)", "", "    Leading and Trailing Space    ")
> > [1] "Leading and Trailing Space"
> >
> >
> > See ?sub, which also documents gsub().
> >
> > Note that the above will only strip spaces, not all white space.
> >
> > You can then use the appropriate call in one of the *apply() family of
> > functions to loop over columns/rows as may be appropriate.
> 
> Well, arrays are vectors and so can be done by
> 
> A[] <- sub(....., A)
> 
> and data frames with character columns by
> 
> A[] <- lapply(A, function(x) sub(....., x))

Right. One could probably use it on mixed data frames along the lines of
the following (untested):

  A[] <- lapply(A, function(x) if (is.character(x) || is.factor(x))
                                 sub(....., x) else x)


Leave out the "|| is.factor(x)" if only character columns should be
affected.
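
A self-contained sketch of that idea on a made-up data frame (the column
names and values are invented for illustration). It tests each column
once with if/else rather than ifelse(), since ifelse() returns a result
only as long as its test, which here is a single TRUE or FALSE:

```r
# Hypothetical mixed data frame; names and values are illustrative only.
A <- data.frame(id    = 1:3,
                name  = c("  Ann", "Bob  ", "  Cy  "),
                group = factor(c(" a", "b ", " a")),
                stringsAsFactors = FALSE)

# Trim character and factor columns; leave all other columns untouched.
# sub()/gsub() return character vectors, so factor columns come back as
# character unless they are converted again with factor().
A[] <- lapply(A, function(x) {
  if (is.character(x) || is.factor(x)) {
    gsub("(^ +)|( +$)", "", x)
  } else {
    x
  }
})

str(A)
```

Assigning to A[] rather than A keeps the result a data frame with the
original column names.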

Thanks,

Marc


