[R] [FORGED] function for remove white space

William Dunlap wdunlap at tibco.com
Wed Feb 22 17:49:24 CET 2017


Try the following function to apply gsub to all character or factor
columns of a data.frame (and maintain the class of all
columns):

gsubDataFrame <- function(pattern, replacement, x, ...) {
    stopifnot(is.data.frame(x))
    for(i in seq_len(ncol(x))) {
        if (is.character(x[[i]])) {
            x[[i]] <- gsub(pattern, replacement, x[[i]], ...)
        } else if (is.factor(x[[i]])) {
            levels(x[[i]]) <- gsub(pattern, replacement, levels(x[[i]]), ...)
        } # else do nothing for numeric or other column types
    }
    x
}

E.g.,
> d <- data.frame(stringsAsFactors = FALSE,
+                 Int=1:5,
+                 Char=c("a a", "baa", "a a ", " aa", "b a a"),
+                 Fac=factor(c("x x", "yxx", "x x ", " xx", "y x x")))
> str(d)
'data.frame':   5 obs. of  3 variables:
 $ Int : int  1 2 3 4 5
 $ Char: chr  "a a" "baa" "a a " " aa" ...
 $ Fac : Factor w/ 5 levels " xx","x x","x x ",..: 2 5 3 1 4
> str(gsubDataFrame(" ", "", d)) # delete spaces, use "[[:space:]]" for whitespace
'data.frame':   5 obs. of  3 variables:
 $ Int : int  1 2 3 4 5
 $ Char: chr  "aa" "baa" "aa" "aa" ...
 $ Fac : Factor w/ 2 levels "xx","yxx": 1 2 1 1 2
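
To make the comment about "[[:space:]]" concrete, here is a small sketch on
made-up data (d2 and d3 are hypothetical names, not from the original
question). It shows that " " only matches the space character, that
"[[:space:]]" also catches tabs and newlines, and that running gsub over
every column with lapply turns an integer column into character. The last
step is only one possible way to restrict the work to character columns.

```r
# Hypothetical data containing a tab and a newline, not only spaces
d2 <- data.frame(stringsAsFactors = FALSE,
                 Int = 1:3,
                 Char = c("a\tb", " c d", "e f\n"))

# " " matches only the space character, so the tab and newline survive
gsub(" ", "", d2$Char)

# "[[:space:]]" matches spaces, tabs, newlines and other whitespace
gsub("[[:space:]]", "", d2$Char)

# gsub() always returns character, so applying it to every column
# converts the integer column to character as well
d3 <- as.data.frame(lapply(d2, function(x) gsub("[[:space:]]", "", x)),
                    stringsAsFactors = FALSE)
class(d3$Int)

# Restricting the replacement to character columns leaves Int untouched
is_chr <- vapply(d2, is.character, logical(1))
d2[is_chr] <- lapply(d2[is_chr], function(x) gsub("[[:space:]]", "", x))
str(d2)
```

Note that factor columns are not handled by the last step; gsubDataFrame
above deals with those through levels().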

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Feb 21, 2017 at 11:35 PM, José Luis <josestadistico at gmail.com> wrote:
> Thanks for your answer.
>
> I'm using read.csv.
>
> Sent from my iPad
>
>> On 22/2/2017, at 3:39, William Michels <wjm1 at caa.columbia.edu> wrote:
>>
>> Hi José (and Rolf),
>>
>> It's not entirely clear what type of 'whitespace' you're referring to,
>> but if you're using read.table() or read.csv() to create your
>> dataframe in the first place, setting 'strip.white = TRUE' will remove
>> leading and trailing whitespace 'from unquoted character fields
>> (numeric fields are always stripped).'
>>
>>> ?read.table
>>> ?read.csv
>>
>> Cheers,
>>
>> Bill
>>
>>
>>> On 2/21/17, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>>> On 22/02/17 12:51, José Luis Aguilar wrote:
>>>> Hi all,
>>>>
>>>> I have a data frame with 34 columns and 1534 observations.
>>>>
>>>> In some columns I have strings with spaces, and I want to remove the spaces.
>>>> Is there a function that removes whitespace from the entire data frame?
>>>> I use gsub, but I would need some function to automate this.
>>>
>>> Something like
>>>
>>> X <- as.data.frame(lapply(X,function(x){gsub(" ","",x)}))
>>>
>>> Untested, since you provide no reproducible example (despite being told
>>> by the posting guide to do so).
>>>
>>> I do not know what my idea will do to numeric columns or to factors.
>>>
>>> However it should give you at least a start.
>>>
>>> cheers,
>>>
>>> Rolf Turner
>>>
>>> --
>>> Technical Editor ANZJS
>>> Department of Statistics
>>> University of Auckland
>>> Phone: +64-9-373-7599 ext. 88276
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>


