[R] Manipulate Data (with regular expressions)

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 11 13:04:46 CEST 2008


strapply() in gsubfn is convenient for that since it matches by contents
rather than delimiters:

x <- factor(c("220", "220a", "221b", "B221", "220a1", "220ab1", "220a12"))

library(gsubfn)
strapply(as.character(x), "[0-9]{3}", simplify = c)

See
http://gsubfn.googlecode.com
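
For comparison, here is a minimal base-R sketch of the same extraction
without gsubfn. It is not part of the original reply. It assumes a
version of R that provides regmatches(), and the names s, m, y1, y2 and
y are only illustrative. The second option addresses the "use the match
as the replacement" idea from the quoted question by using a
backreference.

```r
# Same example values as above, kept as a factor
x <- factor(c("220", "220a", "221b", "B221", "220a1", "220ab1", "220a12"))
s <- as.character(x)

# Option 1: extract the first run of three digits from each string.
# regexpr() returns -1 where nothing matches, so those entries stay NA.
m <- regexpr("[0-9]{3}", s)
y1 <- rep(NA_character_, length(s))
y1[m > 0] <- regmatches(s, m)

# Option 2: keep only the captured group via the backreference \\1.
# The lazy ".*?" needs perl = TRUE. Strings without a three-digit run
# are returned unchanged by sub().
y2 <- sub("^.*?([0-9]{3}).*$", "\\1", s, perl = TRUE)

# Convert to numeric once the text has been reduced to digits
y <- as.numeric(y1)
```

Option 1 marks non-matching values as NA, while option 2 leaves them
untouched, so which one fits depends on how unexpected values should be
handled.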

On Fri, Jul 11, 2008 at 5:04 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:
> Thanks a lot,
>
> I am almost done, but unfortunately I have to manipulate values like
>
> x
> 220a1
> 220ab1
> 220a12
>
> to
>
> y
> 220
> 220
> 220
>
> Even though it is easy to match a 3-digit number
> [0-9]{3}
> I have no idea how to match everything except a 3-digit number in order to replace everything but the 3-digit number by ""
>
> y <- gsub(RE for Everything but a 3-digit number, "", x)
>
> Maybe it is possible to use the MATCH as the replacement
>
> y <- gsub([0-9]{3}, MATCH, x)
>
> Thank you
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Tuesday, July 8, 2008 17:20
> To: Kunzler, Andreas
> Cc: r-help at r-project.org
> Subject: Re: [R] Manipulate Data (with regular expressions)
>
> Try this:
>
> x <- factor(c("220", "220a", "221", "221b", "B221"))
> pat <- "[^0-9]+" # match non-digits
> nums <- as.numeric(gsub(pat, "", x))
> has.lets <- as.numeric(regexpr(pat, x) > 0)
>
>
> On Tue, Jul 8, 2008 at 7:11 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:
>> Dear Everyone,
>>
>> I am trying to automatically transform a variable (class = factor)
>> such as
>>
>> x
>> 220
>> 220a
>> 221
>> 221b
>> B221
>>
>> into two variables (class = numeric) such as
>>
>> x     y
>> 220   0
>> 220   1
>> 221   0
>> 221   1
>> 221   1
>>
>> y has to carry the information about the class (number or string) of the
>> former x variable.
>>
>> I could do it by hand like
>>
>> x[x == "220a"] <- 220
>> y[x == "220a"] <- 1
>>
>> but x has way too many different values.
>>
>> So I wondered whether I could use a regular expression, or any other way, like
>>
>> x[x == [0-9]{3}a] <- regular expression
>> y[x == [0-9]{3}] <- 1
>>
>> Thanks a lot
>>


