[R] Coercion of percentages by as.numeric

Gabor Grothendieck ggrothendieck at gmail.com
Mon Nov 14 18:36:13 CET 2005


On 11/14/05, Brandt, T. (Tobias) <TobiasBr at taquanta.com> wrote:
>
> >-----Original Message-----
> >From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> >Sent: 14 November 2005 06:21 PM
> >
> >On 11/14/05, Brandt, T. (Tobias) <TobiasBr at taquanta.com> wrote:
> >> Hi
> >>
> >> Given that things like the following work
> >>
> >> > a <- c("-.1"," 2.7 ","B")
> >> > a
> >> [1] "-.1"   " 2.7 " "B"
> >> > as.numeric(a)
> >> [1] -0.1  2.7   NA
> >> Warning message:
> >> NAs introduced by coercion
> >> >
> >>
> >> I naively expected that the following would behave differently.
> >>
> >> > b <- c('10%', '-20%', '30.0%', '.40%')
> >> > b
> >> [1] "10%"   "-20%"  "30.0%" ".40%"
> >> > as.numeric(b)
> >> [1] NA NA NA NA
> >> Warning message:
> >> NAs introduced by coercion
> >
> >Try this:
> >
> >as.numeric(sub("%", "e-2", b))
> >
>
> Thank you, that accomplishes what I had intended.
>
> I would have thought though that the expression "53%" would be a fairly
> standard representation of the number 0.53 and might be handled as such.  Is
> there a specific reason for avoiding this behaviour?
>
> I can imagine that it might add unnecessary overhead to routines like
> "as.numeric" which one would like to keep as fast as possible.
>
> Perhaps there are other areas though where it might be desirable?  For
> example I'm thinking of the read.table function for reading in csv files
> since I have many of these that have been saved from excel and now contain
> numbers in the "%" format.

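Assuming a .csv file in which some numbers carry a trailing percent
sign, you could try this (myfile is the path to your file):

Lines <- readLines(myfile)
# rewrite each "%" as "e-2", so that "10%" is read as 10e-2, i.e. 0.1
Lines <- gsub("%", "e-2", Lines, fixed = TRUE)
mydata <- read.csv(textConnection(Lines))

Note that this rewrites every "%" in the file, not only those that
follow a number.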
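
If some text fields might themselves contain a "%", one possible
refinement is to rewrite the percent sign only when it directly follows
a digit. The sketch below is only an illustration: the sample lines and
the column names (name, growth, note) are made up to stand in for a file
exported from Excel, and the pattern is an assumption about how the
percentages are written.

```r
# Hypothetical sample data standing in for readLines(myfile)
Lines <- c("name,growth,note",
           "a,10%,% of total",
           "b,-20%,none",
           "c,.40%,none")

# Rewrite "%" as "e-2" only when it immediately follows a digit;
# "\\1" puts the matched digit back in front of "e-2"
Lines <- gsub("([0-9])%", "\\1e-2", Lines)

mydata <- read.csv(textConnection(Lines))

# growth is now read as a numeric column; the leading "%" in the
# note column is left untouched
str(mydata)
```

A digit followed by "%" inside free text would still be rewritten, so
this narrows the substitution rather than making it fully safe.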



