[R] thousand separator (was RE: weight)

Liaw, Andy andy_liaw at merck.com
Tue May 1 03:34:01 CEST 2007


Looks very neat, Gabor!  

I just cannot fathom why anyone who want to write numerics with those
separators in a flat file.   That's usually not for human consumption,
and computers don't need those separators!  

Andy

From: Gabor Grothendieck
> 
> That could be accomplished using a custom class like this:
> 
> library(methods)
> setClass("num.with.junk")
> setAs("character", "num.with.junk",
>    function(from) as.numeric(gsub(",", "", from)))
> 
> 
> ### test ###
> 
> Input <- "A B
> 1,000 1
> 2,000 2
> 3,000 3
> "
> DF <- read.table(textConnection(Input), header = TRUE,
>    colClasses = c("num.with.junk", "numeric"))
> str(DF)
> 
> 
> 
> On 4/30/07, Liaw, Andy <andy_liaw at merck.com> wrote:
> > Still, though, it would be nice to have the data read in 
> correctly in
> > the first place, instead of having to do this kind of 
> post-processing
> > afterwards...
> >
> > Andy
> >
> > From: Bert Gunter
> > >
> > > Nothing! My mistake! gsub -- not sub -- is what you want to
> > > get 'em all.
> > >
> > > -- Bert
> > >
> > >
> > > Bert Gunter
> > > Genentech Nonclinical Statistics
> > >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> Marc Schwartz
> > > Sent: Monday, April 30, 2007 10:18 AM
> > > To: Bert Gunter
> > > Cc: r-help at stat.math.ethz.ch
> > > Subject: Re: [R] thousand separator (was RE: weight)
> > >
> > > Bert,
> > >
> > > What am I missing?
> > >
> > > > print(as.numeric(gsub(",", "", "1,123,456.789")), 10)
> > > [1] 1123456.789
> > >
> > >
> > > FWIW, this is using:
> > >
> > > R version 2.5.0 Patched (2007-04-27 r41355)
> > >
> > > Marc
> > >
> > > On Mon, 2007-04-30 at 10:13 -0700, Bert Gunter wrote:
> > > > Except this doesn't work for "1,123,456.789" Marc.
> > > >
> > > > I hesitate to suggest it, but gregexpr() will do it, as it
> > > captures the
> > > > position of **every** match to ",". This could be then used
> > > to process the
> > > > vector via some sort of loop/apply statement.
> > > >
> > > > But I think there **must** be a more elegant way using
> > > regular expressions
> > > > alone, so I, too, await a clever reply.
> > > >
> > > > -- Bert
> > > >
> > > >
> > > > Bert Gunter
> > > > Genentech Nonclinical Statistics
> > > >
> > > > -----Original Message-----
> > > > From: r-help-bounces at stat.math.ethz.ch
> > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> Marc Schwartz
> > > > Sent: Monday, April 30, 2007 10:02 AM
> > > > To: Liaw, Andy
> > > > Cc: r-help at stat.math.ethz.ch
> > > > Subject: Re: [R] thousand separator (was RE: weight)
> > > >
> > > > One possibility would be to use something like the following
> > > > post-import:
> > > >
> > > > > WTPP
> > > > [1] 1,106.8250 1,336.5138
> > > >
> > > > > str(WTPP)
> > > >  Factor w/ 2 levels "1,106.8250","1,336.5138": 1 2
> > > >
> > > > > as.numeric(gsub(",", "", WTPP))
> > > > [1] 1106.825 1336.514
> > > >
> > > >
> > > > Essentially strip the ',' characters from the factors and
> > > then coerce
> > > > the resultant character vector to numeric.
> > > >
> > > > HTH,
> > > >
> > > > Marc Schwartz
> > > >
> > > >
> > > > On Mon, 2007-04-30 at 12:26 -0400, Liaw, Andy wrote:
> > > > > I've run into this occasionally.  My current solution is
> > > simply to read
> > > > > it into Excel, re-format the offending column(s) by 
> unchecking the
> > > > > "thousand separator" box, and write it back out.  Not
> > > exactly ideal to
> > > > > say the least.  If anyone can provide a better solution
> > > in R, I'm all
> > > > > ears...
> > > > >
> > > > > Andy
> > > > >
> > > > > From: Natalie O'Toole
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > These are the variables in my file. I think the
> > > variable i'm having
> > > > > > problems with is WTPP which is of the Factor type. Does
> > > > > > anyone know how to
> > > > > > fix this, please?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Nat
> > > > > >
> > > > > > data.frame':   290 obs. of  5 variables:
> > > > > >  $ PROV  : num  48 48 48 48 48 48 48 48 48 48 ...
> > > > > >  $ REGION: num  4 4 4 4 4 4 4 4 4 4 ...
> > > > > >  $ GRADE : num  7 7 7 7 7 7 7 7 7 7 ...
> > > > > >  $ Y_Q10A: num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
> > > > > >  $ WTPP  : Factor w/ 1884 levels
> > > > > > "1,106.8250","1,336.5138",..: 1544 67
> > > > > > 1568 40 221 1702 1702 1434 310 310 ...
> > > > > >
> > > > > >
> > > > > > __________________
> > > > > >
> > > > > >
> > > > > >
> > > > > > --- Douglas Bates <bates at stat.wisc.edu> wrote:
> > > > > >
> > > > > > > On 4/28/07, John Kane <jrkrideau at yahoo.ca> wrote:
> > > > > > > > IIRC you have a yes/no smoking variable scored 1/2
> > > > > > > ?
> > > > > > > >
> > > > > > > > It is possibly being read in as a factor not as an
> > > > > > > > integer.
> > > > > > > >
> > > > > > > > try
> > > > > > > >  class(df$smoking.variable)
> > > > > > > > to see .
> > > > > > >
> > > > > > > Good point.  In general I would recommend using
> > > > > > >
> > > > > > > str(df)
> > > > > > >
> > > > > > > to check on the class or storage type of all
> > > > > > > variables in a data frame
> > > > > > > if you are getting unexpected results when
> > > > > > > manipulating it.  That
> > > > > > > function is carefully written to provide a maximum
> > > > > > > of information in a
> > > > > > > minimum of space.
> > > > > >
> > > > > > Yes but I'm an relative newbie at R and didn't realise
> > > > > > that str() would do that.  I always thought it was
> > > > > > some kind of string function.
> > > > > >
> > > > > > Thanks, it makes life much easier.
> > > > > >
> > > > > > > >
> > > > > > > > --- Natalie O'Toole <notoole at mtroyal.ca> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I'm getting an error message:
> > > > > > > > >
> > > > > > > > > Error in df[, 1:4] * df[, 5] : non-numeric
> > > > > > > argument
> > > > > > > > > to binary operator
> > > > > > > > > In addition: Warning message:
> > > > > > > > > Incompatible methods ("Ops.data.frame",
> > > > > > > > > "Ops.factor") for "*"
> > > > > > > > >
> > > > > > > > > here is my code:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ##reading in the file
> > > > > > > > > happyguys<-read.table("c:/test4.dat",
> > > > > > > header=TRUE,
> > > > > > > > > row.names=1)
> > > > > > > > >
> > > > > > > > > ##subset the file based on Select If
> > > > > > > > >
> > > > > > > > > test<-subset(happyguys, PROV==48 & GRADE == 7  &
> > > > > > > > > Y_Q10A < 9)
> > > > > > > > >
> > > > > > > > > ##sorting the file
> > > > > > > > >
> > > > > > > > > mydata<-test
> > > > > > > > > mydataSorted<-mydata[ order(mydata$Y_Q10A), ]
> > > > > > > > > print(mydataSorted)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ##assigning  a different name to file
> > > > > > > > >
> > > > > > > > > happyguys<-mydataSorted
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ##trying to weight my data
> > > > > > > > >
> > > > > > > > > data.frame<-happyguys
> > > > > > > > > df<-data.frame
> > > > > > > > > df1<-df[, 1:4] * df[, 5]
> > > > > > > > >
> > > > > > > > > ##getting error message here??
> > > > > > > > >
> > > > > > > > > Error in df[, 1:4] * df[, 5] : non-numeric
> > > > > > > argument
> > > > > > > > > to binary operator
> > > > > > > > > In addition: Warning message:
> > > > > > > > > Incompatible methods ("Ops.data.frame",
> > > > > > > > > "Ops.factor") for "*"
> > > > > > > > >
> > > > > > > > > Does anyone know what this error message means?
> > > > > > > > >
> > > > > > > > > I've been reviewing R code all day & getting
> > > > > > > more
> > > > > > > > > familiar with it
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Nat
> > > > > > > > >
> > > >
> > > > ______________________________________________
> > > > R-help at stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, 
> reproducible code.
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > >
> >
> >
> > 
> --------------------------------------------------------------
> ----------------
> > Notice:  This e-mail message, together with any 
> attachments,...{{dropped}}
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> 


------------------------------------------------------------------------------
Notice:  This e-mail message, together with any attachments,...{{dropped}}



More information about the R-help mailing list