[R] Converting dollar value (factors) to numeric

David Winsemius dwinsemius at comcast.net
Thu May 6 20:39:28 CEST 2010


On May 6, 2010, at 2:14 PM, Greg Snow wrote:

> This can be further simplified by combining the 2 subs into a single  
> gsub('[$,]','',as.character(y)).
>
> This will then convert "$123$35,24,,$1$$2,,3.4" into a number when  
> you may have wanted something like that to give a warning and/or NA  
> value.
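
If silent conversion of malformed input is a concern, one option is to
validate each string before stripping anything. The following is a minimal
sketch, not something proposed in this thread: parse_dollar is a
hypothetical helper, and the pattern it uses (optional minus sign, at most
one leading "$", digits with optional comma grouping, optional decimals) is
only an assumption about what counts as well formed.

```r
# Hypothetical helper (not from the thread): convert dollar strings to
# numeric, returning NA with a warning for anything that is not well formed.
parse_dollar <- function(x) {
  x <- trimws(as.character(x))
  # Assumed format: -?$? then either comma-grouped digits or plain digits,
  # followed by an optional decimal part.
  pattern <- "^-?\\$?(\\d{1,3}(,\\d{3})+|\\d+)(\\.\\d+)?$"
  ok <- grepl(pattern, x, perl = TRUE)
  if (any(!ok & !is.na(x))) warning("malformed dollar values set to NA")
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(gsub("[$,]", "", x[ok]))
  out
}

parse_dollar(c("$135,359.00", "$135359.00", "$1,135,359.00"))
parse_dollar("$123$35,24,,$1$$2,,3.4")   # NA, with a warning
```

The blanket gsub remains the simpler choice when the input is known to be
clean; the stricter version trades a longer pattern for an explicit NA.
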
>
> The g in gsub stands for global (meaning replace every '$' and ','  
> not just the first one) rather than greedy (which has a different  
> meaning in regular expressions).
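
To illustrate the difference between sub and gsub on one of the values
from earlier in the thread (a small sketch; the variable name x is only
for illustration):

```r
x <- "$1,135,359.00"

sub(",", "", x)                   # first comma only: "$1135,359.00"
gsub(",", "", x)                  # every comma:      "$1135359.00"

# Inside a character class "$" is literal, so one gsub handles both symbols.
as.numeric(gsub("[$,]", "", x))   # 1135359
```
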
>
> This discussion brings up a related issue that I have thought about  
> for a while.  In the help for read.table in the section on  
> colClasses it says that you can specify other conversions from  
> character as long as there is a method for as corresponding to what  
> you put in.
>
> This suggests to me the approach of writing a conversion function  
> called something like "as.dollar" then setting  
> colClasses=c('numeric','dollar','dollar','factor') or something like  
> that and having the middle 2 columns run through the function.   
> However my first quick attempt failed (the doc says the method needs  
> to be in the methods package and my quick attempt with setMethod  
> created a local copy).  There is also the possible problem that this  
> would create a column with class dollar when I want a simple numeric.
>
> So this brings up 2 questions:
>
> 1. Has anyone found a way to create a method for as in the methods  
> package such that my idea above would work? (preferably without much  
> more work than the post-processing already suggested).


I do get a warning, but it does seem to "work" as intended. I followed,  
as best I could, a suggestion made a couple of months ago by Gabor  
Grothendieck. Here is a link to an earlier post, followed by a colClasses  
method that strips "$" and ",":

http://finzi.psych.upenn.edu/Rhelp10/2010-February/229550.html

 > Input <- "$245,000,000\n 3,000.000\n $$$34"

 > setAs("character", "num.with.commas.dolsign",
+     function(from) as.numeric(gsub(",|\\$", "", from)))
Warning message:
In matchSignature(signature, fdef, where) :
   in the method signature for function "coerce" no definition for  
class: “num.with.commas.dolsign”
 > DF <- read.table(textConnection(Input), header = FALSE,
+     colClasses = c("num.with.commas.dolsign"))
 > DF
         V1
1 2.45e+08
2 3.00e+03
3 3.40e+01

 > sprintf("%12.2f", DF$V1)
[1] "245000000.00" "     3000.00" "       34.00"

Any help with cleaning up the S4 incantations would be welcome.
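
One possible cleanup, offered as a sketch rather than a fix confirmed in
this thread: the warning text says there is no definition for the target
class, so declaring the class with setClass() before calling setAs()
should avoid it. Calling setClass() with only a name creates a virtual
class, which is enough here because the coerce method returns a plain
numeric vector. The use of the text= argument of read.table in place of
textConnection() is my substitution.

```r
library(methods)

# Declare the target class first so that setAs() can find a definition.
setClass("num.with.commas.dolsign")

# Coercion from character: strip "," and "$", then convert to numeric.
setAs("character", "num.with.commas.dolsign",
      function(from) as.numeric(gsub(",|\\$", "", from)))

Input <- "$245,000,000\n 3,000.000\n $$$34"
DF <- read.table(text = Input, header = FALSE,
                 colClasses = "num.with.commas.dolsign")
str(DF)
```

As far as I can tell the resulting column is an ordinary numeric vector,
not an object of the new class.
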

-- 
David.


> 2. If the answer to 1 above is no, are others interested in this  
> type of functionality and we should move the discussion to r-devel  
> as a feature request?
>
> Even nicer would be a simple way to go from a single character  
> vector to multiple columns in the data frame, I remember working  
> with a file once where the 1st 3 columns were comma separated (no  
> spaces), but everything after that was white space separated.  I  
> read it in as whitespace separated, then had to post process the 1st  
> column into 3.  But getting all the semantics of 1 to multiple could  
> be tricky.  That particular case could also have been easier if the  
> sep argument to read.table could be a regular expression, but that  
> would probably slow things down for the simple cases.
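
For the mixed-separator case described above, the post-processing can be
fairly short. This is only a sketch on made-up data: the input lines and
the column names id, n and code are hypothetical. It reads the file as
whitespace separated, then parses the first column again with sep = ",".

```r
# Hypothetical input: first three fields comma separated, rest by whitespace.
lines <- c("a,1,x 10 20",
           "b,2,y 30 40")

# Pass 1: whitespace separated, so "a,1,x" arrives as one character column.
DF <- read.table(text = lines, header = FALSE, stringsAsFactors = FALSE)

# Pass 2: re-read that first column as comma-separated fields.
first <- read.table(text = DF$V1, sep = ",", header = FALSE,
                    col.names = c("id", "n", "code"),
                    stringsAsFactors = FALSE)

out <- cbind(first, DF[-1])
str(out)
```
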
>
>
>
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
>> Sent: Thursday, May 06, 2010 4:47 AM
>> To: Wang, Kevin (SYD)
>> Cc: r-help at r-project.org; Phil Spector
>> Subject: Re: [R] Converting dollar value (factors) to numeric
>>
>>
>> On May 5, 2010, at 11:31 PM, Wang, Kevin (SYD) wrote:
>>
>>> Hi Phil and all those who replied,
>>>
>>> Thanks heaps!  Yes it worked to a certain extent.  However, if I have the
>>> following case:
>>>> x <- c("$135,359.00", "$135359.00", "$1,135,359.00")
>>>> y <- sub('\\$','',as.character(x))
>>>> cost <- as.numeric(sub('\\,','',as.character(y)))
>>
>> Try gsub, it seems to be more "greedy" :
>>
>> cost <- as.numeric(gsub('\\,','',as.character(y)))
>>
>> --
>> David
>>> Warning message:
>>> NAs introduced by coercion
>>>> cost
>>> [1] 135359 135359     NA
>>>
>>> Then the third value becomes NA -- though I suspect it probably has
>>> more to do with the regular expression (which I'm not sure how to
>>> fix) than with R?
>>>
>>> Thanks again for the help!
>>>
>>> Cheers
>>> Kev
>>>
>>> -----Original Message-----
>>> From: Phil Spector [mailto:spector at stat.berkeley.edu]
>>> Sent: Wednesday, 5 May 2010 6:14 PM
>>> To: Wang, Kevin (SYD)
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Converting dollar value (factors) to numeric
>>>
>>> Kev-
>>>  The most reliable way to do the conversion is as follows:
>>>
>>>> x = factor(c('$112.11','$119.15','$121.32'))
>>>> as.numeric(sub('\\$','',as.character(x)))
>>> [1] 112.11 119.15 121.32
>>>
>>> This way negative quantities and numbers without dollar signs are
>>> handled correctly.  There's certainly no need to create a new input
>>> file.
>>>
>>> It may be easier to understand as
>>>
>>> as.numeric(sub('$','',as.character(x),fixed=TRUE))
>>>
>>> which gives the same result.
>>> 					- Phil Spector
>>> 					 Statistical Computing Facility
>>> 					 Department of Statistics
>>> 					 UC Berkeley
>>> 					 spector at stat.berkeley.edu
>>>
>>>
>>> On Wed, 5 May 2010, Wang, Kevin (SYD) wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to read in a bunch of CSV files into R where many columns
>>>> are coded like $111.11.  When reading them in they are treated as
>>>> factors.
>>>>
>>>> I'm wondering if there is an easy way to convert them into numeric in
>>>> R (as I don't want to modify the source data)?  I've done some
>>>> searches and can't seem to find an easy way to do this.
>>>>
>>>> I apologise if this is a trivial question, I haven't been using R for
>>>> a while.
>>>>
>>>> Many thanks in advance!
>>>>
>>>> Cheers
>>>>
>>>> Kev
>>>>
>>>> Kevin Wang
>>>>> Senior Advisor, Health and Human Services Practice Government
>>>>> Advisory Services
>>>>>
>>>>> KPMG
>>>>> 10 Shelley Street
>>>>> Sydney  NSW  2000  Australia
>>>>>
>>>>> Tel 	+61 2 9335 8282
>>>>> Fax	+61 2 9335 7001
>>>>>
>>>> kevinwang at kpmg.com.au
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT


