[R] How to calculate values with percent sign imported from Excel?

David Winsemius dwinsemius at comcast.net
Thu Jun 21 15:19:38 CEST 2012


On Jun 20, 2012, at 10:34 PM, C W wrote:

> I am a noob. I am familiar with factors, but not familiar with how  
> that relates to "two distinct values".  How were you able to tell?
>
> Please point me out.
> Mike

A factor is stored in two parts. The first is a vector of integer codes,  
the part I copied. The second, the .Label attribute (what levels()  
returns), is a character vector. The value of each element is the 'n'-th  
item in that character vector, where 'n' is the element's integer code.  
I noted that your data contained only two distinct codes, 78 and 1.
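
To make that concrete, here is a minimal sketch on a small made-up factor (the name 'f' and its values are mine for illustration, not taken from your data):

```r
# A small hypothetical factor; the values are illustrative only
f <- factor(c("7.14%", "", "", "10.00%"))

levels(f)               # the character labels, sorted: "" "10.00%" "7.14%"
as.integer(f)           # the integer codes: 3 1 1 2
unique(as.integer(f))   # the distinct codes actually used

# The value of each element is the label its code points at
levels(f)[as.integer(f)]
identical(levels(f)[as.integer(f)], as.character(f))   # TRUE
```

The same inspection on your 'dat' is what shows only the codes 78 and 1.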

What you should have done was convert the Excel column to numeric from  
percentage using the "Format/Cell" menu and then import.
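
If going back to Excel is inconvenient, one possible alternative on the R side is to keep the column from becoming a factor at import. This is only a sketch, assuming the sheet was saved as CSV; the inline text and the column names 'id' and 'chg' are placeholders standing in for your file:

```r
# Hypothetical CSV content standing in for the exported sheet
csv_text <- "id,chg\n1,7.14%\n2,\n3,-1.3%\n"

raw <- read.csv(text = csv_text,
                stringsAsFactors = FALSE,  # keep 'chg' as character, not factor
                na.strings = "")           # treat empty cells as NA

str(raw)       # 'chg' is a character column with one NA
raw$chg
```

The percent signs are still there after this step, so the strings still need converting to numbers before taking a mean.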

>
> On Wed, Jun 20, 2012 at 9:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> Dear Conventional Wisdom;
>
> You do realize you only have two distinct values in that factor  
> variable, right?
>
> dat <- structure( c(78L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)....

After assigning your structure to 'dat':

 > levels(dat)[78]
[1] "7.14%"

 > levels(dat)[1]
[1] ""

You have 9 values of the empty string and one value of 7.14%.

You should start your journey to understanding factors by reading the  
FAQ entry on converting factors to numbers. This problem happens to be  
more complex because R has no 'percentage' type, so there is no  
as.numeric.percent coercion function for vectors of class factor or  
class character, although it would not be that difficult to construct  
one.

 > setClass("percent", representation(a="factor")  )
[1] "percent"
 > setAs("percent", "numeric",
 +       function(from) as.numeric(sub("%", "", as.character(from)))/100)
 > class(dat) <- c("percent", class(dat))
 > class(dat)
[1] "percent" "factor"
 > as(dat, "numeric")
  [1] 0.0714     NA     NA     NA     NA     NA     NA     NA     NA     NA
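
If the S4 machinery is more than you need, an ordinary function should do the same job. This is only a sketch; the name 'pct2num' and the sample values are made up for illustration:

```r
# Hypothetical helper: convert "12.5%"-style strings (or a factor of them)
# to proportions; empty strings become NA
pct2num <- function(x) {
  x <- as.character(x)   # work on the labels, not the integer codes
  x[x == ""] <- NA       # empty cells are missing values
  as.numeric(sub("%", "", x, fixed = TRUE)) / 100
}

f <- factor(c("45%", "65%", "12%", ""))
p <- pct2num(f)
p                       # 0.45 0.65 0.12 NA
mean(p, na.rm = TRUE)   # mean of the non-missing values
```

Note the as.character() step: calling as.numeric() directly on a factor returns the integer codes rather than the numbers in the labels.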

-- 
David.
>
> -- David.
>
>
> On Jun 20, 2012, at 9:26 PM, C W wrote:
>
> Hi R list,
> I imported values from Excel, there is a column with numbers like  
> 45%, 65%, 12%.
>
> I want to find its mean.  What should I use?
>
> strsplit()
> split()
> parse()
>
> Data from dput(),
>
> structure(c(78L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
> "-0.15%", "-0.34%", "-1.3%", "-10.77%", "-100.00%", "-11.45%",
> "-12.53%", "-13.06%", "-15.36%", "-15.82%", "-16.96%", "-18.71%",
> "-2.02%", "-2.94%", "-21.23%", "-25.00%", "-26.20%", "-29.79%",
> "-3.16%", "-3.67%", "-30.52%", "-33.44%", "-37.48%", "-37.89%",
> "-39.42%", "-45.88%", "-5.09%", "-51.64%", "-61.58%", "-62.87%",
> "-63.51%", "-7.00%", "-7.90%", "-8.33%", "-8.58%", "-8.88%",
> "-91.10%", "-94.08%", "-96.01%", "0.98%", "10.00%", "10.04%",
> "10.64%", "11.11%", "114.32%", "12.09%", "12.68%", "13.77%",
> "14.10%", "15.51%", "16.25%", "16.93%", "16.94%", "18.57%", "18.88%",
> "2.46%", "2.55%", "2.79%", "2.93%", "20.00%", "22.67%", "24.50%",
> "25.76%", "28.18%", "3.26%", "3.80%", "3.83%", "36.05%", "37.22%",
> "40.63%", "5.53%", "5.70%", "6.19%", "6.62%", "6.72%", "63.33%",
> "7.14%", "7.21%", "7.39%", "9.15%", "9.99%", "95.00%"), class = "factor")
>
> Thanks,
>
> Mike
>
>
> David Winsemius, MD
> West Hartford, CT
>
>

David Winsemius, MD
West Hartford, CT


