[R] function using values separated by a comma

Gabor Grothendieck ggrothendieck at gmail.com
Fri Oct 8 16:38:49 CEST 2010


On Fri, Oct 8, 2010 at 10:18 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Fri, Oct 8, 2010 at 1:19 AM, burgundy <sauburn at yahoo.com> wrote:
>>
>> Hello,
>>
>> I have a dataframe (tab separated file) which looks like the example below -
>> two values separated by a comma, and tab separation between each of these.
>>
>>     [,1]  [,2]  [,3]  [,4]
>> [1,] 0,1  1,3   40,10  0,0
>> [2,] 20,5  4,2  10,40  10,0
>> [3,] 0,11  1,2  120,10  0,0
>>
>> I would like to calculate the percentage of the smaller of the two numbers
>> separated by the comma, by:
>> 1) summing the values e.g. for [1,3] where 40,10, 40+10 = 50
>> 2) taking the first value and dividing it by the total e.g. for [1,3], 40/50
>> = 0.8
>> 3) where the value generated by 2) is > 0.5, print 1 - value; otherwise, leave
>> the value as is, e.g. for [1,3], where the value is 0.8, print 1 - 0.8 = 0.2
>>
>> I plan to generate a file like:
>>
>>    [,1]  [,2]  [,3]  [,4]
>> [1,] 1   0.25  0.2  0
>> [2,] 0.2  0.33  0.2  1
>> [3,] 1  0.33  0.08  0
>
> Try using gsubfn from the gsubfn package (http://gsubfn.googlecode.com).
> With it, match a regular expression consisting of digits, a comma and
> digits, capture the two strings of digits, and pass them to a function f;
> each match is replaced with the output of f.  Then read the resulting
> text into a data frame.
>
> library(gsubfn)
> L <- c(" 0,1  1,3   40,10  0,0", " 20,5  4,2  10,40  10,0",
>   " 0,11  1,2  120,10  0,0")
>
> f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
> L2 <- gsubfn("(\\d+),(\\d+)", f, L)
>
> DF <- read.table(textConnection(L2))
>
> which gives:
>
>> DF
>   V1        V2         V3  V4
> 1 0.0 0.2500000 0.20000000 NaN
> 2 0.2 0.3333333 0.20000000   0
> 3 0.0 0.3333333 0.07692308 NaN
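
The function f returns the smaller value divided by the sum, which agrees with steps 2) and 3) of the question: if a/(a+b) > 0.5, then 1 - a/(a+b) = b/(a+b), and b is then the smaller value. A 0,0 pair gives 0/0, which is why NaN appears in V4. As a quick sanity check, here is a minimal base-R sketch comparing a literal version of the three steps with f on the nonzero pairs from the question (step_rule is a hypothetical helper name, not part of gsubfn):

```r
# Hypothetical helper following steps 1) to 3) of the question literally
step_rule <- function(a, b) {
  total <- a + b             # step 1: sum the pair
  p <- a / total             # step 2: first value divided by the total
  ifelse(p > 0.5, 1 - p, p)  # step 3: replace values above 0.5 by 1 - value
}

# f from the reply: smaller value divided by the sum
f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }

# Pairs from the question, leaving out the 0,0 pairs,
# where both versions end up dividing 0 by 0
a <- c(0, 1, 40, 20, 4, 10, 10, 0, 1, 120)
b <- c(1, 3, 10, 5, 2, 40, 0, 11, 2, 10)

# all.equal tolerates tiny floating-point differences such as 1 - 120/130 vs 10/130
ok <- isTRUE(all.equal(step_rule(a, b), mapply(f, a, b)))
ok
```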

A further simplification would be to use strapply from the same
package. It eliminates the need for read.table at the end; with
simplify = rbind the per-line results are bound into a matrix:

> strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)
     [,1]      [,2]       [,3] [,4]
[1,]  0.0 0.2500000 0.20000000  NaN
[2,]  0.2 0.3333333 0.20000000    0
[3,]  0.0 0.3333333 0.07692308  NaN
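
For comparison, a similar matrix can be built with base R alone by splitting each line on whitespace and each field on the comma. This is a swapped-in technique, not the gsubfn approach above; the helper name ratio is hypothetical, and the sketch assumes every field is exactly one comma-separated pair of non-negative numbers:

```r
# Input lines from the question
L <- c(" 0,1  1,3   40,10  0,0", " 20,5  4,2  10,40  10,0",
       " 0,11  1,2  120,10  0,0")

# Hypothetical helper: turn one "a,b" field into min/sum (0,0 gives NaN)
ratio <- function(field) {
  x <- as.numeric(strsplit(field, ",", fixed = TRUE)[[1]])
  min(x) / sum(x)
}

# Trim leading blanks, split each line on runs of whitespace,
# then apply ratio to every field of the line
rows <- lapply(strsplit(trimws(L), "[[:space:]]+"), function(fields)
  vapply(fields, ratio, numeric(1), USE.NAMES = FALSE))

# Stack the per-line results into a 3 x 4 matrix
m <- do.call(rbind, rows)
m
```

If zeros are preferred for the 0,0 pairs, as in the table in the question, the NaN entries could be replaced afterwards with m[is.nan(m)] <- 0; that replacement is an assumption about the desired output, not something the reply does.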

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


