[R] a problem of approach

Adrian Duşa dusa.adrian at gmail.com
Wed Jun 27 18:51:12 CEST 2012


Hi Jim,

On Wed, Jun 27, 2012 at 7:27 PM, jim holtman <jholtman at gmail.com> wrote:
> One place to start is to use Rprof to see where time is being spent.
> I used the sample you sent and this is what I got:
>
>
>  0  16.7 root
>  1.   16.2 system.time
>  2. .   16.1 testfoo
>  3. . .   16.1 setdiff
>  4. . . .    8.2 as.vector
>  5. . . . .    8.2 findSubsets
>  6. . . . . .    6.4 increment
>  7. . . . . . .    4.2 as.vector
>  8. . . . . . . .    3.6 outer
>  9. . . . . . . . .    0.3 rep.int
>  7. . . . . . .    1.6 c
>  7. . . . . . .    0.2 max
>  4. . . .    7.9 unique
>  5. . . . .    7.3 match
>  5. . . . .    0.3 unique.default
>  1.    0.5 sort
>  2. .    0.5 standardGeneric
>  3. . .    0.3 sample
>  3. . .    0.2 sort
>  4. . . .    0.2 sort.default
>  5. . . . .    0.2 sort.int
>
> Of the 16.7 seconds to execute the code, 16.1 was taken up in
> 'setdiff'.  Maybe there is some other way you can determine the
> difference.  So if you continue to use 'setdiff', it does not look
> like there is much that can be done.
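
For reference, a profile of this kind can be collected roughly as in
the sketch below. `slow_fun` is a made-up stand-in, not the testfoo()
from this thread, and summaryRprof() prints flat tables rather than the
call tree quoted above, which was presumably produced by separate
post-processing.

```r
# Hypothetical workload: repeatedly shrink a vector with setdiff(),
# which is the pattern the quoted profile points at.
slow_fun <- function(n) {
  v <- seq_len(n)
  for (i in seq_len(100L)) v <- setdiff(v, i)
  v
}

prof_file <- tempfile()
Rprof(prof_file)              # start the sampling profiler
res <- slow_fun(100000L)
Rprof(NULL)                   # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)           # time per function, callees included
```

The by.total table is the one to read when looking for a dominant call
such as setdiff(); by.self shows where the time is spent directly.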

One thing to notice is that setdiff() is called inside the while()
loop, so its cost is paid at every iteration.

I could in principle loop over the entire vector and eliminate all
the derived numbers at the end, but I have a hunch it might take even
longer. The point of setdiff() was to progressively shorten the vector
in order to minimize the time spent in the loop. On the other hand,
setdiff() rebuilds the vector at each iteration, and that of course
also takes time.
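
A minimal sketch of the two alternatives, under loud assumptions:
derived() is a hypothetical stand-in for findSubsets() (here simply the
multiples of x), the vector holds distinct positive integers no larger
than `limit`, and derived numbers are always larger than the number
they come from. Whether the second version is actually faster depends
on the real function, so no timings are claimed.

```r
# Hypothetical stand-in for findSubsets(): the multiples of x above x.
derived <- function(x, limit) {
  if (2L * x > limit) integer(0) else seq.int(2L * x, limit, by = x)
}

# Variant 1: shorten the vector with setdiff() at every iteration.
prune_setdiff <- function(v, limit) {
  i <- 1L
  while (i <= length(v)) {
    v <- setdiff(v, derived(v[i], limit))
    i <- i + 1L
  }
  v
}

# Variant 2: keep the vector intact, flag derived numbers as dropped,
# skip flagged entries, and subset once at the end.
prune_flags <- function(v, limit) {
  keep <- rep(TRUE, length(v))
  pos <- integer(limit)          # lookup: value -> position in v
  pos[v] <- seq_along(v)         # stays 0 for values not in v
  for (i in seq_along(v)) {
    if (!keep[i]) next           # already eliminated, do not expand it
    d <- derived(v[i], limit)
    keep[pos[d]] <- FALSE        # zero positions are ignored by R
  }
  v[keep]
}

a <- prune_setdiff(2:30, 30L)
b <- prune_flags(2:30, 30L)
identical(a, b)
```

The second variant visits every position but never copies the vector
inside the loop, and it avoids the match() and unique() calls that
setdiff() performs each time.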

I thought a C program might prove faster (because looping over each
value in the vector is quicker in C), but although it works just fine,
it takes a similarly long time, so I am probably not using C properly
(most likely because it shuffles memory around too much).

Well, any other quicker alternative would do...
Thanks,
Adrian

-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
       +40 21 3120210 / int.101
Fax: +40 21 3158391


