[Rd] Is it safe not to coerce matrices with as.double() in .C()?

Simon Urbanek simon.urbanek at r-project.org
Fri Sep 17 21:18:13 CEST 2010


On Sep 17, 2010, at 1:22 PM, Liaw, Andy wrote:

> From: Liaw, Andy
>> 
>> From: Prof Brian Ripley
>>> 
>>> On Fri, 27 Aug 2010, peter dalgaard wrote:
>>> 
>>>> 
>>>> On Aug 27, 2010, at 2:44 PM, Liaw, Andy wrote:
>>>> 
>>>>> I'd very much appreciate guidance on this.  A user reported that the
>>>>> as.double() coercion used inside the .C() call for a function in my
>>>>> package (specifically, randomForest:::predict.randomForest()) is
>>>>> taking up a significant amount of time when called repeatedly, and
>>>>> removing some of these coercions reduced run time by 30-40% in some
>>>>> cases.  These arguments are components of the fitted model (thus do
>>>>> not change), and are matrices.  Some basic tests show no difference in
>>>>> the result when the coercions are removed (other than faster run time).
>>>>> What I'd like to know is whether this is safe to do, or is it likely to
>>>>> lead to trouble in the future?
>>>> 
>>>> In a word: yes. It is safe as long as you are absolutely sure that
>>>> the argument has the right mode. The unsafeness comes in when people
>>>> might unwittingly use, say, an integer vector where a double was
>>>> expected, causing memory overruns and general mayhem.
>>>>
>>>> Notice, BTW, that if you switch to .Call or .External, then you have
>>>> much more scope for handling such details on the C-side. E.g. you
>>>> could coerce only if the object has the wrong mode, avoid
>>>> duplicating things you won't be modifying anyway, etc.
>>> 
>>> But as as.double is effectively .Call it has the same freedom, and it
>>> does nothing if no coercion is required.  The crunch here is likely to
>>> be
>>>
>>>      ‘as.double’ attempts to coerce its argument to be of double type:
>>>      like ‘as.vector’ it strips attributes including names.  (To ensure
>>>      that an object is of double type without stripping attributes, use
>>>      ‘storage.mode’.)
>>>
>>> I suspect the issue is the copying to remove attributes, in which case
>> 
>> I can certainly believe this.  I've tried replacing
>> as.double() with c(), thinking attributes need to be stripped.
>> That actually increased run time very slightly instead of reducing it.
>> 
>>> storage.mode(x) <- "double"
>>> 
>>> should be a null op and so both fast and safe.
>> 
>> Will follow this advice.  Thanks to both of you for the help!
> 
> My apologies for coming back to this so late.  I did some tests, and found that
> 
>  storage.mode(x) <- "double"
> 
> isn't as light on resources as I thought it might be.  Changing the code to this from
> 
>  x <- as.double(x)
> 
> did not give the expected speed improvement.  Here's a little test:
> 
> f1 <- function(x) { as.double(x); NULL }
> f2 <- function(x) { storage.mode(x) <- "double"; NULL }
> f3 <- function(x) { x <- x; NULL }
> set.seed(917)
> reps <- 500
> x <- matrix(rnorm(1e6L), 1e3L, 1e3L)
> system.time(junk <- replicate(reps, f1(x)))
> system.time(junk <- replicate(reps, f2(x)))
> system.time(junk <- replicate(reps, f3(x)))
> 
> On my laptop running R  2.11.1 Patched (2010-06-26 r52410), I get:
> 
> R> system.time(junk <- replicate(reps, f1(x)))
>   user  system elapsed 
>   3.54    2.14    5.74 
> R> system.time(junk <- replicate(reps, f2(x)))
>   user  system elapsed 
>   3.32    2.11    5.92 
> R> system.time(junk <- replicate(reps, f3(x)))
>   user  system elapsed 
>      0       0       0 
> 
> Perhaps I need to first check and see if the storage mode is as expected before trying coercion?
> 

Well, the devil is in the details. Although storage.mode<- is itself a no-op here, it still triggers duplication because it is an assignment, not because the storage mode actually changes anything. Technically, x <- x is a special case that is truly a no-op, whereas any call to `foo<-` has to assume modification. So, yes, in your case

f4 <- function(x) { if (storage.mode(x) != "double") storage.mode(x) <- "double"; NULL }

will have the same speed as f3. If you are going to use .Call then you might as well do that on the C side (with the benefit of being able to strip attributes, since you can get them from the original object if you care...).
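
A minimal sketch of that check-before-coercing pattern, written as a helper that returns the object rather than NULL so it can be used inline. The name ensure_double and the small test matrices are illustrative assumptions, not code from randomForest:

```r
# Hypothetical helper: coerce only when the storage mode is wrong, so an
# object that is already double never goes through the `storage.mode<-`
# replacement call (and the duplication that call implies).
ensure_double <- function(x) {
  if (storage.mode(x) != "double") storage.mode(x) <- "double"
  x
}

m <- matrix(rnorm(6), 2, 3)   # already double: returned unchanged
i <- matrix(1:6, 2, 3)        # integer storage: coerced, dim attribute kept

md <- ensure_double(m)
id <- ensure_double(i)

stopifnot(identical(md, m))
stopifnot(is.double(id), identical(dim(id), dim(i)))
```

Unlike as.double(), this keeps attributes such as dim in both branches, so it is only a drop-in replacement where the receiving code does not depend on the attributes having been stripped.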

Cheers,
Simon



> Best,
> Andy
> 
> 
> 
>> Best,
>> Andy
>> 
>> 
>>> -- 
>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>> 


