[R] Very slow using S4 classes

André Rossi alrossi at icmc.usp.br
Mon Sep 12 16:20:45 CEST 2011


Dear Martin Morgan and Martin Maechler...

Here is an example comparing the computational time of the same update when
the objects are reached through a slot of another S4 object and when they are
in a plain list. I'm sending you the data file.

Thank you!

Best regards,

André Rossi

############################################################

setClass("SupervisedExample",
    representation(
        attr.value = "ANY",
        target.value = "ANY"
))

setClass("StreamBuffer",
    representation=representation(
        examples = "list", #SupervisedExample
        max.length = "integer"
    ),
    prototype=list(
            max.length = as.integer(10000)
    )
)

b <- new("StreamBuffer")

load("~/Dropbox/dataList2.RData")

b@examples <- data # data is a list of SupervisedExample objects

> system.time({for (i in 1:100) b@examples[[1]]@attr.value[1] = 2 })
   user  system elapsed
 16.837   0.108  18.244

> system.time({for (i in 1:100) data[[1]]@attr.value[1] = 2 })
   user  system elapsed
  0.024   0.000   0.026

############################################################
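
For readers without access to dataList2.RData, the comparison above can be
approximated with synthetic data. This is only a sketch: the list size
(10000), the ten numeric attributes per example, and the names t.slot, t.list
and ex are assumptions made for illustration, not part of the original data.
Timings depend on the R version and the machine, so none are shown.

```r
library(methods)

setClass("SupervisedExample",
    representation = representation(
        attr.value = "ANY",
        target.value = "ANY"))

setClass("StreamBuffer",
    representation = representation(
        examples = "list",      # list of SupervisedExample objects
        max.length = "integer"),
    prototype = prototype(max.length = 10000L))

# Synthetic stand-in for the contents of dataList2.RData (assumption):
# 10000 examples, each with 10 numeric attributes and a 0/1 target.
set.seed(1)
data <- lapply(seq_len(10000), function(i)
    new("SupervisedExample", attr.value = runif(10), target.value = i %% 2))

b <- new("StreamBuffer", examples = data)

# Nested replacement through the slot of the outer S4 object
t.slot <- system.time(for (i in 1:100) b@examples[[1]]@attr.value[1] <- 2)

# The same replacement on the plain list
t.list <- system.time(for (i in 1:100) data[[1]]@attr.value[1] <- 2)

print(t.slot)
print(t.list)

# Possible workaround: copy the list out of the slot, update it in the
# loop, and assign it back to the slot only once.
ex <- b@examples
for (i in 1:100) ex[[1]]@attr.value[1] <- 2
b@examples <- ex
```

The last step keeps the repeated updates on the plain-list path, which is the
fast case in the second timing above, and touches the slot a single time.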


2011/9/10 Martin Morgan <mtmorgan at fhcrc.org>

> On 09/10/2011 08:08 AM, André Rossi wrote:
>
>> Hi everybody!
>>
>> I'm creating an object of an S4 class that has two slots: ListExamples,
>> which is a list, and idx, which is an integer (as in the code below).
>>
>> Then, I read a data.frame file with 10000 (ten thousand) rows and 10
>> columns, do some pre-processing and, basically, store each row as an
>> element of a list in the slot ListExamples of the S4 object. However, many
>> operations after this take a considerable time.
>>
>> Can anyone explain to me why this happens? Is it possible to speed up a
>> script that deals with a large amount of data (it might be a data.frame or
>> a list)?
>>
>> Thank you,
>>
>> André Rossi
>>
>> setClass("Buffer",
>>     representation=representation(
>>         Listexamples = "list",
>>         idx = "integer"
>>     )
>> )
>>
>
> Hi André,
>
> Can you provide a simpler and more reproducible example, for instance
>
> > setClass("Buf", representation=representation(lst="list"))
> [1] "Buf"
> > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
> > system.time({ b@lst[[1]][[1]] = 2 })
>   user  system elapsed
>  0.005   0.000   0.005
>
> Generally it sounds like you're modeling the rows as elements of
> ListExamples, but you're better served by modeling the columns (lst =
> replicate(10, integer(10000)), if all of your 10 columns were
> integer-valued, for instance). Also, S4 is providing some measure of type
> safety, and you're undermining that by having your class contain a 'list'.
> I'd go after
>
> setClass("Buffer",
>         representation=representation(
>           col1="integer",
>           col2="character",
>           col3="numeric"
>           ## etc.
>           ),
>         validity=function(object) {
>             nms <- slotNames(object)
>             len <- sapply(nms, function(nm) length(slot(object, nm)))
>             if (1L != length(unique(len)))
>                 "slots must all be of same length"
>             else TRUE
>         })
>
> Buffer <-
>    function(col1, col2, col3, ...)
> {
>    new("Buffer", col1=col1, col2=col2, col3=col3, ...)
> }
>
> Let's see where the inefficiencies are before deciding that this is an S4
> issue.
>
> Martin
>
>
>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
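
The column-oriented design suggested in the quoted reply can be tried out as
follows. This is a sketch under assumptions: the three column types are the
placeholders from the reply, and the sample values and the names n, buf and
bad are made up for illustration. It uses vapply() where the reply uses
sapply(), only to pin the result type.

```r
library(methods)

setClass("Buffer",
    representation = representation(
        col1 = "integer",
        col2 = "character",
        col3 = "numeric"),
    validity = function(object) {
        # All slots (columns) must have the same number of entries
        len <- vapply(slotNames(object),
                      function(nm) length(slot(object, nm)), integer(1))
        if (length(unique(len)) != 1L)
            "slots must all be of same length"
        else TRUE
    })

# Constructor, as in the reply
Buffer <- function(col1, col2, col3, ...)
    new("Buffer", col1 = col1, col2 = col2, col3 = col3, ...)

n <- 10000L
buf <- Buffer(col1 = seq_len(n),
              col2 = rep("a", n),
              col3 = numeric(n))

# Updating one cell modifies a single atomic vector, not a list of objects
buf@col3[1] <- 2

# Columns of different lengths are rejected by the validity function
bad <- tryCatch(Buffer(col1 = 1:3, col2 = "a", col3 = 0),
                error = function(e) conditionMessage(e))
print(bad)
```

Each row of the original data.frame then corresponds to one index across the
slots, so the object holds three vectors instead of 10000 list elements.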

