[R] efficiently replacing values in a matrix

Joerg van den Hoff j.van_den_hoff at fzd.de
Thu Apr 17 13:41:38 CEST 2008


On Wed, Apr 16, 2008 at 03:56:26PM -0600, Matthew Keller wrote:
> Yes Chuck, you're right.
> 

just a comment:

> Thanks for the help. It was a data.frame not a matrix (I had called
> as.matrix() in my script much earlier but that line of code didn't run
> because I misnamed the object!). My bad. Thanks for the help. And I'm
> VERY relieved R isn't that inefficient...

well,  it _is_ at least when using data frames. and while it
is obvious that operations on lists (data frames  are  lists
in   disguise,   actually,   right?)   are  slower  than  on
arrays/matrices, I'm not happy with a performance drop by  a
factor of seemingly > 1500 (30 sec vs. > 13 h) -- and
I have seen similar things even with rather small data sets,
where  the  difference  of using data frame vs. matrix might
mean, e.g. overall run times of 10 sec. vs. 0.1 sec. 

where  is  all  this  time  burned?  there  _are_ functional
languages which operate efficiently on lists.
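
A rough way to see the gap for oneself, as a sketch only: the sizes and the names n, p, m, d and replace_elementwise() are arbitrary choices for illustration, not taken from Matthew's script, and the timings will vary with machine and R version.

```r
# Hypothetical sizes chosen only for illustration.
n <- 1000
p <- 10
m <- matrix(rnorm(n * p), nrow = n, ncol = p)  # homogeneous numeric matrix
d <- as.data.frame(m)                          # same values as a data frame

is.list(d)   # TRUE: a data frame is a list of equal-length columns
is.list(m)   # FALSE: a matrix is one atomic vector with a dim attribute

# Element-by-element replacement: every x[i, j] <- value on a data frame
# goes through the `[<-.data.frame` method instead of a plain vector write.
replace_elementwise <- function(x) {
  for (j in seq_len(ncol(x))) {
    for (i in seq_len(nrow(x))) {
      if (x[i, j] < 0) x[i, j] <- 0
    }
  }
  x
}

t_matrix <- system.time(m2 <- replace_elementwise(m))
t_frame  <- system.time(d2 <- replace_elementwise(d))

# Vectorised replacement on the matrix avoids the loop altogether.
m3 <- m
m3[m3 < 0] <- 0

# Compare the two timings side by side.
print(rbind(matrix = t_matrix, data.frame = t_frame))
```

The same loop body is used for both objects, so any difference in the printed timings comes from the data structure rather than from the algorithm.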

I think this extreme performance drop when using an
apparently innocent data structure is really bad. and it's
bad that it's not repeatedly stated in BIG LETTERS in the
manuals: use matrices, at least for big arrays, wherever
possible. this message is not at all conveyed by the
"description" in the data.frame manpage, e.g.:

"This   function   creates   data  frames,  tightly  coupled
collections of variables which share many of the  properties
of  matrices  and  of  lists,  used  as the fundamental data
structure by most of R's modeling software."...

probably 90% (+ x) of all R users are simply that: users and
not experts. when I started using R I exclusively used  data
frames  for purely numerical data instead of matrices simply
because I could get column n with x[n] instead of x[,n]  and
mean(x)  worked  columnwise  (whereas apply(x, 2, 'mean') is
tiresome) thus saving some typing. this is no strong  reason
in  retrospect but probably quite common. and many then will
stick with data.frames and endure long runtimes for no good
reason at all.
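
For comparison, a small sketch of the access idioms mentioned above; the objects m and d are made up for illustration. Two caveats: x[n] on a data frame returns a one-column data frame rather than a vector (x[[n]] gives the vector), and in current R releases mean() on a data frame no longer works columnwise, so colMeans() is the short form to use for either structure.

```r
m <- matrix(1:6, nrow = 3, ncol = 2)   # illustrative 3 x 2 matrix
d <- as.data.frame(m)                  # columns get the default names V1, V2

col_from_frame  <- d[[2]]   # plain vector; d[2] would be a one-column data frame
col_from_matrix <- m[, 2]   # plain vector

means_matrix <- colMeans(m)         # columnwise means without apply()
means_frame  <- colMeans(d)         # colMeans() also accepts a numeric data frame
means_apply  <- apply(m, 2, mean)   # the longer form mentioned above
```

So the typing saved by using a data frame is small: m[, n] and colMeans(m) cover both conveniences on a matrix.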

another  question  would  be whether homogeneous data frames
could not internally be handled as matrices...
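
Until something like that exists, one possible workaround is to do the conversion by hand: check that the data frame is purely numeric, do the heavy work on a matrix, and convert back at the end. This is only a sketch with made-up data; the names d, m and d_clean are illustrative.

```r
d <- data.frame(a = c(-1, 2, -3), b = c(4, -5, 6))   # made-up numeric data

# Only convert when every column is numeric; otherwise as.matrix()
# would coerce the whole result to character.
stopifnot(all(vapply(d, is.numeric, logical(1))))

m <- as.matrix(d)             # one atomic matrix, column names preserved
m[m < 0] <- 0                 # vectorised replacement on the matrix
d_clean <- as.data.frame(m)   # back to a data frame for modelling functions
```

The two conversions copy the data once each, which is usually cheap next to many element-wise assignments on the data frame itself.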

joerg

> 
> Matt
> 
> 
> On Wed, Apr 16, 2008 at 3:39 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
> >
> >  On 17/04/2008, at 9:33 AM, Charles C. Berry wrote:
> >
> >         <snip>
> >
> >
> >
> > > I'll lay odds that Matthew's 'matrix' is actually a data.frame, and I'll
> > > not be surprised if the columns are factors.
> > >
> >
> >         <snip>
> >
> >  I suspect that you're right.
> >
> >  ***Why*** can't people distinguish between data frames and matrices?
> >  If they were the same <expletive deleted> thing, there wouldn't be two
> >  different terms for them, would there?
> >
> >         cheers,
> >
> >                 Rolf Turner
> >
> >
> 
> 
> 
> -- 
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
> 


