[R] do.call vs. lapply for lists

Muenchen, Robert A (Bob) muenchen at utk.edu
Mon Apr 9 19:20:59 CEST 2007


Marc,

That makes the difference between do.call and lapply crystal clear. Your
explanation would make a nice FAQ entry.

Thanks!
Bob

=========================================================
  Bob Muenchen (pronounced Min'-chen), Manager  
  Statistical Consulting Center
  U of TN Office of Information Technology
  200 Stokely Management Center, Knoxville, TN 37996-0520
  Voice: (865) 974-5230  
  FAX:   (865) 974-4810
  Email: muenchen at utk.edu
  Web:   http://oit.utk.edu/scc, 
  News:  http://listserv.utk.edu/archives/statnews.html
=========================================================


> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz at comcast.net]
> Sent: Monday, April 09, 2007 1:06 PM
> To: Muenchen, Robert A (Bob)
> Cc: R-help at stat.math.ethz.ch
> Subject: Re: do.call vs. lapply for lists
> 
> On Mon, 2007-04-09 at 12:45 -0400, Muenchen, Robert A (Bob) wrote:
> > Hi All,
> >
> > I'm trying to understand the difference between do.call and lapply for
> > applying a function to a list. Below is one of the variations of
> > programs (by Marc Schwartz) discussed here recently to select the first
> > and last n observations per group.
> >
> > I've looked in several books, the R FAQ and searched the archives, but I
> > can't find enough to figure out why lapply doesn't do what do.call does
> > in this case. The help files & newsletter descriptions of do.call sound
> > like it would do the same thing, but I'm sure that's due to my lack of
> > understanding about their specific terminology. I would appreciate it if
> > you could take a moment to enlighten me.
> >
> > Thanks,
> > Bob
> >
> > mydata <- data.frame(
> >   id      = c('001','001','001','002','003','003'),
> >   math    = c(80,75,70,65,65,70),
> >   reading = c(65,70,88,NA,90,NA)
> > )
> > mydata
> >
> > mylast <- lapply( split(mydata,mydata$id), tail, n=1)
> > mylast
> > class(mylast) #It's a list, so lapply will do *something* with it.
> >
> > #This gets the desired result:
> > do.call("rbind", mylast)
> >
> > #This doesn't do the same thing, which confuses me:
> > lapply(mylast,rbind)
> >
> > #...and data.frame won't fix it as I've seen it do in other circumstances:
> > data.frame( lapply(mylast,rbind) )
> 
> Bob,
> 
> A key difference is that do.call() operates (in the above example) as if
> the actual call was:
> 
> > rbind(mylast[[1]], mylast[[2]], mylast[[3]])
>    id math reading
> 3 001   70      88
> 4 002   65      NA
> 6 003   70      NA
> 
> In other words, do.call() takes the quoted function and passes the list
> object as if it was a list of individual arguments. So rbind() is only
> called once.
> 
> In this case, rbind() internally handles all of the factor level issues,
> etc. to enable a single common data frame to be created from the three
> independent data frames contained in 'mylast':
> 
> > str(mylast)
> List of 3
>  $ 001:'data.frame':    1 obs. of  3 variables:
>   ..$ id     : Factor w/ 3 levels "001","002","003": 1
>   ..$ math   : num 70
>   ..$ reading: num 88
>  $ 002:'data.frame':    1 obs. of  3 variables:
>   ..$ id     : Factor w/ 3 levels "001","002","003": 2
>   ..$ math   : num 65
>   ..$ reading: num NA
>  $ 003:'data.frame':    1 obs. of  3 variables:
>   ..$ id     : Factor w/ 3 levels "001","002","003": 3
>   ..$ math   : num 70
>   ..$ reading: num NA
> 
> 
> On the other hand, lapply() (as above) calls rbind() _separately_ for
> each component of mylast.  It therefore acts as if the following series
> of three separate calls were made:
> 
> 
> > rbind(mylast[[1]])
>    id math reading
> 3 001   70      88
> 
> > rbind(mylast[[2]])
>    id math reading
> 4 002   65      NA
> 
> > rbind(mylast[[3]])
>    id math reading
> 6 003   70      NA
> 
> 
> Of course, the result of lapply() is that the above are combined into a
> single R list object and returned:
> 
> > lapply(mylast, rbind)
> $`001`
>    id math reading
> 3 001   70      88
> 
> $`002`
>    id math reading
> 4 002   65      NA
> 
> $`003`
>    id math reading
> 6 003   70      NA
> 
> 
> It is a subtle, but of course critical, difference in how the internal
> function is called and how the arguments are passed.
> 
> Does that help?
> 
> Regards,
> 
> Marc Schwartz
>
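
For anyone who wants to check the distinction above directly, here is a
minimal sketch that reuses the data from the thread. The names 'combined'
and 'separate' are illustrative and not from the original posts. It assumes
base R only. Note that in R 4.0 and later the id column is character by
default rather than a factor, so str() output will differ from what is
quoted above, but the do.call/lapply behaviour should be the same.

```r
# Same toy data as in the thread.
mydata <- data.frame(
  id      = c("001", "001", "001", "002", "003", "003"),
  math    = c(80, 75, 70, 65, 65, 70),
  reading = c(65, 70, 88, NA, 90, NA)
)

# One single-row data frame per id (the last row of each group).
mylast <- lapply(split(mydata, mydata$id), tail, n = 1)

# do.call() spreads the list elements out as the arguments of ONE call,
# i.e. roughly rbind(mylast[[1]], mylast[[2]], mylast[[3]]).
combined <- do.call("rbind", mylast)

# lapply() makes one call PER element: rbind(mylast[[1]]),
# rbind(mylast[[2]]), rbind(mylast[[3]]), and collects the results in a list.
separate <- lapply(mylast, rbind)

class(combined)    # a data frame with one row per id
nrow(combined)
class(separate)    # still a list of three single-row data frames
length(separate)
```

The practical rule of thumb this suggests: reach for lapply() when the
function should run once per list element, and for do.call() when the list
holds the arguments of a single call.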


