[R] To List or Not To List

Fri May 17 01:32:11 CEST 2013

You can use an environment with much of the same syntax that you use with a list;
in particular, assign() and get() are not needed because you can use
env[["name"]] <- value and env[["name"]] instead of
assign("name", value, envir=env) and get("name", envir=env).  You are using
quantmod so I assume you are using getSymbols() to retrieve the data.  Use its
env= argument to put the results into an environment.  Then you have all your stock
data in one place and objects(env) will list the names of things in it.  Loop over every
object in the environment with eapply().  You don't waste any time 'collecting all
the data in a list' because you create it in an environment in the first place.
(It has to go in some environment, so put it in a convenient one - it will help you keep
organized.)
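
To make the syntax point concrete, here is a minimal sketch that needs no
packages.  The environment `env` and the values stored in it are made up for
illustration; they are not stock data.

```r
# A throwaway environment; names and values are made up for illustration.
env <- new.env()

# List-style assignment and the assign() form do the same job.
env[["a"]] <- 1:3
assign("b", letters[1:2], envir = env)

# List-style extraction and the get() form return the same object.
identical(env[["a"]], get("a", envir = env))

# $ also works, as it does for a list.
env$b

# Names stored in the environment (sorted by default).
objects(env)
```

One difference to keep in mind: env[["missing"]] gives NULL for a name that
is not there, while get("missing", envir=env) stops with an error.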

> stocksEnv <- new.env()
> getSymbols(c("TIBX", "IBM", "M"), env=stocksEnv)
[1] "TIBX" "IBM"  "M"   
> objects(stocksEnv)
[1] "IBM"  "M"    "TIBX"
> stocksEnv[["TIBX"]][1:5, 4]
           TIBX.Close
2007-01-03       9.66
2007-01-04       9.68
2007-01-05       9.68
2007-01-08       9.89
2007-01-09       9.69
> eapply(stocksEnv, function(obj)tail(obj))
$TIBX
           TIBX.Open TIBX.High TIBX.Low TIBX.Close TIBX.Volume TIBX.Adjusted
2013-05-08     20.62     21.17    20.48      21.16     6371000         21.16
2013-05-09     21.10     21.15    20.57      20.65     2322300         20.65
2013-05-10     20.70     21.02    20.68      20.89     1746500         20.89
2013-05-13     20.89     20.95    20.56      20.61     1114500         20.61
2013-05-14     20.60     20.97    20.55      20.82     1176700         20.82
2013-05-15     20.61     21.09    20.61      21.07     2775500         21.07

$IBM
           IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted
2013-05-08   202.94   204.85  202.51    204.82    3601700       204.82
2013-05-09   204.69   205.00  202.72    203.24    3542300       203.24
2013-05-10   203.37   204.53  202.82    204.47    3279200       204.47
2013-05-13   204.18   204.47  202.22    202.47    3648400       202.47
2013-05-14   202.09   203.67  202.08    203.21    3699700       203.21
2013-05-15   202.25   203.68  202.04    203.32    4028100       203.32

$M
           M.Open M.High M.Low M.Close M.Volume M.Adjusted
2013-05-08  46.54  47.12 46.26   46.64  3440600      46.64
2013-05-09  46.59  46.81 46.26   46.45  2676800      46.45
2013-05-10  46.62  47.23 46.58   47.23  2999300      47.23
2013-05-13  47.02  47.12 46.54   46.88  2687700      46.88
2013-05-14  47.00  47.67 46.92   47.39  4491000      47.39
2013-05-15  48.25  48.93 47.30   48.57  9460700      48.57
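
The same pattern can be used to add the rate-of-change columns to every
object.  The following is only a sketch under some assumptions: rocSketch()
is a made-up base-R stand-in for TTR's ROC() (log change over n rows) so that
the code runs without extra packages, addMomentum() and toyEnv are
hypothetical names, and the toy data are plain data frames with a column
called Close, as in the original post, rather than what getSymbols() returns.

```r
# Stand-in for TTR::ROC() so the sketch runs without extra packages:
# the log change over n rows, padded with n leading NAs.
# (rocSketch, addMomentum and the toy data are made up for illustration.)
rocSketch <- function(x, n) c(rep(NA_real_, n), diff(log(x), lag = n))

addMomentum <- function(DF, lags = c(129, 65, 21, 10, 5)) {
  for (n in lags) {
    # Only add a column when there is enough history for this lag.
    if (nrow(DF) > n) DF[[paste0("Mo", n)]] <- rocSketch(DF[["Close"]], n)
  }
  DF
}

# Toy environment holding two data frames of different lengths.
toyEnv <- new.env()
toyEnv[["AAA"]] <- data.frame(Close = seq(10, 20, length.out = 30))
toyEnv[["BBB"]] <- data.frame(Close = seq(5, 6, length.out = 8))

# Update every object in place; no assign()/get() and no separate list.
for (nm in objects(toyEnv)) toyEnv[[nm]] <- addMomentum(toyEnv[[nm]])

# eapply() returns a list of per-object results, here the column names.
eapply(toyEnv, names)

# If a plain list is wanted, as.list() converts in one step,
# and list2env() goes the other way.
asList <- as.list(toyEnv)
backEnv <- list2env(asList, envir = new.env())
```

Because the results are written back into the environment under their
original names, there is no final step of turning a big list back into
separate objects.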

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sparks, John James
> Sent: Thursday, May 16, 2013 10:14 AM
> To: r-help at r-project.org
> Subject: [R] To List or Not To List
> 
> Dear R Helpers,
> 
> A few weeks ago I asked for some help on how to accomplish modifications
> to data in a set of data frames.  As part of that request I mentioned that
> I realized that one way to accomplish my goal was to put the data frames
> together in a list but that I was looking for a way to do it with data
> frames and a loop because I "believe the better thing is to work df by df
> for my particular situation".
> 
> A couple of posters asked me to provide more detail as to what it is about
> my situation that made data frame alterations in a loop more appropriate
> vs. a list.
> 
> Life and the scoring of many exams intervened in the last several days,
> but with grades filed I am now able to return to this issue.
> 
> First, let me provide some particulars regarding my situation.  I am
> working with 5,863 data frames, each with 7 columns and between 5,686 and
> 21 rows of data.  Each data frame contains the daily stock price history
> for an equity traded on one of the U.S. markets.  I wanted to get an
> historical price change for each of the days on the file.  If one were
> working with a single data frame for IBM then the command is
> 
> if(nrow(IBM)>129){IBM$Mo129<-ROC(IBM[,"Close"],n=129)}
> 
> to get the Rate Of Change of the stock price relative to 129 trading days
> ago.  This function is in the TTR library which is called by quantmod.
> 
> So it strikes me that in one sense this is a simple fixed costs vs.
> variable costs question:  Is it worth it to assemble the data frames into
> a list and then process them, putatively more quickly than going data
> frame by data frame, which does not require the up-front assembly.
> 
> A look at the empirical results shows executing this set of functions df
> by df consumes 44.15 seconds of elapsed time.
> 
> > ptm <- proc.time()
> >
> >
> > 	ROCFunc<-function(DF){
> + if(nrow(DF)>129){DF$Mo129<-ROC(DF[,"Close"],n=129)}
> + if(nrow(DF)> 65){DF$Mo65 <-ROC(DF[,"Close"],n= 65)}
> + if(nrow(DF)> 21){DF$Mo21 <-ROC(DF[,"Close"],n= 21)}
> + if(nrow(DF)> 10){DF$Mo10 <-ROC(DF[,"Close"],n= 10)}
> + if(nrow(DF)>  5){DF$Mo5  <-ROC(DF[,"Close"],n=  5)}
> + return(DF)
> + }
> > for(i in symbols) assign( i, ROCFunc(get(i)))
> >
> >
> > time<-proc.time() - ptm
> > time
>    user  system elapsed
>   43.52    0.58   44.15
> 
> 
> Using a list approach, the assembly of the list requires 8.44 seconds and
> then the processing requires 39.20 seconds, totaling 47.64 seconds.  So a
> slight win for the data frame approach. [Continued]
> 
> > ptm <- proc.time()
> >
> > list.object <- quote(list())
> > list.object[ symbols ] <- lapply( symbols, as.name )
> > biglist<-eval(list.object)
> >
> >
> > for (i in seq_along(biglist))
> + 	{
> + 	 biglist[[i]]<-subset(biglist[[i]],select=-c(Open,High,Low))
> + 	 #biglist[[i]]<-biglist[[i]][as.character(biglist[[i]]$Index) > "2007-01-01", ]
> + 	 #biglist[[i]]$Index<- as.Date(biglist[[i]]$Index,format="%Y-%m-%d")
> + 	 #biglist[[i]]<-xts(biglist[[i]][,-1],biglist[[i]][,1])
> + 	 #biglist[[i]]<-biglist[[i]]['2005-01-01/']
> + 	 }
> >
> >  proc.time() - ptm
>    user  system elapsed
>    8.03    0.40    8.44
> >  ptm <- proc.time()
> >
> > rm(list=ls(pattern="^[A-Z]"))
> >
> > for (i in seq_along(biglist))
> + {
> + 	 if(nrow(biglist[[i]])>180)
> + 		{
> + 		biglist[[i]][["Mo180"]]<-ROC(biglist[[i]][["Close"]],n=129)
> + 		}
> + 	if(nrow(biglist[[i]])>90)
> + 		{
> + 		biglist[[i]][["Mo90"]] <-ROC(biglist[[i]][["Close"]],n=65)
> + 		}
> + 	if(nrow(biglist[[i]])>30)
> + 		{
> + 		biglist[[i]][["Mo30"]] <-ROC(biglist[[i]][["Close"]],n=21)
> + 		}
> + 	if(nrow(biglist[[i]])>10)
> + 		{
> + 		biglist[[i]][["Mo10"]] <-ROC(biglist[[i]][["Close"]],n=10)
> + 		}
> + 		if(nrow(biglist[[i]])>5)
> + 		{
> + 		biglist[[i]][["Mo5"]] <-ROC(biglist[[i]][["Close"]],n=5)
> + 		}
> + }
> > proc.time() - ptm
>    user  system elapsed
>   39.19    0.00   39.20
> 
> 
> The larger issue for me, however, is getting back to the set of data frames
> with the new calculations completed inside each one.  For this I used the
> following syntax that I gleaned from the web:
> 
> data.frame(lapply(data.frame(t(sapply(biglist, `[`))), unlist))
> 
> But this results in
> Error in FUN(X[[2003L]], ...) :
>   promise already under evaluation: recursive default argument reference
> or earlier problems?
> Calls: data.frame -> lapply -> FUN
> Execution halted
> 
> In previous executions I have seen the all too familiar error message
> 'unable to allocate a vector of size...' indicating to me that I have run
> out of usable RAM at this last step.  I have 8G on my machine, so RAM
> constraints are rarely a problem.  This is the main reason that I said
> that I believed that a list approach was not the best for my situation:
> going that route will not result in a finished job.
> 
> I hope that this demonstration answers the questions of the posters who
> posed the question and can potentially serve to provide an example to
> those who, like me recently, are beginning to explore how to execute on
> multiple data frames.  I hope that this outweighs the fact that I have not
> asked a specific question nor provided reproducible code.  Positive
> comments to advance the state of knowledge or improve my knowledge of the
> processes and syntax are invited.  Flaming comments along the lines that I
> should RTFM are strongly discouraged.
> 
> And many thanks to those who have improved my understanding of R through
> this list in the last few years.
> 
> --John J. Sparks, Ph.D.
> 