[R] Growth of CRAN?

Spencer Graves spencer.graves at structuremonitoring.com
Mon Apr 14 06:21:29 CEST 2014


(minor correction)


On 4/13/2014 7:41 PM, Gabor Grothendieck wrote:
> On Sun, Apr 13, 2014 at 1:26 PM, John Fox <jfox at mcmaster.ca> wrote:
>> I've attached the most recent data I have, which are from mid-2012. My
>> package counts came from
>> https://svn.r-project.org/R/branches/R-*-branch/tests/internet.Rout.save
>> (where the * is the R version).
>>
>
> It seems that the growth is exponential but at a lower slope (of the
> log curve) after 2008 than before. A linear fit to the log curve is
> shown  in blue before 2008 and in red after 2008.  What happened to
> result in two such distinct regimes?


       I got a great fit using a 4-parameter log-logistic model with 
drm{drc};  see below.  This model suggests that CRAN will approach an 
asymptote of roughly 60,000 packages with a 95% confidence interval 
ranging from 31 to 117 thousand.
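
       As a rough illustration of how the asymptote follows from the model, the sketch below writes out the 4-parameter log-logistic curve in base R and plugs in the rounded estimates reported later in this message. The function name ll4 and the variable names are assumptions made for this example; the actual fit uses drm() with fct=LL.4().

```r
# Four-parameter log-logistic curve in base R (no drc needed for this sketch).
# Parameter names follow the comments later in this message: b, c, d, t0.
ll4 <- function(t, b, c, d, t0) c + (d - c) / (1 + (t / t0)^b)

# Rounded point estimates quoted in this message
b_hat <- -1.36
c_hat <- 4.73
d_hat <- 11.0
t0_hat <- 3309

# Fitted log(Packages) at the reference time and increasingly far into the future.
# Because b is negative, (t / t0)^b shrinks toward 0 and the curve approaches d.
days <- c(t0_hat, 1e4, 1e6, 1e12)
fitted_log <- ll4(days, b_hat, c_hat, d_hat, t0_hat)
print(round(exp(fitted_log)))

# Back-transform the rounded 95% limits for d to the package-count scale
asymptote_ci <- exp(c(lower = 10.34, upper = 11.67))
print(round(asymptote_ci / 1000))  # in thousands of packages
```

       At t = t0 the curve sits halfway between c and d on the log scale, which is why t0 acts as the reference number of days.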


       Obviously, the confidence interval for the asymptote assumes the 
4-parameter log-logistic model is accurate.  That's probably not 
realistic but is more accurate than assuming continued exponential 
growth.  If I had time to develop more accurate predictions and 
confidence intervals, I'd try Bayesian Model Averaging with several 
different models.
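
       A minimal sketch of the model-averaging idea follows. It is not full Bayesian Model Averaging: it uses the common BIC approximation to posterior model weights, and the three candidate models (all linear models for log(Packages)) are assumptions chosen only because they fit reliably with lm(). The dates and counts are copied from the table quoted in this thread; all variable names are made up for the example.

```r
# Release dates and CRAN package counts copied from the table quoted in this thread
dates <- as.Date(c(
  "2001-06-21", "2001-12-17", "2002-05-29", "2003-05-27", "2003-11-16",
  "2004-06-05", "2004-10-12", "2005-06-18", "2005-12-16", "2006-05-31",
  "2006-12-12", "2007-04-12", "2007-11-16", "2008-03-18", "2008-10-18",
  "2009-04-17", "2009-10-26", "2010-04-22", "2010-10-15", "2011-04-13",
  "2011-06-20", "2012-07-07"))
packages <- c(110, 129, 162, 219, 273, 357, 406, 548, 647, 739, 911, 1000,
              1300, 1427, 1614, 1952, 2088, 2445, 2837, 3286, 3618, 4000)

# Time in years since the first observation, plus a hinge term that is zero
# before 2008-01-01 (the break date used in the quoted plot)
make_frame <- function(d) {
  data.frame(
    yrs  = as.numeric(d - min(dates)) / 365.25,
    post = pmax(0, as.numeric(d - as.Date("2008-01-01")) / 365.25)
  )
}
dat <- cbind(make_frame(dates), logp = log(packages))

# Candidate models for log(Packages); assumed for illustration only
models <- list(
  exponential = lm(logp ~ yrs, data = dat),
  quadratic   = lm(logp ~ yrs + I(yrs^2), data = dat),
  two_regime  = lm(logp ~ yrs + post, data = dat)
)

# Approximate posterior model weights from BIC differences
bic <- sapply(models, BIC)
weights <- exp(-0.5 * (bic - min(bic)))
weights <- weights / sum(weights)
print(round(weights, 3))

# Weighted average of the models' predictions of log(Packages) at a new date
new <- make_frame(as.Date("2014-04-14"))
preds <- vapply(models, function(m) unname(predict(m, newdata = new)), numeric(1))
avg_log <- sum(weights * preds)
print(exp(avg_log))
```

       A fuller treatment would also include nonlinear candidates such as the log-logistic fit and would average the predictive distributions, not just the point predictions.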


       Thanks for the question and comments.


       Spencer


# Wait until "Build status: Current" at rev. 178 on Ecdat on R-Forge, then:
install.packages("Ecdat", repos="http://R-Forge.R-project.org")
library(Ecdat)
data(CRANpackages) # CRAN package counts by date

(day1 <- min(CRANpackages$Date)) # 2001-06-21
str(ddate <- CRANpackages$Date-day1)
CRANpackages$CRANdays <- as.numeric(ddate)

library(drc)
CRANlogLogis4. <- drm(log(Packages)~CRANdays, data=CRANpackages, fct=LL.4())
plot(CRANlogLogis4., log='y') # best I've found so far.

plot(resid(CRANlogLogis4.))
CRANlogLogis4.
# log(Packages) = c + (d-c)/(1 + (t/t0)^b)
# where
# b = -1.36
# c = 4.73
# d = 11.0 = log(60152)
# t0 = 3309 days since 2001-06-21

(ci4 <- confint(CRANlogLogis4.))

#      2.5 %  97.5 %
# b    -1.49   -1.24   # power of time = rate at which t^b -> 0
# c     4.67    4.80
# d    10.34   11.67   # asymptote of log(Packages)
# t0 2800    3818      # reference number of days

# Asymptotic number of CRAN packages
exp(ci4[3, ])
# 2.5 % and 97.5 % limits: roughly c(31, 117)*1000


>
> Lines <- "version date        packages
> 1.3     2001-06-21   110
> 1.4     2001-12-17   129
> 1.5     2002-05-29   162
> #1.6     2002-10-01   163
> 1.7     2003-05-27   219
> 1.8     2003-11-16   273
> 1.9     2004-06-05   357
> 2.0     2004-10-12   406
> 2.1     2005-06-18   548
> 2.2     2005-12-16   647
> 2.3     2006-05-31   739
> 2.4     2006-12-12   911
> 2.5     2007-04-12  1000
> 2.6     2007-11-16  1300
> 2.7     2008-03-18  1427
> 2.8     2008-10-18  1614  # updated
> 2.9     2009-04-17  1952
> 2.10    2009-10-26  2088
> 2.11    2010-04-22  2445
> 2.12    2010-10-15  2837
> 2.13    2011-04-13  3286
> 2.14    2011-06-20  3618
> 2.15    2012-07-07  4000
> "
> library(zoo)
> zz <- read.zoo(text = Lines, header = TRUE, index = 2)[, 2]
> plot(log(zz))
> d <- as.Date("2008-01-01")
> abline(v = d)
> pre <- time(zz) < d
> fo <- log(zz) ~ time(zz)
> abline(lm(fo, subset = pre), col = "blue")
> abline(lm(fo, subset = !pre), col = "red")



