[R] R in the NY Times

Marc Schwartz marc_schwartz at comcast.net
Wed Jan 7 21:07:51 CET 2009


on 01/07/2009 09:29 AM Max Kuhn wrote:
>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."
> 
> I think that I actually made this statement about both the SAS and
> Splus traffic...
> 
> I wasn't really trying to be critical of SAS. I was trying to get
> across that SAS focused their resources on features that had nothing
> to do with *statistical analysis* (e.g. data warehousing etc.)


Presuming that the Google Groups archive of SAS-L is reasonably complete:

 http://groups.google.com/group/comp.soft-sys.sas/about

The monthly posting frequency data since 1993 is:

Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
    Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
    1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
    ), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
    1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
    ), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
    1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
    )), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
row.names = c("1993",
"1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
))



> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
1994  546  511  658  681  712  751  763  975  703  805  752  666
1995  548  734  963  792  945 1002  775  969  745  691  773  765
1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
1998  894 1068  945  784  448  813  896  823  894 1129  733  492
1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA


One can then review the annual posting frequency via:

pdf("SAS-L.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "SAS-L Traffic",
              cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)

dev.off()


Annual traffic shows marked increases in 2000 and again in 2006, but it
has been essentially flat over the past 3 calendar years (2006 to 2008).
No decline yet, but it will happen in due course...
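
One caveat with rowSums(..., na.rm = TRUE) is that partial years (1993 and
2009 here) are plotted next to full years. As a rough sketch, assuming one
only wants to compare complete years, the last few rows of the table above
can be checked like this. The object names (sas, complete, totals, change)
are mine and not part of the original code:

```r
# Rows copied from the SAS-L table above (2006-2009 only)
sas <- rbind(
  "2006" = c(2722, 1645, 1711, 2796, 3147, 2723,  761, 2027, 2714, 2983, 2848, 2374),
  "2007" = c(2750,  926, 1728, 2766, 2974, 2691, 2435, 2592, 1868, 2320, 2112, 1948),
  "2008" = c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921),
  "2009" = c( 357, rep(NA, 11))
)
colnames(sas) <- month.abb

# Keep only the years in which all 12 months were observed
complete <- rowSums(is.na(sas)) == 0
totals <- rowSums(sas[complete, ])

# Year-over-year change in percent
change <- round(100 * diff(totals) / head(totals, -1), 1)

totals
change
```

The three complete-year totals come out within about 5% of each other,
which is what "flat" refers to above.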



No comparable posting data table exists for S-News as far as I can find,
so I wrote a quick program to read the S-News archive pages here:

  http://www.biostat.wustl.edu/archives/html/s-news/

and get monthly posting counts, using the 'Thread' based html pages,
where each monthly embedded post link has a URL of the form:

http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html


Thus, the program I used is:

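# Year-month labels "1998-01" ... "2009-12", one per monthly archive page
TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
Posts <- numeric(length(TD))

for (i in seq_along(TD))
{
  URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
               TD[i], "/threads.html", sep = "")

  cat(URL, "\n")

  # Months with no archive page make readLines() fail; record those as NA
  if (!inherits(try(con <- readLines(URL)), "try-error"))
  {
    # Count the lines that contain a link to a message page
    Posts[i] <- length(grep("msg.*\\.html", con))
    rm(con)
  } else {
    Posts[i] <- NA
  }
}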


Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1998:2009
colnames(Posts) <- month.abb
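
Note that length(grep(...)) counts matching lines, not matching links, so
it equals the number of posts only if each post link sits on its own line
in the page. A small sketch of the difference, using made-up lines as a
stand-in for a threads.html page (the sample content and the tighter
"msg[0-9]+" pattern are my assumptions, not taken from the real archive):

```r
# Made-up lines standing in for a threads.html page (not real archive content)
page <- c(
  "<html><body><ul>",
  "<li><a href=\"msg00000.html\">First post</a></li>",
  "<li><a href=\"msg00001.html\">Reply</a> <a href=\"msg00002.html\">Reply</a></li>",
  "<li><a href=\"index.html\">Date index</a></li>",
  "</ul></body></html>"
)

# grep() returns the indices of matching lines, so this counts lines
n_lines <- length(grep("msg.*\\.html", page))

# gregexpr() reports every match within each line (-1 means no match),
# so this counts individual links
hits <- gregexpr("msg[0-9]+\\.html", page)
n_links <- sum(sapply(hits, function(h) sum(h > 0)))

c(lines = n_lines, links = n_links)
```

If the two numbers differ on a real page, the per-link count is probably
the one to trust.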

That gives you:

Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")))


> Posts
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1998  NA 273 378 293 330 243 219 209 191 241 181 141
1999 210 173 313 300 334 254 284 270 300 253 300 194
2000 264 313 285 264 306 247 245 302 204 251 261 176
2001 246 232 252 300 331 282 258 260 260 229 232 194
2002 230 255 242 228 219 248 230 207 221 280 228 177
2003 189 179 218 196 189 217 221 187 186 295 197 142
2004 197 230 257 151 164 175 154 187 195 150 176 176
2005 174 161 193 182 174 109 159 144 107  98  82  84
2006 109  87  99 123 107  96  84  97  68  73  53  20
2007  51  59  74  48  46  34  47  39  35  70  56  41
2008  48  63  58  47  31  27  40  28  41  30  27  36
2009   5  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA


Which can then be graphed by:

pdf("S-News.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "S-News Traffic",
              cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)

dev.off()



The consistent decline in posting frequency since 1999 is notable. The
temporal association with the introduction of R is perhaps profound.



As long as I am on the subject, I figured that I would do the same for
R-Help. The downside is that readLines() (really url()) does not
support https: URLs, so I took a somewhat different approach and fetched
each page with wget:


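# Year-month labels "1997-January" ... "2009-December"; month.name is recycled
TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
Posts <- numeric(length(TD))

for (i in seq_along(TD))
{
  URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
               TD[i], "/thread.html", sep = "")

  cat(URL, "\n")

  # Download the page into the working directory as thread.html
  CMD <- paste("wget", URL)
  system(CMD)

  # No file means the download failed (e.g. no archive for that month)
  if (file.exists("thread.html"))
  {
    con <- readLines("thread.html")
    # Count the lines that contain a link to a numbered message page
    Posts[i] <- length(grep("[0-9]+\\.html", con))
    rm(con)
    unlink("thread.html")
  } else {
    Posts[i] <- NA
  }
}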

Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1997:2009
colnames(Posts) <- month.abb
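
Both loops depend on the labels in TD running month by month within each
year, so that matrix(..., byrow = TRUE) puts years in rows and months in
columns. A quick sketch to check that ordering without touching the
network; month_labels() is a hypothetical helper of mine, not something
from the code above:

```r
# Hypothetical helper (not in the original post): builds one label per
# year/month pair, in the same order as the loops above.
month_labels <- function(years, months) {
  stopifnot(length(months) == 12)
  # rep(..., each = 12) repeats each year 12 times; paste() then recycles
  # the 12 month labels across the years.
  paste(rep(years, each = 12), months, sep = "-")
}

s_news <- month_labels(1998:2009, sprintf("%02d", 1:12))
r_help <- month_labels(1997:2009, month.name)

s_news[c(1, 12, 13)]
r_help[c(1, 12, 13)]

# Fill a matrix with the label positions, row by row, as done for Posts
m <- matrix(seq_along(r_help), ncol = 12, byrow = TRUE,
            dimnames = list(as.character(1997:2009), month.abb))
m["1998", "Jan"]   # position of "1998-January" in the label vector
```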


This gives you:

Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
"1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))


> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1997   NA   NA   NA   92   36   47   41   37   40   76   61   57
1998  135   79  114  101   90  105  110   64   94   96  184  105
1999  226  145  195  189  161  186  184  148  203  231  318  221
2000  205  355  377  377  504  418  293  356  434  418  433  422
2001  558  583  651  470  552  550  615  562  678  657  825  530
2002  884  697  880  965 1057  926  918  824  705 1055 1038  742
2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
2009  462   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA


Which again can be graphed as:

pdf("R-Help.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "R-Help Traffic",
              cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)

dev.off()


Now....there's a healthy growth curve....  :-)

Note that the annual traffic volume for 2008 on R-Help exceeds that on
SAS-L.
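
For anyone who wants to check that comparison without rebuilding both
objects (the name Posts is reused for each list above), here is a small
sketch using only the 2008 rows from the two tables. The variable names
are mine:

```r
# 2008 rows copied from the R-Help and SAS-L tables above
r_help_2008 <- c(2490, 2583, 2740, 2487, 2517, 2774,
                 3268, 2813, 2990, 3037, 2730, 2399)
sas_l_2008  <- c(2305, 2255, 2712, 2789, 2025, 2368,
                 2607, 2584, 2554, 2434, 1984, 1921)

totals <- c("R-Help" = sum(r_help_2008), "SAS-L" = sum(sas_l_2008))
totals

# Ratio of R-Help to SAS-L annual traffic
round(totals[["R-Help"]] / totals[["SAS-L"]], 2)

# Number of months in which R-Help had more posts than SAS-L
sum(r_help_2008 > sas_l_2008)
```

That works out to 32828 posts on R-Help against 28538 on SAS-L for 2008.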

For convenience, I am attaching each of the 3 plots.

Regards,

Marc Schwartz

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SAS-L.pdf
Type: application/pdf
Size: 4795 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090107/5fd0ece6/attachment-0006.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: S-News.pdf
Type: application/pdf
Size: 4098 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090107/5fd0ece6/attachment-0007.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: R-Help.pdf
Type: application/pdf
Size: 4263 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090107/5fd0ece6/attachment-0008.pdf>

