Tom Mulholland
tmulholl at bigpond.net.au
Tue Apr 20 13:50:09 CEST 2004
While reading all of these comments I started thinking about how you might
collect the data. As pointed out, in many ways, there are immediate problems
with the validity of what's collected. That is, what is the concept that we
are trying to measure?
So, that was always going to make it hard, but assuming that you got past
that point and you could ask a question, who would you ask it of? Here in
Western Australia there's a small and mostly unconnected fraternity of
people who use R. This is a place where even strangers seem somehow kind of
familiar (you've been sitting on the same bus for the last 20 years and
still don't know who they are), so my gut feeling is that it's not a major
penetration. Trying to do a random sample would be problematic to say the
least. If you started to try and stratify the sample, where would you start?
If you go to places where you know R is being used you have problems with
bias. That might rule out universities as a stratified sample because we're
not really clear about what we are asking.
So what other sources are there? Well, there's been some comment about the
mailing list and all the problems that might be involved in that and
counting downloads. Guess that's out. Then I thought about the vibrancy of
the R email list. For a topic such as a statistical programming language
it's unusual to see such activity (at least in my experience).
So my mind turned to trying to measure some smaller but representative
process, in much the same way that criminologists use the homicide rate as
the pointed end of the violence spectrum (let's not go there; find another
list to have that debate).
So I started thinking about what questions you might ask the typical SPSS or
SAS or ... user that would help. That's when it struck me. Lots of us, R
users that is, also have the luck (some might say misfortune) to work
with other languages. Why do we pick R? The one thing that I have noticed
that separates R users and S-PLUS users (and probably users of some of the
other products that I don't use or know about) from the mainstream users of
SPSS and SAS (at least here in WA) is that the mainstream users are not
pushing the envelope.
So the question is not "How many users are there?", it's "What are people
doing with it?" I put the formal citation in each publication I produce, and
while they are mainly in-house productions (a problem with applied research),
maybe they'll eventually start to get into the citation charts. I hear some
academics love browsing these. (That is aimed at no-one who's on this list)
Tom Mulholland
Tom Mulholland Associates
Footnote: When 5 out of 6 paragraphs start with the word "So" it's time to
get a life. So I'm off to get a life.
I have been trying to determine the size of the R user base, and was
asked to share my findings with this mailing list. Although I still
don't have any definite estimate of this number, I do have some
interesting and indicative information:
1. It appears that there are about 100,000 S-PLUS users.
Rationale: According to Insightful's 2002 Annual Report, over 100,000
people use Insightful software; since license revenues from S-PLUS and
add-on modules accounted for nearly all of their license revenues in
2002, and their other products are much more costly than S-PLUS, it
seems that the great majority of users of Insightful software are S-PLUS
users.
Conclusion: S-PLUS costs $3500 (Windows) or $4500 (Linux/Unix) for an
individual copy; R is free. This suggests that there may be more R
users than S-PLUS users, which suggests > 100,000 R users.
Does anyone have any other information that would give some notion as to
the RELATIVE numbers of R and S-PLUS users?
2. At least one R book has achieved sales of just over 5,000 copies. (I
could not find sales figures for other R books, as it appears that
publishers are closed-mouthed about such figures. And no, I can't
reveal which particular book this was, so don't ask.)
Conclusion: Very few books sell to more than 12% of the population of
potential buyers, and most books have a far lower penetration -- 1% or
less is not uncommon. A 12% penetration for the book in question
implies 42,000 R users; a more reasonable 5% penetration implies 100,000
users. A low 1% penetration implies 500,000 users.
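The arithmetic behind the book-sales estimate in item 2 can be checked with a
short sketch. It assumes exactly 5,000 copies sold (the post says "just over
5,000") and treats penetration as copies sold divided by the number of users.
The function name implied_users is hypothetical, chosen only for illustration.

```python
# Figure from the post: one R book sold roughly 5,000 copies.
# Assumption: we use exactly 5,000, although the post says "just over".
COPIES_SOLD = 5_000


def implied_users(copies_sold: int, penetration: float) -> float:
    """Return the user population implied by a book's penetration rate.

    penetration is the fraction of potential buyers who bought the book,
    so users = copies_sold / penetration.
    """
    if not 0 < penetration <= 1:
        raise ValueError("penetration must be in the interval (0, 1]")
    return copies_sold / penetration


# The three penetration rates discussed in the post.
for rate in (0.12, 0.05, 0.01):
    users = implied_users(COPIES_SOLD, rate)
    print(f"{rate:.0%} penetration -> about {users:,.0f} users")
```

At 12% this gives about 41,667 users, which the post rounds to 42,000. The
estimate is very sensitive to the assumed penetration rate, which is why the
three figures span more than an order of magnitude.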
3. There are a total of 3225 unique subscribers to the three R mailing
lists.
