[R] The hidden costs of GPL software?

Marc Schwartz MSchwartz at MedAnalytics.com
Thu Nov 18 17:03:27 CET 2004


On Thu, 2004-11-18 at 03:24 -0800, Michael Grant wrote:
> Hmmmm, interesting thread and minds will not be
> changed but regarding GUIs...I thought S (aka R) was a
> PROGRAMMING LANGUAGE with a statistical and numerical
> slant, and not a statistics application. ;O)  

>From the R web site:

"R is a language and environment for statistical computing and
graphics."


I think that this is a critical point and that there is, to my mind, a
false predicate at play here.

That predicate is that somehow one should be able to rapidly learn R (or
any programming language for that matter) solely via the available
online reference help or via the freely provided documentation (whether
via R Core or via Contributors).

How many people here have learned to use C, FORTRAN, SAS, VBA, Perl or
any other language strictly by using built-in reference help systems. If
any, it will be a very small proportion.

Sure, SAS comes with documentation that can be measured in hernia
inducing tonnage, but at a substantial annual cost, which I have
referenced here and elsewhere previously. R is free.

Is there anyone who has learned to code in C that does not have a copy
of K&R someplace on their shelf, probably along with copies of other
both general and application specific C references published by
Prentice-Hall, Addison-Wesley, McGraw-Hill or Hayden?

It has been years since I actively coded in C, but I have almost 3
shelves filled with C reference books. I have books dating back to the
early 80's for 80x86 Assembly, MS-DOS/BIOS interrupts and Windows API
technical references and other such books that I used to use on a daily
basis in a former life.

For Linux, I have two shelves filled with various O'Reilly and other
references running the gambit from general Linux stuff to Perl,
Procmail, Postfix, Bash, Regex, Emacs, Admin, Firewalls and others.

For R, I have most of a shelf filled with multiple references, including
three of the four editions of MASS (somehow I missed the 2nd edition). I
have a copy of Peter's ISwR (because on occasion I have an acute attack
of cerebral flatulence and have to go back to basics) along with copies
of Pinheiro & Bates, Fox, Maindonald & Braun, Krause & Olson, Everitt &
Rabe-Hesketh and V&R's S Programming. I have copies of the "White Book"
and the "Green Book" and I have copies of Harrell and Therneau &
Grambsch for specific applications of R.

There are a fair number of already published books on R/S with more
coming by Faraway, Heiberger & Holland, Verzani and others including a
new series from Springer.

My point being that the old philosophy of "No Pain, No Gain" is a
component of the learning curve with R. R is not going to be for
everybody. That's why there are other "point and click" statistical
_applications_ like JMP (albeit not cheap). They are relatively easy,
but at the same time, they are self-limiting. No single math/statistical
"product" is going to meet the needs of the entire spectrum of the
potential user space.

As I have mentioned previously, I am a firm believer in Pareto's 80/20
Rule. In this case, you develop a "product" to meet the needs of 80% of
your target user space, because you will go "bankrupt" meeting the needs
of the other 20%. Said differently, meeting the needs of the other 20%
will consume 80% of your development resources, restricting your ability
to meet the needs of the larger audience.

Having spent 12 years previously with a commercial medical software
company, I will also suggest that typically 20% of your user base will
consume 80% of your support resources.

I will also note that having been on both sides of that equation, the
support provided here within this community is superb and has no peer in
the commercial arena.

In R's case, the 80% of the user space has perhaps been extended by the
kind offerings of those who have made specialty packages available via
CRAN, BioC and others.

It takes a certain level of commitment and time with R to become
effective with it.

That commitment includes, in my mind, supplementing the available _free_
documentation that has kindly been provided by R Core and others, with
other available resources. That does not mean that everyone needs to get
on Amazon.com and spend hundreds of $YOUR_MONETARY_UNIT on books. Many
are available via libraries and/or other resources, especially for those
here in academic environments.

This is a community effort folks and not everything is going to be
provided to you free of charge, with that notion being either in actual
financial cost or time.

It appears that, since this is not the first time this subject has come
up, there is strong interest in building a c("new", "different",
"better", ...) documentation/help system for R. That's fine. For those
that have interest in pursuing this, perhaps the time has come for a
group to form a new r-sig-doc list and move forward with the development
of a framework for a new system that can be developed and implemented by
that same group and then provided back to the community. 

Writing technical and user documentation is a specialty skill set unto
itself and perhaps those with the requisite skill sets will contribute
them for the benefit of all.

For those that do not have the skills and/or the time to contribute, I
would urge you to financially contribute to the R Foundation in whatever
way you can afford. Through that mechanism you will support the
community at large and the future development and enhancement of R.

There is no "hidden cost" here and certainly not one that is unique to
GPL software. The cost is self-evident and it is measured in time and 
$YOUR_MONETARY_UNITs. "Time is money" as they say and that is the same
whether you are using GPL software or a commercial proprietary product. 

A key difference here if any, is that none of us have paid anything for
R, where a portion of that "revenue" would go to support a dedicated
documentation team. In this case, it is "If you want it, you will need
to design and build it."

Best regards,

Marc Schwartz




More information about the R-help mailing list