[R] R's documentation

David Winsemius dwinsemius at comcast.net
Sat May 30 16:25:18 CEST 2009

On May 30, 2009, at 2:01 AM, Zeljko Vrba wrote:

> On Fri, May 29, 2009 at 05:20:24PM +0100, Patrick Burns wrote:
>> If you find some documentation that is
>> confusing, then you can write a message
>> about it that states:
>
> I think that some kind of a glossary would be helpful.  Then I would
> know whether certain words or phrases are R-specific or whether they
> come from statistics, so I'd at least know where I should continue to
> dig further.
>
> A text explaining how data frames *are meant to be used* would be
> helpful.  The intro to data frames is clear (collection of vectors of
> same length), but it left me clueless about how functions interpret
> the data inside.  It finally clicked for me when I was reading some
> intro about lattice graphics and where I actually had to display the
> builtin data-set.  Such a basic concept should be explained somewhere
> without the user needing to basically reverse-engineer the concept.
> In other words, the "Introduction to R" should contain something about
> "long" and "wide" data formats.  Or at least links to proper
> descriptions should be given (plyr, reshape packages).
>
> Implicit conversions are vague.  If variable x is a factor, what does
> x==8 do?   Convert 8 to string and compare to one of the levels of x?
> Compare as.numeric(x) with 8?  Simple experiment reveals this, but
> help("==") does not shed light on the issue. (".. or other objects
> for which methods have been written.")

Actually the help page also says:
"If the two arguments are atomic vectors of different types, one is  
coerced to the type of the other, the (decreasing) order of precedence  
being character, complex, numeric, integer, logical and raw."
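
A few comparisons illustrate that precedence rule. This is only a sketch, and the variable names are mine rather than anything from the help page:

```r
# Mixed-type comparisons: the lower-precedence operand is coerced
# to the type of the higher-precedence one before comparing.
num_vs_chr <- 1 == "1"    # 1 is coerced to "1"; character comparison
lgl_vs_num <- TRUE == 1   # TRUE is coerced to 1; numeric comparison
chr_order  <- "10" < 9    # 9 is coerced to "9"; strings are compared, not numbers
print(c(num_vs_chr, lgl_vs_num, chr_order))
```

The last line is the one that tends to surprise people: the comparison is done on strings, so "10" sorts before "9".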

But the recurring puzzle of "what factors really are" is at work here:

 > x <- factor(8)
 > typeof(x)
[1] "integer"
 > as.integer(x)
[1] 1
 > x == 8
[1] TRUE
 > is.numeric(x)
[1] FALSE
 > is.integer(x)
[1] FALSE
 > mode(x)
[1] "numeric"
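
As far as I can tell, the comparison goes through the factor method for the Ops group generic, which compares the level labels as character strings rather than the underlying integer codes. A small sketch (the two-level factor is my own example) makes the difference visible:

```r
# Levels are sorted as strings, so "10" sorts before "8".
x <- factor(c("8", "10"))
lev   <- levels(x)                    # "10" "8"
codes <- as.integer(x)                # 2 1  (internal codes, not the values)
cmp   <- x == 8                       # TRUE FALSE (labels compared with "8")
vals  <- as.numeric(as.character(x))  # 8 10 (recover the original numbers)
print(list(levels = lev, codes = codes, equals8 = cmp, values = vals))
```

So as.integer() on a factor gives the codes, and going through as.character() first is the way to get the numbers back.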

> This raises a bunch of questions:
> What kind of objects are there in R?  How do I find an object's methods?

Actually objects have classes (which determine what functions can be
applied), while it is the (generic) functions that have methods.
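
To make that concrete, here is a hedged sketch of how one might look up what "==" does for a factor. It assumes the S3 dispatch route via the Ops group generic, which is my understanding of how base R handles factors:

```r
x <- factor(8)
cls <- class(x)   # "factor": the class is what drives method dispatch

# "==" belongs to the Ops group generic, so the factor method is
# registered as Ops.factor rather than under "==" itself.
ops_factor <- utils::getS3method("Ops", "factor")
has_method <- is.function(ops_factor)

print(cls)
print(has_method)
# methods("Ops") and methods(class = "factor") list the related methods;
# printing ops_factor shows the source of the comparison code.
```

If I read the help pages correctly, ?factor is then the place that describes how comparison of factors behaves.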

> How do I find the overload of == that compares factors and integers
> (or at least HELP for a particular overload)?  The help on "==" is
> precise, but utterly useless for somebody who does not already know
> 1) what == does, and 2) all the other wider concepts mentioned in the
> help text.
>
> [And so on.. this was just the example that was lately bothering me.
> In general, more cross-referencing between documentation topics might
> be helpful.  "SEE ALSO" is not sufficient; hyperlinking would be much
> more effective because it hints at whether a topic is documented or
> not.]
>
> I'm an experienced developer, yet it took me three months to go over
> from 5-dimensional arrays and fudging with apply() margins to "proper"
> use of data-frames.

I remember similar problems in getting to the point where I could use
data frames. In frustration I decided to construct a PowerPoint
(actually an OO.o Presentation) that assembled the various accessor and
constructor functions for the components of data frames.
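
For what it's worth, the sort of thing that presentation collected can be sketched in a few lines. The data and column names below are made up for illustration, and the stack() call at the end is just one base-R way to get from a "wide" to a "long" layout:

```r
# Constructor: a data frame is a list of equal-length column vectors.
df <- data.frame(id = 1:3, height = c(1.6, 1.7, 1.8), sex = c("F", "M", "F"))

# Accessors for the pieces.
col_names <- names(df)       # column names
dims      <- dim(df)         # number of rows, number of columns
h1        <- df$height       # one column as a plain vector
h2        <- df[["height"]]  # the same column, selected by name
sub       <- df[df$sex == "F", c("id", "height")]  # rows by condition, columns by name

# "Wide" to "long": stack() puts all values in one 'values' column
# and records the source column in the factor 'ind'.
wide <- data.frame(before = c(5.1, 4.8), after = c(5.6, 5.0))
long <- stack(wide)

str(df)
print(long)
```

Most modelling and lattice functions expect the long layout, with one row per observation and a factor saying which group it belongs to.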

> Had I needed somewhat simpler data manipulation or graphics, I would
> have thrown R out of the window, as I have many times before.  Things
> *should not* be that way.  For an example of what I consider to be
> well-structured documentation, please see
> http://doc.qtsoftware.com/4.5/how-to-learn-qt.html
> which made it possible for me to figure out reasonably quickly how to
> do what I needed without the need for internet searches or asking on
> mailing lists.
>
> [And so on, and so on.. I can only describe the help text as "opaque".
> Reading it feels like reading a foreign language that I'm not very
> proficient in.]

The usual advice is to get a book. In years gone by, Venables and
Ripley's MASS was the standard, but more recently Dalgaard and others
have offered their best efforts at an "intro to R". My library
includes MASS (2nd ed.), Sarkar's Lattice, Spector's Data Manipulation
with R, Harrell's RMS, and Wood's "GAMs: An Intro with R".


David Winsemius, MD
Heritage Laboratories
West Hartford, CT
