[R] Suggestions ?!?!

Douglas Bates bates at stat.wisc.edu
Tue Feb 24 17:58:43 CET 2004


ivo welch <ivo.welch at yale.edu> writes:

> * the first is for the summary() method for plain data frames.  it
> would seem to me that the number of "NA" observations should be
> printed as an integer, not necessarily in scientific notation.  I have
> also yet to determine when summary() likes to give means and when it
> does not. (maybe it was an older version that sometimes did not give
> means). summary does not seem to have optional parameters to specify
> what statistics I would like. this could be useful, too.

The form of the output from summary depends on the mode or class of
the column.  A numeric column is summarized by a 'five-number' summary
(min, first quartile, median, third quartile, maximum) and the mean.
If there are NA's in the column the number of NA's is reported.  It is
sometimes printed with several decimal places because all the values
in that part of the summary are printed in the same format.  If the
mean requires four decimal places to get the desired number of
significant digits then the number of NA's will also be given to four
decimal places.
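
A small sketch of the behaviour described above, using made-up data
(the data frame d and its column x are assumptions for illustration
only).  How the NA count is formatted depends on the R version in use,
so no output is shown; counting the missing values directly always
yields an integer.

```r
# Hypothetical data: a numeric column with two missing values.
d <- data.frame(x = c(1.25, 2.5, NA, 4.125, NA))

# Five-number summary plus the mean; because NAs are present,
# an NA count is reported as well.
print(summary(d))
print(summary(d$x))

# Counting the missing values directly gives an integer,
# independent of how summary() formats its output.
n_missing <- sum(is.na(d$x))
print(n_missing)
```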

A column that is a factor or an ordered factor will be summarized by a
(possibly truncated) frequency table.  Means, medians, etc. are not
meaningful for factors.
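
As a rough illustration with an invented factor (the name f and its
values are assumptions), summary() returns per-level counts, and the
maxsum argument controls where the table is truncated; the levels
beyond the limit are pooled into "(Other)".

```r
# Hypothetical factor with three levels.
f <- factor(c("a", "b", "a", "c", "a", "b"))

# Frequency table: one count per level.
counts <- summary(f)
print(counts)

# With maxsum = 2 the most frequent level is kept and the
# remaining levels are lumped together as "(Other)".
truncated <- summary(f, maxsum = 2)
print(truncated)
```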

> * another small enhancement:  there are four elementary data frame
> operations that bedevil novices, so they really should have named
> function wrappers:
> 
> 
>      delrow( dataframe d, index=45);
>      insrow( dataframe d, (row)vector v);
>      delcol( dataframe d, "name");
>      inscol( dataframe d, (col)vector v);

Three of the "secrets of the S masters" are:
  - indexing is particularly flexible and powerful in S
  - the "%in%" function is versatile and often overlooked
  - you can add a column to a data frame by assigning to that name
so three of these operations can be written as

 d[ -45, ]                     # delrow( dataframe d, index=45)
 d[ , !(names(d) %in% "name")] # delcol( dataframe d, "name")
 d[ , -col]                    # alternative form if you know the column number
 d$newcol = v                  # inscol( dataframe d, (col)vector v)
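
The idioms above can be tried on a toy data frame.  This is only a
sketch: the data frame d and its columns are invented for the example,
and the rbind() line for the fourth operation (insrow) is an addition
that is not part of the list above, offered as one common way to do it.

```r
# Hypothetical data frame for illustration.
d <- data.frame(id = 1:5,
                name = letters[1:5],
                score = c(2.5, 3.0, NA, 4.5, 1.0))

# delrow: drop row 3 with a negative row index.
d_norow <- d[-3, ]

# delcol by name: keep the columns whose names are not in the set.
d_nocol <- d[, !(names(d) %in% "name")]

# delcol by position: "name" is column 2 here.
d_nocol2 <- d[, -2]

# %in% also handles several names at once; drop = FALSE keeps the
# result a data frame even when a single column remains.
d_idonly <- d[, !(names(d) %in% c("name", "score")), drop = FALSE]

# inscol: assign to a new column name.
d_newcol <- d
d_newcol$newcol <- d$score * 2

# insrow (assumption, not from the list above): append a one-row
# data frame with matching column names using rbind().
d_newrow <- rbind(d, data.frame(id = 6L, name = "f", score = 2.0))

print(d_newrow)
```

Without drop = FALSE, selecting down to a single column returns a
plain vector rather than a data frame.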

> Even a simple alias would do (maybe named row.delete, column.delete).
> I looked at my R "bible" (venables&ripley), too, but here too it is
> not as clear as it needs to be.  yes, these operations are
> programmable, but it ain't as obvious as it should be for beginners.
> these are elementary.

P.S. How many other people think that the next edition of MASS should
be renamed "Secrets of The S Masters"?   :-)



