R-alpha: R FAQ
Kurt Hornik
Kurt.Hornik@ci.tuwien.ac.at
Mon, 25 Aug 1997 14:39:45 +0200
Attached is a snapshot of the new version of the FAQ. What is still
missing is something on eval and .Options versus options(). As always,
feedback is greatly appreciated.
Best,
-k
***********************************************************************
R FAQ
Kurt Hornik
v0.2-0, 1997/09/01
This document contains answers to some of the most frequently asked
questions about R. Feedback is welcome.
______________________________________________________________________
Table of Contents:
1. Introduction
1.1 Legalese
1.2 Obtaining this Document
1.3 Notation
1.4 Feedback
2. R Basics
2.1 What Is R?
2.2 What Machines Does R Run on?
2.3 What Is the Current Version of R?
2.4 How Can R Be Obtained?
2.5 How Can R Be Installed?
2.5.1 How Can R Be Installed (Unix)
2.5.2 How Can R Be Installed (Windows)
2.5.3 How Can R Be Installed (Macintosh)
2.6 Are there Unix Binaries for R?
2.7 Which Documentation Exists for R?
2.8 Which Mailing Lists Exist for R?
2.9 What is CRAN?
3. R and S
3.1 What Is S?
3.2 What Is S-PLUS?
3.3 What Are the Differences between R and S?
4. R Add-On Packages
4.1 Which Add-on Packages Exist for R?
4.2 How Can Add-on Packages Be Installed?
4.3 How Can Add-on Packages Be Used?
4.4 How Can I Contribute to R?
5. R and Emacs
5.1 Is there Emacs Support for R?
5.2 Should I Run R from Within Emacs?
6. R Miscellanea
6.1 How Can I Read a Large Data Set into R?
6.2 Why Can't R Source a `Correct' File?
6.3 How Can I Set Components of a List to NULL?
6.4 How Can I Save My Workspace?
6.5 How Can I Clean Up My Workspace?
6.6 Why Do My Matrices Lose Dimensions?
7. Acknowledgments
______________________________________________________________________
1. Introduction
This document contains answers to some of the most frequently asked
questions about R.
1.1. Legalese
This document is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
A copy of the GNU General Public License is available via WWW at
http://www.gnu.org/copyleft/gpl.html. You can also obtain it by
writing to the Free Software Foundation, Inc., 675 Mass Ave,
Cambridge, MA 02139, USA.
1.2. Obtaining this Document
The latest version of this document is always available from
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
From there, you can also obtain versions converted to plain ASCII
text, GNU info, DVI, and PostScript, as well as the SGML source used
for creating all these formats using the SGML-Tools (formerly
Linuxdoc-SGML) system.
1.3. Notation
Everything should be pretty standard. `R>' is used for the R prompt,
and a `$' for the shell prompt (where applicable).
1.4. Feedback
Feedback is of course most welcome.
In particular, note that I do not have access to Windows or Mac
systems. If you have information on these systems that you think
should be added to this document, please let me know.
2. R Basics
2.1. What Is R?
R is a system for statistical computation and graphics. It consists
of a language plus a run-time environment with graphics, a debugger,
access to certain system functions, and the ability to run programs
stored in script files.
The design of R has been heavily influenced by two existing languages:
Becker, Chambers & Wilks' S (see question ``What is S?'') and
Sussman's Scheme. Whereas the resulting language is very similar in
appearance to S, the underlying implementation and semantics are
derived from Scheme. See question ``What Are the Differences between
R and S?'' for a discussion of the differences between R and S.
R is being developed by Ross Ihaka and Robert Gentleman, who are
Senior Lecturers at the Department of Statistics of the University of
Auckland in Auckland, New Zealand.
R has a home page at http://stat.auckland.ac.nz/r/r.html, and is free
software distributed under a GNU-style copyleft.
2.2. What Machines Does R Run on?
R is being developed for the Unix, Windows and Mac platforms.
R will configure and build under a number of common Unix platforms
including dec-alpha-osf, freebsd, hpux, linux-elf-ix86, sgi-irix,
solaris, and sunos, and according to Jim Lindsey <jlindsey@luc.ac.be>
also on Mac, Amiga and Atari under linux-m68k.
If you know about other platforms, please drop me a note.
2.3. What Is the Current Version of R?
The current Unix version is 0.50. The previous version was 0.49 and
added group methods and complex numbers, thus providing a more or less
full implementation of S as described in ``The New S Language''.
The versions for Windows and Mac are pre-alpha. With some good luck,
the Windows version will soon catch up with the Unix version.
2.4. How Can R Be Obtained?
Sources, binaries and documentation for R can be obtained via CRAN,
the ``Comprehensive R Archive Network'' (see question ``What is
CRAN?'').
2.5. How Can R Be Installed?
2.5.1. How Can R Be Installed (Unix)
If binaries are available for your platform (see question ``Are there
Unix Binaries for R?''), you can use these, following the instructions
that come with them.
Otherwise, you can compile and install R yourself, which can be done
very easily under a number of common Unix platforms (see question
``What Machines Does R Run on?''). The file INSTALL that comes with
the R distribution contains instructions.
Choose a place to install the R tree (R is not just a binary, but has
additional data sets, help files, font metrics etc). Let's call this
place RHOME (given appropriate permissions, a natural choice would be
`/usr/local/lib/R'). Untar the source code, and issue the following
commands (at the shell prompt):
$ ./configure
$ make
$ make install-help
You can also build a LaTeX version of the manual entries with
$ make install-latex
and an HTML version of the manual with
$ make install-html
If these commands execute successfully, the R binary will be copied to
the `$RHOME/bin' directory. In addition, a shell script front-end
called `R' will be created and copied to the same directory. You can
copy this script to a place where users can invoke it, for example to
`/usr/local/bin'. You could also copy the man page `R.1' to a place
where your man reader finds it, such as `/usr/local/man/man1'.
2.5.2. How Can R Be Installed (Windows)
The file `RApril.zip' from the `bin/ms-windows' directory of a CRAN
site contains a binary Windows 95 distribution for R which should be
about a 0.48 release. This version is quite limited in Windows-
specific features and probably contains many bugs, although it has
been reported to work rather nicely. As always, there is no limit to
the damage a buggy program can do under Windows, so, at least until
more experience has been gained with it, it should probably not be
used on systems containing very important data. It is not clear
whether it works under Windows 3.11 (the HTML help system cannot work
under 3.11 as it needs long file names). According to James Neild
<jneild@esri.com>, it works fine under Windows NT 3.51.
Note that when uncompressing `RApril.zip', the pkunzip program needs
to be invoked with the -D flag to create subdirectories.
2.5.3. How Can R Be Installed (Macintosh)
The CRAN `bin/macintosh' directory contains `R.sea.hqx', a binhexed
self-extracting archive, and installation instructions in
`README.MACINTOSH'. Note that the version in it is nowhere near the
quality of the current Unix version.
The Power Macintosh port is temporarily on hold.
2.6. Are there Unix Binaries for R?
Experimental `.deb' and `.rpm' packages for installation under the
ix86 versions of Debian GNU/Linux and Red Hat Linux, respectively, can
be found in `bin/ix86-linux'. No other binary distributions for Unix
systems have thus far been made publicly available.
2.7. Which Documentation Exists for R?
Online documentation for most of the functions and variables in R
exists, and can be printed on-screen by typing help(name) (or ?name)
at the R prompt, where name is the name of the R object help is sought
for. (In the case of unary and binary operators and control-flow
special forms, the name may need to be quoted.)
This documentation can also be made available as HTML, and as hardcopy
via LaTeX, see question ``How Can R Be Installed?''. An up-to-date
HTML version is always available for web browsing at
http://www.stat.math.ethz.ch/R-manual
An R manual (``Notes on R: A Programming Environment for Data
Analysis and Graphics'') is currently being written, based on the
``Notes on S-PLUS'' by Bill Venables <venables@stats.adelaide.edu.au>
and David Smith <D.M.Smith@lancaster.ac.uk>. The current version can
be obtained as `Rnotes.tgz' (LaTeX source) in a CRAN `doc' directory.
Note that the ``conversion'' from S(-PLUS) to R is not complete yet.
Last, but not least, Ross' and Robert's experience in designing and
implementing R is described in:
@Article{,
author = {Ross Ihaka and Robert Gentleman},
title = {R: A Language for Data Analysis and Graphics},
journal = {Journal of Computational and Graphical Statistics},
year = 1996,
volume = 5,
number = 3,
pages = {299--314}
}
This is also the reference for R to use in publications.
2.8. Which Mailing Lists Exist for R?
Thanks to Martin Maechler <maechler@stat.math.ethz.ch>, there are
three mailing lists devoted to R.
r-announce
This list is for announcements about the development of R and
the availability of new code.
r-devel
This list is for discussions about the future of R and pre-
testing of new versions. It is meant for those who maintain an
active position in the development of R.
r-help
The `main' R mailing list, for announcements about the
development of R and the availability of new code, questions and
answers about problems and solutions using R, enhancements and
patches to the source code and documentation of R, comparison
and compatibility with S and S-PLUS, and for the posting of nice
examples and benchmarks.
To send a message to everyone on the r-help mailing list, send email
to
r-help@stat.math.ethz.ch
To subscribe (or unsubscribe) to this list send subscribe (or
unsubscribe) in the BODY of the message (not in the subject!) to
r-help-request@stat.math.ethz.ch. Information about the list can be
obtained by sending an email with info as its contents to
r-help-request@stat.math.ethz.ch.
Subscription and posting to the other lists is done analogously, with
`r-help' replaced by `r-announce' and `r-devel', respectively. Note
that the r-announce list is gatewayed into r-help, so you don't need
to subscribe to both of them.
It is recommended that you send mail to r-help rather than only to the
R developers (who are also subscribed to the list, of course). This
may save them precious time they can use for constantly improving R,
and will typically also result in much quicker feedback for yourself.
Of course, in the case of bug reports it would be very helpful to have
code which reliably reproduces the problem.
Archives of the above three mailing lists are made available on the
net in a monthly schedule at ftp://ftp.stat.math.ethz.ch/Mail-
archives/ (which is a directory of mail archive files). Archives of
the r-help mailing list (including the previous r-testers lists back
to March 1996), are also available in HTML format at
http://www.ens.gu.edu.au/robertk/rhelp/about.htm.
The WWW page http://www.maths.uq.edu.au/~gks/r/mail.html is devoted to
the R mailing lists, providing easy posting, subscription and
unsubscription, and access to mailing list archives.
The developers of R can be reached for comments and reports at
R@stat.auckland.ac.nz.
2.9. What is CRAN?
The ``Comprehensive R Archive Network'' (CRAN) is a collection of
sites which carry identical material, consisting of the R
distribution(s), the contributed extensions, documentation for R, and
binaries.
The CRAN master site can be found at the URL
ftp://ftp.ci.tuwien.ac.at/pub/R/
and is currently being mirrored daily at
http://lib.stat.cmu.edu/R/CRAN/
ftp://franz.stat.wisc.edu/pub/R/
ftp://ftp.stat.math.ethz.ch/R-CRAN/
http://www.stat.unipg.it/pub/stat/statlib/R/CRAN/
ftp://ftp.u-aizu.ac.jp/pub/lang/R/CRAN/
Please use the CRAN site closest to you to reduce network load.
The structure of the CRAN tree is as follows.
`src/base'
contains the official R distribution as provided by Ross Ihaka
and Robert Gentleman.
`src/contrib'
contains code for extension packages.
`doc'
is for additional documentation and information on R.
`bin'
is for prebuilt R binaries (the base distribution and
extensions), grouped according to platforms. Currently, there
are only experimental packages for Debian GNU/Linux. I hope
that `.tar.gz' files with contents relative to an installation
tree (e.g. `bin', `lib/R/', and `man/man1/R.1') can be made
available soon for all major supported Unix platforms.
To ``submit'' to CRAN, simply upload to
ftp://ftp.ci.tuwien.ac.at/incoming and send email to Kurt Hornik
<Kurt.Hornik@ci.tuwien.ac.at>. Please indicate the copyright
situation (GPL, ...) in your submission.
3. R and S
3.1. What Is S?
S is a very high level language and an environment for data analysis
and graphics. S was written by Richard A. Becker, John M. Chambers,
and Allan R. Wilks of AT&T Bell Laboratories Statistics Research
Department.
The primary references for S are two books by the creators of S.
o Richard A. Becker, John M. Chambers and Allan R. Wilks (1988),
``The New S Language,'' Chapman & Hall, London.
This book is often called the ``Blue Book''.
o John M. Chambers and Trevor J. Hastie (1992), ``Statistical Models
in S,'' Chapman & Hall, London.
This is also called the ``White Book''.
There is a huge amount of user-contributed code for S, available at
the S Repository at CMU.
See the ``Frequently Asked Questions about S''
(http://lib.stat.cmu.edu/S/faq) for further information about S.
3.2. What Is S-PLUS?
S-PLUS is a value-added version of S sold by Statistical Sciences,
Inc. (now a division of MathSoft, Inc.). S is a subset of S-PLUS, and
hence anything which may be done in S may be done in S-PLUS. In
addition S-PLUS has extended functionality in a wide variety of areas,
including robust regression, modern nonparametric regression, time
series, survival analysis, multivariate analysis, classical
statistical tests, quality control, and graphics drivers. Add-on
modules add additional capabilities for wavelet analysis, spatial
statistics, and design of experiments.
See the MathSoft S-PLUS page (http://www.mathsoft.com/splus.html) for
further information.
3.3. What Are the Differences between R and S?
Whereas the developers of R have tried to stick to the S language as
defined in ``The New S Language'' (Blue Book, see question ``What is
S?''), they have adopted the evaluation model of Scheme.
This difference becomes manifest when free variables occur in a
function. Free variables are those which are neither formal
parameters (occurring in the argument list of the function) nor local
variables (created by assigning to them in the body of the function).
Whereas S (like C) by default uses static scoping, R (like Scheme) has
adopted lexical scoping. This means the values of free variables are
determined by a set of global variables in S, but in R by the bindings
that were in effect at the time the function was created.
Consider the following function:
cube <- function(n) {
    sq <- function() n * n
    n * sq()
}
Under S, sq() does not ``know'' about the variable n unless it is
defined globally:
S> cube(2)
Error in sq(): Object "n" not found
Dumped
S> n <- 3
S> cube(2)
[1] 18
In R, the ``environment'' created when cube() was invoked is also
looked in:
R> cube(2)
[1] 8
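To see that the R result does not depend on a global n, the S session above can be replayed in R. This is only a small sketch; the global value 3 is simply the one used in the S transcript.

```r
cube <- function(n) {
    sq <- function() n * n   # n is a free variable of sq()
    n * sq()
}

n <- 3             # a global n, as in the S transcript above
result <- cube(2)  # sq() finds n = 2 in the environment of the cube() call
print(result)      # 8 under lexical scoping (S gave 18 at this point)
```

The global n is never consulted, because the environment created by the call to cube() is searched first.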
The following more `realistic' example illustrating the differences in
scoping is due to Thomas Lumley <thomas@biostat.washington.edu>. The
function
jackknife.lm <- function(lmobj) {
    n <- length(resid(lmobj))
    jval <- t(apply(as.matrix(1:n), 1,
                    function(i) coef(update(lmobj, subset = -i))))
    (n - 1) * (n - 1) * var(jval) / n
}
does something useful in R, but does not work in S. In order to make
it work in S you need to explicitly pass the linear model object into
the function nested in apply(). If you don't and you are lucky you
will get ``Error: Object "lmobj" not found''. If you are unlucky
enough to have a linear model called lmobj in your global environment
you will get the wrong answer with no warning.
The following version works in S.
jackknife.S.lm <- function(lmobj) {
    n <- length(resid(lmobj))
    jval <- t(apply(as.matrix(1:n), 1,
                    function(i, lmobj) coef(update(lmobj, subset = -i)),
                    lmobj = lmobj))
    (n - 1) * (n - 1) * var(jval) / n
}
(The S version was written independently by Thomas and at least three
of his fellow students over the past couple of years, causing
literally hours of confusion on each occasion.)
Similarly, most optimization (or zero-finding) routines need some
arguments to be optimized over and have other parameters that depend
on the data but are fixed with respect to optimization. With R
scoping rules, this is a trivial problem; simply make up the function
with the required definitions in the same environment and scoping
takes care of it. With S, one solution is to add an extra parameter
to the function and to the optimizer to pass in these extras, which
however can only work if the optimizer supports this (and typically,
the builtin ones do not).
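As a rough illustration of the optimization case, the sketch below builds the objective function in an environment that already holds the data, so the optimizer only ever sees the parameter being optimized. The helper make.sse and the data are made up for this example, and it assumes that a one-dimensional optimizer such as optimize() is available in your version of R.

```r
# Hypothetical helper: returns an objective function for the given data.
make.sse <- function(x) {
    # x is fixed with respect to the optimization; only mu is optimized over
    function(mu) sum((x - mu)^2)
}

x <- c(2.1, 3.4, 1.9, 4.0, 3.6)   # made-up data
sse <- make.sse(x)

# The optimizer needs no extra argument for passing the data along.
fit <- optimize(sse, interval = range(x))
fit$minimum   # close to mean(x), which is 3 for these data
```

No extra parameter has to be threaded through the optimizer; the data travel with the function itself.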
Lexical scoping allows using function closures and maintaining local
state. A simple example (taken from Abelson and Sussman) can be found
in the `demos/language' subdirectory of the R distribution. Further
information is provided in the standard R reference ``R: A Language
for Data Analysis and Graphics'' (see question ``Which Documentation
Exists for R?'') and a paper on ``Lexical Scope and Statistical
Computing'' by Robert Gentleman and Ross Ihaka which can be obtained
from the `doc/misc' directory of a CRAN site.
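The following is a minimal sketch of a closure maintaining local state, in the spirit of the bank account example from Abelson and Sussman. It is not a copy of the file in `demos/language'; the names make.account, deposit and withdraw are chosen here for illustration only.

```r
make.account <- function(balance) {
    deposit <- function(amount) {
        balance <<- balance + amount   # modifies balance in the enclosing environment
        balance
    }
    withdraw <- function(amount) {
        if (amount > balance) stop("insufficient funds")
        balance <<- balance - amount
        balance
    }
    list(deposit = deposit, withdraw = withdraw,
         balance = function() balance)
}

acc <- make.account(100)
acc$deposit(50)
acc$withdraw(30)
acc$balance()      # 120

other <- make.account(10)
other$balance()    # 10, each account has its own state
```

Each call to make.account() creates a fresh environment, so the two accounts do not interfere with each other.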
Lexical scoping also implies a further major difference. Whereas S
stores all objects as separate files in a directory somewhere (usually
`.Data' under the current directory), R does not. All objects in R
are stored internally. When R is started up it grabs a very large
piece of memory and uses it to store the objects. R performs its own
memory management of this piece of memory. Having everything in
memory is necessary because it is not really possible to externally
maintain all relevant ``environments'' of symbol/value pairs. This
difference also seems to make R much faster than S.
The down side is that if R crashes you will lose all the work for the
current session. Saving and restoring the memory ``images'' (the
functions and data stored in R's internal memory at any time) can be a
bit slow, especially if they are big. In S this does not happen,
because everything is saved in disk files and if you crash nothing is
likely to happen to them. R is still in an alpha stage, and does
crash from time to time. Hence, for important work you should
consider saving often, see question ``How Can I Save My Workspace?''
(other possibilities are logging your sessions, or having your R
commands stored in text files which can be read in using source()).
(Note that if you run R from within Emacs (see question ``R and
Emacs''), you can save the contents of the interaction buffer to a
file and conveniently manipulate it using S-transcript-mode, as well
as save source copies of all functions and data used.)
Apart from lexical scoping and its implications, R follows the S
language definition in the Blue Book as much as possible, and hence
really is an ``implementation'' of S. There are some intentional
differences where the behavior of S is considered ``not clean''. In
general, the rationale is that R should help you detect programming
errors, while at the same time being as compatible as possible with S.
Some known differences are the following.
o In R, if x is a list, then x[sub] <- NULL and x[[sub]] <- NULL
remove the specified elements from x. The first of these is
incompatible with S, where it is a no-op.
o In S, the functions named .First and .Last in the `.Data' directory
can be used for customizing, as they are executed at the very
beginning and end of a session, respectively. R looks for files
called `.Rprofile' in the user's home directory and the current
directory, and sources these. (It also loads a saved image from
`.RData' in case there is one.) If a .First function exists, then
it is executed. The .Last mechanism is not supported yet.
o In S, library(name) adds the data directory for the library section
name to the search list. If a function object named `.First.lib'
exists in the directory, it is executed; this is typically used to
dynamically load compiled code required by the functions in the
section. In R, library(name) currently simply sources the file
$RHOME/library/name, and compiled code can be loaded by calling
library.dynam() in this file. The .First.lib mechanism is not
really supported.
o In R, dyn.load() can only load shared libraries, as created for
example by `R SHLIB'.
o R presently does not support IEEE Inf and NaN.
o In R, attach currently only works for lists and data frames (not
for directories).
o Categories do not exist in R, and never will as they are deprecated
now in S. Use factors instead.
o In R, For() loops are not necessary and hence not supported.
o In R, assign() uses the argument envir= rather than where= as in S.
o The random number generators are different, and the seeds have
different length.
o The glm family objects are implemented differently in R and S. The
same functionality is available but the components have different
names.
o terms objects are stored differently. In S a terms object is an
expression with attributes, in R it is a formula with attributes.
The attributes have the same names but are mostly stored
differently. The major difference in functionality is that a terms
object is subscriptable in S but not in R. If you can't imagine
why this would matter then you don't need to know.
Also, attr(terms(y~x),"response") gives 1 in S and TRUE in R. In S
the attribute indicates which column of the model frame will
contain the response. In R this is always column 1 because model
frames are only useful when their columns are in the right order
(model.matrix doesn't check).
There are also differences which are not intentional, and result from
missing or incorrect code in R. The developers would appreciate
hearing about any deficiencies you may find (in a written report fully
documenting the difference as you see it). Of course, it would be
useful if you were to implement the change yourself and make sure it
works.
4. R Add-On Packages
4.1. Which Add-on Packages Exist for R?
The R distribution comes with the following extra packages:
eda
Exploratory Data Analysis. Currently only contains functions
for robust line fitting, and median polish and smoothing.
mva
Multivariate Analysis. Currently contains code for principal
components (prcomp), canonical correlations (cancor),
hierarchical clustering (hclust), and metric multidimensional
scaling (cmdscale). More functions for clustering and scaling,
biplots, profile and star plots, and code for ``real''
discriminant analysis will be added soon.
The following packages are available from the CRAN `src/contrib' area.
acepack
ace (Alternating Conditional Expectations) and avas (Additivity
and VAriance Stabilization for regression) for selecting
regression transformations.
bootstrap
Software (bootstrap, cross-validation, jackknife), data and
errata for the book ``An Introduction to the Bootstrap'' by B.
Efron and R. Tibshirani, 1993, Chapman and Hall.
class
Functions for classification (k-nearest neighbor and LVQ).
clus
Functions for cluster analysis.
ctest
A collection of classical tests, including the Bartlett, Fisher,
Kruskal-Wallis, Kolmogorov-Smirnov, and Wilcoxon tests.
date
Functions for dealing with dates. The most useful of them
accepts a vector of input dates in any of the forms 8/30/53,
30Aug53, 30 August 1953, ..., August 30 53, or any mixture of
these.
e1071
Miscellaneous functions used at the Department of Statistics at
TU Wien (E1071).
fracdiff
Maximum likelihood estimation of the parameters of a
fractionally differenced ARIMA(p,d,q) model (Haslett and
Raftery, Applied Statistics, 1989).
gee
An implementation of the Liang/Zeger generalized estimating
equation approach to GLMs for dependent data.
integrate
Code for adaptive quadrature.
jpn
A function to plot Japan's coast-line and prefecture boundaries.
leaps
A package which performs an exhaustive search for the best
subsets of a given set of potential regressors, using a branch-
and-bound algorithm, and also performs searches using a number
of less time-consuming techniques.
mlbench
A collection of artificial and real-world machine learning
benchmark problems, including the Boston housing data.
oz Functions for plotting Australia's coastline and state
boundaries.
polynom
A collection of functions to implement a class for univariate
polynomial manipulations.
snns
An R interface to the Stuttgart Neural Networks Simulator
(SNNS).
splines
Regression spline functions.
survival4
Functions for survival analysis (requires splines).
wavethresh
Code for doing wavelet transforms and thresholding in 1 and 2D.
xgobi
Interface to the XGobi program for graphical data analysis.
See CRAN `src/contrib/INDEX' for more information.
Paul Gilbert <pgilbert@bank-banque-canada.ca> has written a
multivariate time series package for S called time.series that is
mostly converted to run in R. He is currently debugging the code, and
will officially release it in the near future.
According to Paul, the PADI interface from the Bank of Canada also
works with minor changes. PADI can be used to access Fame time series
data bases and potentially other databases, even remotely over the
Internet. For further information see
http://www.bank-banque-canada.ca/pgilbert.
Harald Fekjaer <hfe@math.uio.no> has written addreg, a package for
additive hazards regression, which can be obtained from
http://www.med.uio.no/imb/stat/addreg/.
More code has been posted to the r-help mailing list, and can be
obtained from the mailing list archive.
4.2. How Can Add-on Packages Be Installed?
(Unix only.) Untar the add-on packages in $RHOME/src/library/ and
type
$ make libs
$ cd ../..
$ ./etc/install-libhelp
at the shell prompt.
4.3. How Can Add-on Packages Be Used?
To find out which add-ons have already been installed, type
R> library()
at the R prompt. This produces something like
NAME DESCRIPTION
acepack ace() and avas() for selecting regression transformations
bootstrap Functions for the book "An Introduction to the Bootstrap"
ctest Classical Tests
date Functions for handling dates
eda Exploratory Data Analysis
fracdiff Fractionally differenced ARIMA (p,d,q) models
gee Generalized Estimating Equation models
mva Classical Multivariate Analysis
splines Regression spline functions
survival4 Survival analysis [needs library(splines)]
You can ``load'' the installed package name by
R> library(name)
You can then find out which functions it provides by typing
R> help(library = name)
4.4. How Can I Contribute to R?
R is currently still in alpha (or pre-alpha) state, so simply using it
and communicating problems is certainly of great value.
One place where functionality is still missing is the modeling
software as described in ``Statistical Models in S'' (see question
``What is S?''). The functions
add1 kappa alias labels drop1 proj
are missing; many of these are interpreted functions, so it would be
appreciated if anyone who is bored wants to have a go at implementing
them. In addition, only linear and generalized linear models are
currently available; aov, gam, loess, tree, and the nonlinear modelling
code are not there yet.
See also the `PROJECTS' file in the top level R source directory.
Many of the packages available at the Statlib S Repository might be
worth porting to R.
If you are interested in working on any of these projects, please
notify Kurt Hornik.
5. R and Emacs
5.1. Is there Emacs Support for R?
There is an Emacs-Lisp interface to S/S-PLUS called S-mode. Its
current version is 4.8 and can be obtained at
http://www.maths.lancs.ac.uk:2080/~maa036/elisp/S-mode/. The earlier
versions which can be found at the Statlib S repository (gnuemacs3 and
gnuemacs4) are outdated.
It contains code for interacting with an inferior S process from
within Emacs including an interface to the help system, editing S
source code, and transcript manipulation, and comes with detailed
instructions for installation.
Martin Maechler <maechler@stat.math.ethz.ch> and Tony Rossini
<rossini@math.sc.edu> have integrated support for R into this package.
The current version is at
ftp://ftp.math.sc.edu/rossini/S-mode-4.8.MM6.XE2.tar.gz
and runs under both GNU Emacs and XEmacs.
To install, put the byte-compiled `.el' files into a place where Emacs
can find them, and add
(if (not (assoc "\\.R$" auto-mode-alist))
    (add-to-list 'auto-mode-alist (cons "\\.R$" 'R-mode)))
(autoload 'R "S" "Run an inferior R process" t)
(autoload 'R-mode "S" "Mode for editing R source" t)
(autoload 'r-mode "S" "Mode for editing R source" t)
to one of your Emacs startup files, typically `~/.emacs'. You can
then fire up R from within Emacs by typing `M-x R' (note however that
many interface functions will not work), and if you use the extension
`.R' for your files with R code, Emacs will automagically turn on R
edit mode whenever you visit such a file.
Tony Rossini, Martin Maechler and Kurt Hornik have officially taken
over the development of S-mode. A new version, renamed ``ESS'' (for
``Emacs Speaks Statistics'') and including support for R, all S
versions, and even Xlisp-STAT and Vista, will be released shortly.
5.2. Should I Run R from Within Emacs?
Yes. Inferior R mode provides a readline/history mechanism, object
name completion, and syntax-based highlighting of the interaction
buffer using Font Lock mode, as well as a very convenient interface to
the R help system.
Of course, it also integrates nicely with the mechanisms for editing R
source using Emacs. One can write code in one Emacs buffer and send
whole or parts of it for execution to R; this is helpful for both data
analysis and programming. One can also seamlessly integrate with a
revision control system, in order to maintain a log of changes in your
programs and data, as well as to allow for the retrieval of past
versions of the code.
In addition, it allows you to keep a record of your session, which can
also be used for error recovery through the use of the transcript
mode.
6. R Miscellanea
6.1. How Can I Read a Large Data Set into R?
R (currently) uses a static memory model. This means that when it
starts up, it asks the operating system to reserve a fixed amount of
memory for it. The size of this chunk cannot be changed subsequently.
Hence, it can happen that not enough memory was allocated.
In these cases, you should restart R with more memory available, using
the command line options -n and -v. To understand these options, one
needs to know that R maintains separate areas for fixed and variable
sized objects. The first of these is allocated as an array of ``cons
cells'' (Lisp programmers will know what they are, others may think of
them as the building blocks of the language itself, parse trees,
etc.), and the second are thrown on a ``heap''. The -n option can be
used to specify the number of cons cells (each occupying 16 bytes)
which R is to use (the default is 200000), and the -v option to
specify the size of the vector heap in megabytes (the default is 2).
Only integers are allowed for both options.
E.g., to read in a table of 5000 observations on 40 numeric variables,
R -v 6 should do.
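The arithmetic behind this suggestion can be sketched as follows. The figure of 8 bytes per numeric value is an assumption (double precision storage); the cons cell numbers are the defaults quoted above.

```r
n.obs  <- 5000
n.vars <- 40
bytes.per.double <- 8   # assumption: numeric values are stored as 8-byte doubles

# Raw size of the table on the vector heap, in megabytes
table.mb <- n.obs * n.vars * bytes.per.double / 2^20

# Size of the default cons cell area (200000 cells of 16 bytes each)
cons.mb <- 200000 * 16 / 2^20

round(table.mb, 2)   # about 1.53
round(cons.mb, 2)    # about 3.05
```

The data alone come close to filling the default 2 megabyte heap, so a larger value such as 6 presumably leaves room for the temporary copies made while the table is read in.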
Note that the information on where to find vectors and strings on the
heap is stored using cons cells. Thus, it may also be necessary to
allocate more space for cons cells in order to perform computations
with very ``large'' variable-size objects.
You can find out the current memory consumption by typing gc() at the
R prompt.
6.2. Why Can't R Source a `Correct' File?
R sometimes has problems parsing a file which does not end in a
newline. This can happen for example when Emacs is used for editing
the file and next-line-add-newlines is set to nil. To avoid the
problem, either set require-final-newline to a non-nil value in one of
your Emacs startup files, or make sure R-mode (see question ``Is there
Emacs Support for R?'') is used for editing R source files (which
locally ensures this setting).
Earlier R versions had a similar problem when reading in data files,
but this should have been taken care of now.
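If you are unsure whether a source file ends in a newline, you can check from within R before sourcing it. The helper below is a hypothetical sketch and not part of R; it assumes that readBin(), file.info() and tempfile() are available in your version.

```r
# Hypothetical helper: TRUE if the last byte of the file is a line feed.
ends.with.newline <- function(path) {
    size <- file.info(path)$size
    if (is.na(size) || size == 0) return(FALSE)
    bytes <- readBin(path, what = "raw", n = size)
    bytes[length(bytes)] == as.raw(10)   # 10 is the line feed character
}

good <- tempfile(fileext = ".R")
bad  <- tempfile(fileext = ".R")
writeLines("x <- 1", good)    # writeLines() terminates each line with a newline
cat("x <- 1", file = bad)     # cat() does not add a trailing newline

ends.with.newline(good)   # TRUE
ends.with.newline(bad)    # FALSE
```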
6.3. How Can I Set Components of a List to NULL?
You can use
x[i] <- list(NULL)
to set component i of the list x to NULL, similarly for named
components. Do not set x[i] or x[[i]] to NULL, because this will
remove the corresponding component from the list.
For dropping the row names of a matrix x, it may be easier to use
rownames(x) <- NULL, similarly for column names.
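A short sketch of the difference, using a made-up list and matrix:

```r
x <- list(a = 1, b = "two", c = 3)

y <- x
y["b"] <- list(NULL)   # keeps component b, now with value NULL
length(y)              # 3
is.null(y$b)           # TRUE

z <- x
z["b"] <- NULL         # removes component b altogether
length(z)              # 2
names(z)               # "a" "c"

# Dropping row names of a matrix
m <- matrix(1:4, 2, 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
rownames(m) <- NULL
colnames(m)            # "c1" "c2", the column names are untouched
```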
6.4. How Can I Save My Workspace?
The expression
save(list = ls(), file = ".RData")
saves the objects in the currently active environment (typically the
user's .GlobalEnv) to the file `.RData' in the R startup directory.
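To check that a saved image can be restored, something like the following can be tried. This is only a sketch: it writes to a temporary file (created with tempfile()) rather than to `.RData', and saves two made-up objects by name instead of everything returned by ls().

```r
a <- 1:3
b <- "some text"

img <- tempfile(fileext = ".RData")   # hypothetical location for this example
save(list = c("a", "b"), file = img)

rm(a, b)                # pretend the session was lost
restored <- load(img)   # load() returns the names of the restored objects
restored
a                       # 1 2 3 again
```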
6.5. How Can I Clean Up My Workspace?
To remove all objects in the currently active environment (typically
the user's .GlobalEnv), you can do
rm(list = ls())
6.6. Why Do My Matrices Lose Dimensions?
When a matrix with a single row or column is created by a subscripting
operation, e.g., row <- mat[2, ], it is by default turned into a
vector. In a similar way if an array with dimension, say, 2x3x1x4 is
created by subscripting it will be coerced into a 2x3x4 array, losing
the unnecessary dimension. After much discussion this has been
determined to be a feature.
To prevent this happening, add the option `drop = FALSE' to the
subscripting. For example,
rowmatrix <- mat[2, , drop = FALSE] # creates a row matrix
colmatrix <- mat[, 2, drop = FALSE] # creates a column matrix
a <- b[1, 1, 1, drop = FALSE] # creates a 1x1x1 array
The `drop = FALSE' option should be used defensively when programming.
For example, the statement
somerows <- mat[index, ]
will return a vector rather than a matrix if index happens to have
length 1, causing errors later in the code. It should probably be
rewritten as
somerows <- mat[index, , drop = FALSE]
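The effect is easy to verify with dim(). The matrix and array below are made up for the purpose of this sketch.

```r
mat <- matrix(1:6, nrow = 2)
index <- 2                         # happens to have length 1

v <- mat[index, ]                  # dimensions are dropped
m <- mat[index, , drop = FALSE]    # still a matrix
is.null(dim(v))                    # TRUE, v is a plain vector
dim(m)                             # 1 3

arr <- array(1:24, dim = c(2, 3, 1, 4))
dim(arr[, , 1, ])                  # 2 3 4
dim(arr[, , 1, , drop = FALSE])    # 2 3 1 4
```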
7. Acknowledgments
Of course, many many thanks to Robert and Ross for the R system, and
to the package writers and porters for adding to it.
Special thanks go to Peter Dalgaard, Paul Gilbert, Jim Lindsey, Thomas
Lumley, Martin Maechler, and Anthony Rossini for their comments which
helped me improve this FAQ.
More to come soon ...