[R] Getting your stuff organized in R

Thu Sep 27 11:21:52 CEST 2001

I'm attaching a small text file
on "Getting your stuff organized in R".
(Sorry if sending an attachment is not considered
correct etiquette on r-help, but this is
only 7911 bytes of plain ASCII text and I cannot
post it on a web page at the moment.)

Probably all the information in this document is scattered
across one or more
R introduction guides, but I think it is useful to have
it gathered under this title. The number of
R objects created by the user grows fast,
and the way R stores them is somewhat
peculiar (most other packages create a separate disk file
for each object). Therefore, it is important for anyone starting
with R to learn how to organize his/her R objects and avoid
dumping everything into one single, often large, .RData file.

I am sending this document to the list in the hope that people
will correct errors and suggest alternative, better methods.
Please send them directly to alobo at ija.csic.es, not to the list.
After your feedback, I'll format it as PDF
or HTML and send it to the Contributed
Documentation section of the CRAN pages.

Thanks

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es

-------------- next part --------------
Getting your stuff organized in R

Probably all this information is scattered across one or more
R introduction guides, but I think it is useful to have
it gathered under this title. If, after a first contact
with R, you have decided to use it, you will want to start working with
your own data as soon as possible. R does not
create a separate disk file for each object, which is the most
common situation in other packages, and this probably confuses you
a bit. Also, the number of data and function objects
can grow really fast in your R sessions. Because your R objects
multiply quickly and the way R stores them is somewhat
idiosyncratic, it is important to learn how
to organize them before you start working with
your own data.

1. As you know from R-start.pdf, R keeps everything
in memory. Therefore, it is wise to type
> save.image()

frequently. This saves everything that is listed by
> ls()

to a file named .RData, located in the directory
from which you started R. Remember that on Unix systems you must use ls -a (at the shell,
not in R) to list this file, along with any other file whose name starts with ".".

Therefore, the first "organizing" rule is simply to keep your projects in
separate directories and launch R from the appropriate directory.
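A minimal sketch of rule 1, assuming a current R installation. Normally you would cd into the project directory at the shell before starting R; here setwd() stands in for that, and the directory project1 under tempdir() is hypothetical.

```r
# Hypothetical project directory, created under tempdir() so no real files are touched
proj <- file.path(tempdir(), "project1")
dir.create(proj, showWarnings = FALSE)

old_wd <- setwd(proj)   # emulate "launch R from the project directory"
x <- rnorm(10)          # some object in the workspace
save.image()            # writes .RData into the current working directory
print(list.files(all.files = TRUE, no.. = TRUE))  # should include ".RData"
setwd(old_wd)           # return to the original directory
```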

2. You can save selected objects to a different file, possibly in another directory, with:
> save(object1, object2, file="myobjects1&2")

3. It is useful to take advantage of the ability of ls() to select
objects by name, e.g.:

> ls(pattern="liss")
[1] "lissNPC100"      "lissNPC100.ady"  "lissNPC100.stat" "lissNPC1100.ref"

> save(list=ls(pattern="liss"), file="lissobjects")
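A minimal sketch of the same idea, assuming a current R and hypothetical object and file names. Since pattern is a regular expression, anchoring it with "^" avoids also picking up names that merely contain "liss" somewhere in the middle. The file is then loaded into a scratch environment to check what was written.

```r
# Hypothetical objects: two share the "liss" prefix, one does not
lissA <- 1:3
lissB <- letters[1:2]
other <- pi

out_file <- file.path(tempdir(), "lissobjects.rda")
# pattern is a regular expression; "^liss" matches names that start with "liss"
save(list = ls(pattern = "^liss"), file = out_file)

# Check what was written by loading the file into a scratch environment
chk <- new.env()
saved <- load(out_file, envir = chk)   # load() returns the restored names invisibly
print(saved)
```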

4. You will normally need functions that are not
in the base package and that are not available to you after
a default R start. You normally don't want these functions
in your workspace, as they would be saved by save.image()
into .RData and mixed with your own objects (which probably also
include "immature" functions). If you require functions
from an installed CRAN package, you just use:

> library(Rstreams)

If you type ls() afterwards, you won't see the Rstreams functions. For the sake
of organization, R does not copy the package into your workspace; it attaches it
to the search path, so the functions are available for you to use. If you type

> search()

you will get something like:
> search()
[1] ".GlobalEnv"       "package:Rstreams" "package:ctest"    "Autoloads"
[5] "package:base"

which lists your workspace (named ".GlobalEnv"), the package you just attached
(which goes, by default, to position 2), and "package:ctest", "Autoloads" and
"package:base", which were attached automatically when R started.

Now, if you type

> ls(2)

you will get the list of objects in the Rstreams package.
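The Rstreams package may not be installed on your system, so this sketch uses tools, which ships with R but is not attached in a standard session (an assumption about your setup). It checks that library() puts the package at position 2 without copying anything into the workspace, while its functions remain callable.

```r
library(tools)                      # attached at position 2 by default
print(search()[1:2])                # ".GlobalEnv" and, normally, "package:tools"

in_workspace <- "file_ext" %in% ls()    # tools functions are not copied into .GlobalEnv
in_position2 <- "file_ext" %in% ls(2)   # but they are listed at position 2
print(c(in_workspace = in_workspace, in_position2 = in_position2))

print(file_ext("data1ori.rda"))     # still callable: found through the search path
```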

5. As you develop your project, you transform your original data and often
create new data frames and data matrices. In order to keep the original data
safe, it's a good idea to keep them in a separate file. Another reason
to separate the original data is that they might be large, while you
most often work with data that have been selected or sampled from the original
file. As R automatically loads your .RData into memory at startup, it's more efficient
not to load any large object unless you really need it.
You can save the original data object to a separate file with:

> save(data1.ori, file="data1ori.rda")

and then delete the object from your workspace with rm(data1.ori): the next .RData file
that you create, either with
save.image() or by quitting R and saving the workspace, will not include data1.ori.
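A minimal sketch of step 5, assuming a current R; the data frame and the file under tempdir() are hypothetical stand-ins. Note that file must be given as a named argument, because save() treats unnamed arguments as objects to save.

```r
# Stand-in for a large original data set (hypothetical)
data1.ori <- data.frame(id = 1:1000, value = rnorm(1000))
ori_file  <- file.path(tempdir(), "data1ori.rda")

# file= must be named: unnamed arguments to save() are taken as objects to save
save(data1.ori, file = ori_file)

rm(data1.ori)                 # drop it from the workspace
print(exists("data1.ori"))    # FALSE: save.image() will no longer include it
```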

6. If you need data1.ori afterwards, you should use

> attach("data1ori.rda")

rather than

> load("data1ori.rda")

With attach("data1ori.rda"), your object data1.ori will be loaded into
a different environment (pos=2 by default), which means that you'll be able
to use it, but it will not be mixed up with your "everyday work" when you use
save.image() and/or quit R.

You can type

> search()

before and after attach("data1ori.rda") to see the result.
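A minimal sketch of step 6, assuming a current R; the file and its contents are hypothetical and are recreated first so the sketch stands on its own. The attached object is usable through the search path but never appears in ls(), and detach() removes it again.

```r
# Recreate the hypothetical file so this sketch stands on its own
data1.ori <- data.frame(id = 1:5, value = c(2.1, 3.4, 1.8, 5.0, 4.2))
ori_file  <- file.path(tempdir(), "data1ori.rda")
save(data1.ori, file = ori_file)
rm(data1.ori)

attach(ori_file)                    # goes to position 2 as "file:<path>"
print(search()[1:3])

visible <- exists("data1.ori")      # TRUE: found through the search path
in_ws   <- "data1.ori" %in% ls()    # FALSE: not part of the workspace
print(summary(data1.ori$value))     # usable as usual

detach(2)                           # take it off the search path when done
```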

7. As R integrates a large number of statistical methods and graphics
with a high-level language,
your work will involve writing a number of functions of your own. As soon as
your functions attain a certain "maturity" and you consider them of general use
for your own work, you should organize them as packages (see "Creating R packages" in
R-exts.pdf).
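As a possible starting point, package.skeleton() (in the utils package of current R versions) writes the basic package directory layout from objects in your workspace. A minimal sketch, assuming a current R; na_mean and myfuncs are hypothetical names, and the generated DESCRIPTION and help files still need editing before the package is usable.

```r
# A hypothetical "mature" function you want to move into a package
na_mean <- function(x) mean(x, na.rm = TRUE)

pkg_root <- tempdir()
# Writes <pkg_root>/myfuncs with DESCRIPTION, NAMESPACE, R/ and man/ entries
package.skeleton(name = "myfuncs", list = "na_mean", path = pkg_root, force = TRUE)

print(list.files(file.path(pkg_root, "myfuncs"), recursive = TRUE))
```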

8. Meanwhile, it's also a good idea to save your functions to a separate file,
or to use such a file as an intermediate step between the workspace and the library.
A good reason
to separate functions from other objects is that you might want to use a function
that you developed for another project.
Keeping functions and data objects in different files will let you attach the
functions without the data objects. Remember that you do not want to attach anything
that you do not need, because it costs you memory.

The following function will let you list only the functions present in a given environment
(your workspace by default):

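lsf <- function(pos = 1)
{
        a <- ls(pos = pos)
        is.fun <- logical(length(a))      # also safe when 'a' is empty
        for (i in seq_along(a)) {
                # look each object up in the environment that was listed
                is.fun[i] <- is.function(get(a[i], pos = pos))
        }
        a[is.fun]
}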

> lsf()
 [1] "disc.qda"           "edges"              "ima.explore2"
 [4] "imagen"             "imagenrgb"          "lsf"
 [7] "mat.select"         "no.na.mat"          "no.rep.mat"
[10] "parcelas.lda"       "parcelas.liss.func" "reclas"
[13] "rescale"            "utm2lincol"

You can use lsf() to save your functions to a file:

> save(list=lsf(), file="Rfunctions.rda")
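The idea of section 8, sketched end to end under the assumption of a current R; the object names (rescale, no.na, obs) and file names are hypothetical. Functions and data go to two different files, and later only the functions file is attached. The function test here uses is.function(), which serves the same purpose as checking mode() in lsf().

```r
# Hypothetical workspace contents: two functions and one data object
rescale <- function(x) (x - min(x)) / (max(x) - min(x))
no.na   <- function(x) x[!is.na(x)]
obs     <- c(3, NA, 7, 1)

# Split the workspace names into functions and everything else
nms       <- ls()
is_fun    <- vapply(nms, function(nm) is.function(get(nm)), logical(1))
fun_names <- nms[is_fun]
dat_names <- nms[!is_fun]

fun_file <- file.path(tempdir(), "Rfunctions.rda")
dat_file <- file.path(tempdir(), "Rdata_objects.rda")
save(list = fun_names, file = fun_file)
save(list = dat_names, file = dat_file)

# Later (or in another project): bring back only the functions
rm(list = fun_names)
attach(fun_file, name = "myfunctions")   # functions only, at position 2
res <- rescale(c(2, 4, 6))
print(res)                               # 0.0 0.5 1.0
detach("myfunctions")
```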

9. Actually, it's more common to save functions in text format, which you can do with:

> dump(list=lsf(),file="testdump.R")

But you cannot use either load() or attach() with files created by dump(). Instead,
you must use source():

> source("testdump.R")

but beware that source() will create the functions in your workspace. I've not found
any way to direct source() to another position.
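In current versions of R, sys.source() (and source() with its local argument) can evaluate a file in an environment of your choice. A minimal sketch, assuming a current R: attach(NULL, name = ...) creates an empty entry at position 2 and returns it, and the dumped file is evaluated there instead of in the workspace. The names mynewfunc and testdump are hypothetical.

```r
# Hypothetical function written to a text file with dump()
mynewfunc <- function(x) x^2
dump_file <- file.path(tempdir(), "testdump.R")
dump("mynewfunc", file = dump_file)
rm(mynewfunc)

# Empty environment at position 2; attach() returns it invisibly
env <- attach(NULL, name = "testdump")
sys.source(dump_file, envir = env)   # definitions go to 'env', not .GlobalEnv

in_ws <- exists("mynewfunc", envir = globalenv(), inherits = FALSE)
res   <- mynewfunc(3)                # found at position 2
print(in_ws)                         # FALSE
print(res)                           # 9
detach("testdump")
```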

10. Sometimes you will want to add an object from your workspace to an existing R disk file.
For example, you may want to add a new function developed in your workspace to
the functions file of your project. You just need the append option of dump() for this
purpose:

> dump("mynewfunc", file="proj1funcs.R", append=TRUE)
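A minimal sketch, assuming a current R and a hypothetical proj1funcs.R under tempdir(): an existing function is dumped first, the new one is appended, and the file is then evaluated in a scratch environment to check that both definitions are present.

```r
proj_file <- file.path(tempdir(), "proj1funcs.R")

oldfunc   <- function(x) x + 1     # hypothetical: already in the functions file
mynewfunc <- function(x) x * 2     # hypothetical: newly developed in the workspace

dump("oldfunc", file = proj_file)                   # create/overwrite the file
dump("mynewfunc", file = proj_file, append = TRUE)  # add the new function at the end

chk <- new.env()
sys.source(proj_file, envir = chk)  # evaluate the file without touching the workspace
print(ls(chk))
```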

It's a bit more complicated to add a data object to an R binary file, because
save() has no "append" option. But you can use ls() in the following way:

> search()
[1] ".GlobalEnv"    "package:ctest" "Autoloads"     "package:base"
> attach("lissN543cod.R")
> search()
[1] ".GlobalEnv"         "file:lissN543cod.R" "package:ctest"
[4] "Autoloads"          "package:base"
> ls(2)
[1] "lissN543.cod"  "lissN543E.cod" "lissN543W.cod"

Now, assuming we want to add an object "a" to lissN543cod.R, we would type:

> save(list=c("a", ls(2)), file="lissN543cod_v2.rda")

Note the quotes around a in the list argument: list takes the names of the objects as a character vector.

Once lissN543cod_v2.rda has been checked, we can delete lissN543cod.R.
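An alternative that avoids attaching the old file, assuming a current R where load() accepts an envir argument: load the file into a scratch environment, add the new object there, and save the whole environment to the new file. File and object names are hypothetical stand-ins for those in the text.

```r
old_file <- file.path(tempdir(), "lissN543cod.rda")
new_file <- file.path(tempdir(), "lissN543cod_v2.rda")

# Stand-in for the existing binary file (hypothetical contents)
local({
  lissN543.cod  <- 1:3
  lissN543E.cod <- 4:6
  save(lissN543.cod, lissN543E.cod, file = old_file)
})

a <- c(10, 20)                        # the object to add

e <- new.env()
load(old_file, envir = e)             # old contents go into 'e', not the workspace
assign("a", a, envir = e)             # add the new object alongside them
save(list = ls(e), envir = e, file = new_file)

contents <- load(new_file, envir = new.env())   # names stored in the new file
print(contents)
```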

11. In order to copy an object from the workspace to another environment,
you can use assign():

> search()
[1] ".GlobalEnv"         "file:lissN543cod.R" "package:ctest"
[4] "Autoloads"          "package:base"
> ls(2)
[1] "lissN543.cod"  "lissN543E.cod" "lissN543W.cod"
> assign("a", a, pos=2)
> ls(2)
[1] "a"             "lissN543.cod"  "lissN543E.cod" "lissN543W.cod"

You can then delete a from the workspace, but beware that in that case a will not be saved by save.image() or when
quitting R. You would need to use:

> save(list=ls(2),file="newfile.rda")
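A minimal sketch of section 11, assuming a current R. An empty, hypothetical environment named scratch stands in for the attached file. Note that assign() takes the object's name as a string and the value as the second argument.

```r
attach(NULL, name = "scratch")        # hypothetical empty environment at position 2

a <- 1:10
assign("a", a, pos = 2)               # copy 'a' (by name) to position 2
rm(a)                                 # remove it from the workspace

visible <- exists("a")                # TRUE: still found at position 2
in_pos2 <- ls(2)                      # "a"
out_file <- file.path(tempdir(), "newfile.rda")
save(list = ls(2), envir = as.environment(2), file = out_file)

detach("scratch")
```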

12. If you have several projects, you might forget which objects are in a given R binary file
created with save(). Unfortunately, I've not found any way to list the contents of such a file
unless it is attached or loaded. Also, selecting which objects to load from an R binary file does not seem
to be possible.
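In current versions of R, one workaround is to load the file into a scratch environment rather than the workspace: load() returns the names of the objects it restored, and you can then copy only the ones you want. A minimal sketch with a hypothetical file and object names.

```r
# Hypothetical binary file containing three objects
f <- file.path(tempdir(), "proj.rda")
local({
  x <- 1:3
  y <- letters[1:5]
  z <- matrix(0, 2, 2)
  save(x, y, z, file = f)
})

# List the contents without touching the workspace:
# load() into a scratch environment returns the restored names
e <- new.env()
contents <- load(f, envir = e)
print(contents)

# Copy only the object you want into the workspace, then drop the rest
y <- get("y", envir = e)
rm(e)
```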

I hope these notes are useful. Please send your comments, corrections, etc. to alobo at ija.csic.es.
Note that R is a collaborative project, and that also applies to documentation and guides!