[Rd] Is it a good idea or even possible to redefine attach?

Winston Chang winstonchang1 at gmail.com
Wed Aug 6 02:54:38 CEST 2014


On Tue, Aug 5, 2014 at 4:37 PM, Grant Rettke <gcr at wisdomandwonder.com> wrote:
>
> That is delightful.
>
> When I run it like this:
> • Start R
> • Nothing in .Rprofile
> • Paste in your code
> ╭────
> │ gcrenv <- new.env()
> │ gcrenv$attach.old <- attach
> │ gcrenv$attach <- function(...){stop("NEVER USE ATTACH")}
> │ base::attach(gcrenv, name="gcr", warn.conflicts = FALSE)
> ╰────
> • I get exactly what is expected, I think
> ╭────
> │ search()
> ╰────
> ╭────
> │  [1] ".GlobalEnv"        "gcr"               "ESSR"
> │  [4] "package:stats"     "package:graphics"  "package:grDevices"
> │  [7] "package:utils"     "package:datasets"  "package:methods"
> │ [10] "Autoloads"         "package:base"
> ╰────
>
> Just to be sure:
> • Is that what is expected?
> • I am surprised because I thought that `gcr' would come first before
>   `.GlobalEnv'
>   • Perhaps I misunderstand, as `.GlobalEnv' is actually the "REPL"?
>
> My goal is to move that to my .Rprofile so that it is "always run" and I
> can forget about it more or less.
>
> Reading [this] I felt like `.First' would be the right place to put it,
> but then read further to find that packages are only loaded /after/
> `.First' has completed.  Curious, I tried it just to be sure. I am now
> :).
>
> This is the .Rprofile file:
>
> ╭────
> │ cat(".Rprofile: Setting CMU repository\n")
> │ r = getOption("repos")
> │ r["CRAN"] = "http://lib.stat.cmu.edu/R/CRAN/"
> │ options(repos = r)
> │ rm(r)
> │ .First <- function() {
> │    «same code as above»
> │ }
> ╰────
>
> (I included the repository load, and understand it should not impact
> things here)
>
> This is run after normal startup of R:
>
> ╭────
> │ search()
> ╰────
> ╭────
> │  [1] ".GlobalEnv"        "package:stats"     "package:graphics"
> │  [4] "package:grDevices" "package:utils"     "package:datasets"
> │  [7] "gcr"               "package:methods"   "Autoloads"
> │ [10] "package:base"
> ╰────
>
> When I read this, I read it as:
> • My rebind of `attach' occurs
> • Then all of the packages are loaded and they are referring to
>   my-rebound `attach'
> • That is a problem because it *will* break package code
> • Clearly me putting that code in `.Rprofile' is the wrong place.
>

That search path order should actually be fine. To understand why,
you first have to know the difference between the _binding_
environment for an object, and the _enclosing_ environment for a
function.

The binding environment is where you can find an object. For example,
in the global env, you have a bunch of bindings (we often call them
variables) that point to various objects - vectors, data frames,
other environments, etc.

The enclosing environment for a function is where the function "runs
in" when it's called: it is where R looks up any name that the
function uses but doesn't define itself.

Most R objects have just a binding environment (a variable or
reference that points to the object); functions also have an enclosing
environment. These two environments aren't necessarily the same.
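
Here is a small sketch of that distinction, using made-up names
(binding_env, enclosing_env and show_x are not from any package). The
function is bound in one environment but enclosed by another, and it
picks up `x` from the enclosing one:

```r
# All names here are hypothetical and exist only for illustration.
binding_env   <- new.env()
enclosing_env <- new.env()
binding_env$x   <- "from binding env"
enclosing_env$x <- "from enclosing env"

# Create a function, then set its enclosing environment explicitly
show_x <- function() x
environment(show_x) <- enclosing_env

# Bind the function in binding_env and drop the original binding
binding_env$show_x <- show_x
rm(show_x)

# The function is found in binding_env, but it looks up `x` in
# enclosing_env
result <- binding_env$show_x()
print(result)
# [1] "from enclosing env"
```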

When you run search(), it shows the set of environments where R will
look for an object of a given name when you run stuff at the console
(and are in the global env). The trick is that, although you can find
a function (it is bound) in one of these _package_ environments, that
function runs in (is enclosed by) a different environment: the
corresponding _namespace_ environment.
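
You can check this for `utils::alarm`. This is only a sketch, and the
variable names (pkg_env, ns_env, bound_in_pkg) are mine:

```r
# utils is normally attached already; this call just makes sure of it
library(utils)

# The package environment is the one that shows up in search()
pkg_env <- as.environment("package:utils")

# The function's enclosing environment is the namespace
ns_env <- environment(utils::alarm)

# alarm is bound directly in the package environment...
bound_in_pkg <- exists("alarm", envir = pkg_env, inherits = FALSE)
print(bound_in_pkg)

# ...but the environment it is enclosed by is a namespace, and it is
# not the same environment as the package environment
print(environmentName(pkg_env))
print(environmentName(ns_env))
print(isNamespace(ns_env))
print(identical(pkg_env, ns_env))
```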

Because of the way a namespace environment's ancestors are arranged
(the package's imports, then the base namespace, and only then the
global env and the rest of the search path), a package function will
find the base namespace version of `attach` before it finds yours,
even if your personal gcr environment comes early in the search path.
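
One way to see that arrangement for yourself is to walk up the
parents of a namespace. This is a rough sketch; env_chain is a helper
name I made up, not something from base R:

```r
# Hypothetical helper: collect the names of an environment and all of
# its ancestors, stopping at the empty environment.
env_chain <- function(env) {
  out <- character()
  repeat {
    out <- c(out, environmentName(env))
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  out
}

chain <- env_chain(asNamespace("utils"))
print(chain)

# The base namespace shows up before the global env, so anything
# attached to the search path is only consulted after base.
base_first <- match("base", chain) < match("R_GlobalEnv", chain)
print(base_first)
```

The entries after "R_GlobalEnv" are simply whatever search() shows in
your session, so that part of the output will differ from machine to
machine.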

=========================
# Here's an example to illustrate. The `utils::alarm` function calls
# `cat`, which is in base.

alarm
# function ()
# {
#     cat("\a")
#     flush.console()
# }
# <environment: namespace:utils>


# Running it makes the screen flash or beep
alarm()
# [screen flashes]


# We'll put a replacement version of cat in the search path,
# between utils and base
my_stuff <- new.env()
my_stuff$cat <- function(...) stop("Tried to call cat")
base::attach(my_stuff, pos=length(search()) - 1, name="my_stuff")

search()
#  [1] ".GlobalEnv"        "tools:rstudio"     "package:stats"     "package:graphics"
#  [5] "package:grDevices" "package:utils"     "package:datasets"  "package:methods"
#  [9] "my_stuff"          "Autoloads"         "package:base"

# Calling cat from the console gives the error, as expected
cat()
# Error in cat() : Tried to call cat

# But when we run alarm(), it still gets the real version of `cat()`,
# because it finds the original base namespace version of cat
# before it finds yours.
alarm()
# [screen flashes]

==========================

You can even alter package environments without affecting the
corresponding namespace environment. The exception to the package and
namespace environments being distinct is the base environment; change
one and you change the other. (I just realized this and have to
retract my earlier statement about the behavior being different if
you change attach in the base package env vs. the base namespace env.)

-Winston


