[R] Making objects global in a package

Sat Jul 14 02:51:06 CEST 2018

Greetings.  I'm putting together a small package in which I use
`readr::read_csv()` to read CSV files from several different sources.  The
calls are spread over several source files, each with its own kind of
subsequent processing.

I find it useful to specify column types, as the apparent data type of a given
column sometimes changes unexpectedly deep into the file.  E.g., a field that
consistently looks like an integer suddenly becomes a fraction:

    1, 1, ..., 1, 1/2, 1, ...

Hence, the column has to be read as character, rather than as integer (with
the possibility of later conversion to double, if necessary).  (This is just
an example.)
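
To make that concrete, here is a minimal sketch of pinning such a column to
character.  It assumes readr is installed; the temporary file, its contents,
and the column names `id` and `ratio` are invented for illustration.

```r
# Hypothetical data: a column that looks like an integer until row 3.
path <- tempfile(fileext = ".csv")
writeLines(c("id,ratio", "1,1", "2,1", "3,1/2", "4,1"), path)

# Declare the types up front instead of relying on type guessing.
dat <- readr::read_csv(
    path,
    col_types = readr::cols(
        id    = readr::col_integer(),
        ratio = readr::col_character()  # keeps "1/2" intact as text
    )
)

is.character(dat$ratio)  # check that the declared type was applied
```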

Therefore I use the `col_types` argument in all of the calls to `read_csv()`.

These calls are spread over several files, but I want to keep all of the
column types in a single place, yet have them available in each of the several
files.  This is just for the sake of maintainability.

At the moment I do this by putting the column-type definitions into a single
file:

    000_define_data_attributes.R

that:

    (1) is named so that it's parsed first by `devtools::build()`
    (2) sets up an environment and stuffs the column types into it:

            data_env <- new.env(parent=emptyenv())
            data_env$col_types_alpha <- list(
                Date = col_date(),
                var1 = col_double(),
                ...
            )

There are a few other things that go into the file as well.

Then I pick off the appropriate stuff from the environment in the other files:

    foo_alpha <- read_csv("alpha.csv", col_types = data_env$col_types_alpha)
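
Put together, the whole arrangement can be sketched as a self-contained
script.  This assumes readr is installed; the temporary file and its two rows
are made up, and I use `readr::cols()` here in place of the plain list above
purely as a choice for the sketch.

```r
# Shared definitions (in the package these live in the "000..." file).
data_env <- new.env(parent = emptyenv())
data_env$col_types_alpha <- readr::cols(
    Date = readr::col_date(),
    var1 = readr::col_double()
)

# Hypothetical stand-in for "alpha.csv".
path <- tempfile(fileext = ".csv")
writeLines(c("Date,var1", "2018-07-13,1.5", "2018-07-14,2.5"), path)

# Any other file looks the column types up in the shared environment.
foo_alpha <- readr::read_csv(path, col_types = data_env$col_types_alpha)
```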

This seems to work, but it doesn't "feel" right to me.  (If this were Python,
people would accuse me of being "non-pythonic".)

Hence, I'm seeking suggestions for the best practice for this kind of thing.

BTW, I note that both the sources of data ("alpha", etc.) and the column types
are more or less guaranteed to be static for the foreseeable future.  Hence,
there really isn't much danger in just replicating the column-type definitions
in each of the various files, which would obviate the need for the "000..."
file.  In other words, this is mostly a style thing.

Thanks for any advice you can provide.

-- Mike