[R] working with environments to ensure code quality for long R scripts

Bert Gunter gunter.berton at gene.com
Thu Apr 19 22:58:56 CEST 2012


Comment (caveat emptor):

If I understand correctly, your difficulties all stem from your use of
the word "script," which betrays a fundamental misunderstanding of the
nature of R as a programming language.

R is based (mostly) on the concepts of functional programming. So
instead of doing as you have done -- spreading a mishmash of code
around in a bunch of files -- all that code should be made into
functions. These then can be organized formally into a package (with
documentation, namespaces, etc.) or informally into a saved .Rdata
file. In either case, the whole thing then functions as an organic
whole.
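
A minimal sketch of what "made into functions" could look like. The
step names (read_input, clean_input, summarise_input, run_pipeline) and
the toy data are hypothetical and do not come from the original poster's
project; the point is only that each step takes its inputs as arguments
and returns its result, so temporary variables disappear when the
function returns instead of piling up in the workspace.

```r
# Hypothetical pipeline steps written as functions instead of sourced scripts.

read_input <- function(values) {
  # Stand-in for a step that collects user input or loads a file.
  data.frame(id = seq_along(values), value = values)
}

clean_input <- function(dat) {
  tmp <- !is.na(dat$value)      # temporary: exists only inside this call
  dat[tmp, , drop = FALSE]
}

summarise_input <- function(dat) {
  list(n = nrow(dat), mean = mean(dat$value))
}

run_pipeline <- function(values) {
  # Dependencies between steps are explicit: each result is passed on by name.
  raw     <- read_input(values)
  cleaned <- clean_input(raw)
  summarise_input(cleaned)
}

result <- run_pipeline(c(2, 4, NA, 6))
```

For the informal route, functions like these could be stored with
save() and restored later with load(); for the formal route they would
go into the R/ directory of a package.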

By doing as you have done (are you a former SAS or maybe Java
programmer?) you are contravening the programming strategy around
which R is built, leaving you with a clumsy mess. This is not to say
that you can't get it to work -- you probably can. Only that it's a
mess.
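
For illustration only, here is one way the script-based approach might
be made to work, along the lines of the "backup" environment described
in the original question. This is a sketch under assumptions, not a
recommendation: run_module() is a hypothetical helper, and the two
temporary script files stand in for real modules. Each script is
evaluated with sys.source() in its own environment, and only the
variables named in keep are copied into the shared environment, with an
error if one of them already exists there.

```r
# Run one script in its own environment and merge only the requested
# variables into a shared environment, refusing to overwrite silently.
run_module <- function(file, shared, keep) {
  local_env <- new.env(parent = shared)  # the module can read shared variables
  sys.source(file, envir = local_env)    # plain assignments land in local_env
  clash <- intersect(keep, ls(shared))
  if (length(clash) > 0) {
    stop("module would overwrite: ", paste(clash, collapse = ", "))
  }
  for (nm in keep) {
    assign(nm, get(nm, envir = local_env, inherits = FALSE), envir = shared)
  }
  invisible(shared)
}

# Demonstration with two throwaway script files (hypothetical modules).
shared <- new.env(parent = globalenv())
f1 <- tempfile(fileext = ".R")
f2 <- tempfile(fileext = ".R")
writeLines(c("tmp <- 1:10", "total <- sum(tmp)"), f1)
writeLines("doubled <- total * 2", f2)

run_module(f1, shared, keep = "total")    # tmp is discarded with local_env
run_module(f2, shared, keep = "doubled")  # reads total from shared
ls(shared)
```

Note that a script using <<- could still modify the shared environment
directly, so this only guards against accidental overwrites through
ordinary assignment.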

Moral: Use R as it is intended to be used, not as you would like it to be used.

Cheers,
Bert

On Thu, Apr 19, 2012 at 12:57 PM, Alexander <juschitz_alexander at yahoo.de> wrote:
> Hi
>
> thank you for your suggestions, but I am not sure if I explained my problem
> well enough. Let's assume that I have 30 different script files and 1 script
> which calls these 30 scripts one after the other by "source". Some of the 30
> scripts only contain definitions of functions which are called in other
> scripts; some only execute code, load, save, and interact with the user via
> Tcl/Tk. Together they represent a big program which asks the user for a
> lot of input, then processes and manipulates that input. At the end, the
> user obtains some result files.
> As many scripts are sourced during execution, a lot of different variables
> are initialized etc. Some are needed only temporarily, some are necessary
> for later steps in other scripts.
> Now the question: Is there any methodology to ensure that it all runs
> correctly? For example, after every script I could save the whole workspace
> into script1.Rdata, delete all variables which are not needed in a later
> script (perhaps a little bit difficult to manage, but possible), and
> continue with the execution.
>
> I don't know if I was able to describe my problem more precisely.
>
> Alexander
>
> cberry wrote
>>
>> Alexander,
>>
>> If Tal's suggestion to use caching in Sweave doesn't appeal to you, you
>> might look at  'R.cache' and other packages mentioned in
>>
>> http://cran.r-project.org/web/views/ReproducibleResearch.html
>>
>> under 'Caching of R Objects'.
>>
>> However, an advantage of the Sweave-like approaches is that you can
>> generate a brief report that includes the versions of scripts used,
>> summarizes the data processing, and gives intermediate results for later
>> inspection and sanity checks.
>>
>> HTH,
>>
>> Chuck
>>
>> Tal Galili <tal.galili@> writes:
>>
>>> Hi Alexander,
>>> Saving full environments is possible, but it is very easy to start
>>> losing track of where each variable came from.
>>> You might want to use this process:
>>> http://www.r-bloggers.com/a-better-way-of-saving-and-loading-objects-in-r/
>>> It depends on how many variables you work with, but it might help.
>>>
>>> Another way is to do all of the work through Sweave, and combine it with
>>> caching:  http://cran.r-project.org/web/packages/cacheSweave/index.html
>>> This will ensure that every code chunk will keep the variables you
>>> created,
>>> without the need to re-run the code from scratch.
>>>
>>> For extracting data from outside sources, I would often use the first
>>> method, and for analysis I would use the latter option.
>>>
>>> Good luck,
>>> Tal
>>>
>>> On Thu, Apr 19, 2012 at 11:15 AM, Alexander
>>> <juschitz_alexander@> wrote:
>>>
>>>> Hello, I am working under R 2.11 on Windows and am currently working on
>>>> a big R project which executes different R scripts in a row. Every R
>>>> script represents a module. As every module depends on the variables
>>>> created in the modules executed before it, I want to be sure that I
>>>> don't create or change a variable in a script without being aware that
>>>> this affects the results in a script executed later. Therefore, I was
>>>> thinking of saving all important variables in a separate "backup"
>>>> environment. Every time a script starts, it loads the variables of the
>>>> "backup" environment into .GlobalEnv. At the end of the script, I want
>>>> to add all newly obtained variables to the "backup" environment (and
>>>> check automatically whether any variable of the "backup" environment is
>>>> going to be overwritten) and clean the workspace .GlobalEnv so that the
>>>> next script starts neat and tidy. What do you think of this solution?
>>>> Does anyone have better ideas or experience to share?
>>>>
>>>> Thank you in advance
>>>>
>>>> Alexander
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/working-with-environments-to-ensure-code-quality-for-long-R-scripts-tp4570195p4570195.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help@ mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --
>> Charles C. Berry                            Dept of Family/Preventive Medicine
>> cberry at ucsd edu                           UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


