[BioC] limma topTable question

Sean Davis sdavis2 at mail.nih.gov
Tue Apr 5 12:09:58 CEST 2005


On Apr 5, 2005, at 12:37 AM, Cyrus Harmon wrote:

>
> On Apr 4, 2005, at 8:14 PM, Sean Davis wrote:
> What do you mean by "change the order ... in my.toptable"? I get the  
> obvious part, but the question is more of a mechanical one, having  
> done my.toptable <- toptable? How do I edit the ord <- line? Clearly,  
> I can put the function definition in an emacs/ESS buffer and eval the  
> function def, but is there a better way to do this? The REPL is very  
> nice, but the model of eval'ing function defs or regions one at a time  
> in emacs buffers seems somewhat cloddish. Back to the parallel, with  
> slime, is there a nice way to make this change take effect? It seems  
> that the problem is magnified if I'm trying to develop an R extension  
> as I have to do R CMD INSTALL in order to get the change to take  
> effect in the place I eventually want to use it. I realize I've gone  
> totally off of the topic from the original question, but if the  
> preferred model of tweaking packages like this is as you've described,  
> I feel like I must be missing something about the mechanics of writing  
> and eval'ing R code.
>

If you never, EVER, want the limma topTable and toptable functions  
back, by all means, just edit the code in the limma package directly  
(without making the my.topTable and my.toptable copies), but this has  
obvious risks.  I do this only when I have a function from a package  
that I need to change, but that is called by many other functions  
within that package.

Short of replacing the functions within limma, you have at least three  
options.  The first: R lets you save your workspace, which includes the  
functions you have defined.  So after you edit your function and eval it  
in R once (or follow the copy route that I suggested), you will  
have my.topTable and my.toptable in your workspace.  Save that  
workspace, and each time you reload it your functions will be there,  
ready to use.  No need to build a package, install it, load it, etc.  
Option number 2 is to save just the functions using save().  Something like:

save(file='my.limma.replacements.RData',list=c('my.toptable','my.topTable'))

will save my.toptable and my.topTable in a file of the name given.   
Then, you can:

load('my.limma.replacements.RData')

and your functions are back in your workspace for you to use.
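
As a rough, self-contained sketch of that save/load round trip: the two functions below are toy stand-ins I made up for illustration (not the real edited limma copies), and the file goes into tempdir() only so the example can be run anywhere.

```r
# Toy stand-ins for the edited copies; the real ones would be modified
# versions of limma's toptable and topTable.
my.toptable <- function(x) x[order(x, decreasing = TRUE)]
my.topTable <- function(x) my.toptable(x)

# Save both functions, by name, into a single file.
rdata <- file.path(tempdir(), "my.limma.replacements.RData")
save(list = c("my.toptable", "my.topTable"), file = rdata)

# Mimic a fresh session by removing them, then restore from the file.
rm(my.toptable, my.topTable)
restored <- load(rdata)   # load() returns the names of the restored objects
```

In real use you would keep the file in your project directory rather than in tempdir().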

Finally, you can go the route of putting your functions into your own  
package.  I understand that re-installing a package is a few extra  
steps, but is it REALLY that hard?

That makes at least three ways R lets you make your functions  
available to you without modifying the limma source code.  Choose  
the one you like.
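
For the copy route itself, the change is usually a one-line edit to the ordering. A sketch of the idea, using a toy function rather than limma's actual toptable source (which I am not reproducing here); the function names and the data frame are hypothetical:

```r
# Toy stand-in for a ranking function that orders rows by absolute M,
# the behaviour the thread describes. This is NOT limma's real code.
toy.toptable <- function(tab) {
  ord <- order(abs(tab$M), decreasing = TRUE)
  tab[ord, ]
}

# The modified copy: only the "ord <-" line changes, to use signed M.
my.toy.toptable <- function(tab) {
  ord <- order(tab$M, decreasing = TRUE)
  tab[ord, ]
}

tab <- data.frame(gene = c("a", "b", "c"), M = c(-3, 1, 2),
                  stringsAsFactors = FALSE)
by.abs    <- toy.toptable(tab)$gene      # ranked by |M|
by.signed <- my.toy.toptable(tab)$gene   # ranked by signed M
```

With a real function you would start from my.toptable <- toptable and edit the copy (for example with fix(my.toptable)) instead of retyping it.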


>> If you make a copy of topTable called my.topTable by:
>>
>> my.topTable <- topTable
>>
>> and change it so that it calls my.toptable instead of toptable, you  
>> now have your own function called my.topTable that does what you  
>> want.  You can of course make any other changes to the functions that  
>> you want--add your own options, etc.  The simple task of looking at  
>> others' code is quite powerful when dealing with issues like the one  
>> you bring up.  I would encourage all who use bioconductor and R to  
>> try it whenever possible; even if it doesn't all make sense, it is a  
>> very good way to learn.
>
> Sure, but I'd hope that package maintainers were open to well-written  
> and documented patches that added the functionality to the library  
> itself, rather than having tons of local copies of possibly  
> out-of-date code lying around. I suppose, going back to my previous  
> question, I could store my.toptable.diff and apply the diff on the  
> fly and, iff the patch succeeds, eval a modified my.toptable, but  
> that seems a bit hokey.
>

I think the package maintainers ARE open to well-written and documented  
patches that add functionality to the library.  However, I also think  
that they need to balance the needs of the few against the needs of the  
many (not that this applies in your case), and that they are generally  
very good at it, keeping general functions flexible without taking a  
"kitchen-sink" approach.

>>
>>> (Combining my question and my gripe, a sort by "m" that didn't do  
>>> abs(M) would seem useful to me, but perhaps I'm missing something.)
>>
>> If you are not typically a programmer in bioconductor, this seems  
>> like a good chance to try your hand at it.  If you get something that  
>> you like better than what Gordon has offered in Limma, send him the  
>> modified code. He, like most of bioconductor/R developers, is  
>> remarkably receptive and responsive to criticism/improvements.
>
> This is great to hear. I feel like I'm having trouble figuring out how  
> to develop mid-size projects in R. Clearly, typing R commands straight  
> into the REPL is a nice way to play around, and clearly the R  
> extension mechanism is a great way to package up R extensions for  
> distribution, but I'm still unclear on reasonable idioms for  
> developing my own mid-size R packages. So far the best I've come up  
> with is local packages that  
> still need to be R CMD INSTALL'ed, but to a local directory, and then  
> library(lib.loc=<some-nice-local-path>) in my scripts, but this topic  
> has probably been previously covered ad nauseam. Time to go digging  
> through the docs and r-help archives.
>

You can set R to load whatever libraries you would like at startup,  
obviating the need to remember to do it yourself.
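
One way to do that, as a sketch, is a few lines in a startup file such as ~/.Rprofile that add to the "defaultPackages" option. I am using limma as the example package and assuming it is already installed; substitute your own package names.

```r
# Sketch of a ~/.Rprofile fragment: append limma to the packages that R
# attaches at startup. Assumes limma is installed for this R.
local({
  old <- getOption("defaultPackages")
  options(defaultPackages = c(old, "limma"))
})
```

The option is read while R starts up, so setting it in an already running session will not attach anything; there you still call library(limma) yourself.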

As for developing your own projects, I use ESS for code development  
(C-c C-d to edit an object in the workspace, C-c C-l to load a buffer  
back into the workspace).  I typically go the "save" route from above  
for simple functions that stand on their own or "replace" other  
functions for the life of a project, but if I use it in more than one  
project, I typically put it into a fairly unstructured library that  
contains frequently-used functions and little documentation.  The next  
level is a full-blown, well-documented, tested package for public  
consumption.  While "programmers" would argue that every function needs  
to be fully documented, etc., I don't do that for my little utility  
packages (and sometimes pay for my laziness), but obviously for public  
consumption, documentation is ABSOLUTELY necessary.

Again, hope this helps.

Sean
