[BioC] BioConductor Deployment at a large site, suggestions?

Brian K Smith brian at stat.ohio-state.edu
Wed Dec 1 00:41:17 CET 2004

I may be missing something simple here; if I am, please tell me.

I'd like to request an easy way to set up a local repository of
BioConductor for a large site.  What I would like is a simple tar archive
of all the packages and dependencies I need for a default install of
BioConductor, and a list of the installation order.

For an individual, the getBioC function is very easy to use, and great.
However, as a site administrator I would prefer not to use it because:
 1) Downloading the same stuff for 100 machines is a waste of bandwidth.
 2) I want all machines to have the same version, until I can upgrade
    all of them.  Doing getBioC on a machine I install a month from now
    may get different versions, and/or require a different version of R.
 3) I could just run getBioC on all machines once a week to assure
    uniformity, but this is a waste of bandwidth and newer versions of
    BioConductor may require me to upgrade R before I am certain that
    other packages I have will work on the new R.
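
To illustrate the uniformity concern in points 2 and 3, here is a minimal
sketch of a check that compares a pinned list of package versions against
the source tarballs in a local directory.  It relies only on R's
Package_Version.tar.gz naming convention for source packages.  The
function names, the package names, and the version numbers in the usage
note are hypothetical, chosen only for illustration.

```python
import re

# R source tarballs are conventionally named Package_Version.tar.gz.
TARBALL = re.compile(
    r"^(?P<name>[A-Za-z0-9.]+)_(?P<version>[0-9][0-9.\-]*)\.tar\.gz$"
)


def parse_tarball(filename):
    """Return (name, version) for a source tarball name, or None."""
    match = TARBALL.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def find_drift(pinned, filenames):
    """Map each pinned package whose tarball is missing or has a
    different version to a (pinned_version, found_version) pair."""
    found = {}
    for filename in filenames:
        parsed = parse_tarball(filename)
        if parsed is not None:
            name, version = parsed
            found[name] = version
    drift = {}
    for name, version in pinned.items():
        if found.get(name) != version:
            drift[name] = (version, found.get(name))
    return drift
```

Called with a pinned mapping such as {"Biobase": "1.4.0"} (a made-up
version) and the file names in the shared directory, find_drift returns an
empty dict when the directory still matches the pinned set.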

Previously, in June, I was able to download all the "default" packages
and install them with R CMD INSTALL, but this was difficult as default
is defined differently by:
  1) the web page
  2) the getBioC script's comments
  3) the getBioC script's output
Eventually, I used #3 to determine what I needed and where to get it.
I then made a script to install the packages from an NFS directory.
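
A script of that kind can be sketched roughly as follows.  This is not the
script I used, only a minimal illustration of the idea: given a map of
which packages need which, work out an order in which every dependency is
installed first, then emit one R CMD INSTALL command per tarball.  The
dependency map, the /nfs/bioc path, and the unversioned file names are
assumptions made for the example, not an official list.  It needs
Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Illustrative dependency map: package -> packages that must be installed
# before it.  These entries are assumptions, not an authoritative list.
DEPS = {
    "Biobase": [],
    "GO": [],
    "KEGG": [],
    "annotate": ["Biobase", "GO", "KEGG"],
    "affy": ["Biobase"],
}


def install_order(deps):
    """Return package names so that each dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


def install_commands(order, repo_dir):
    """Build one R CMD INSTALL command per package, in the given order.

    File names are simplified to <name>.tar.gz here; real source tarballs
    also carry a version number.
    """
    return [f"R CMD INSTALL {repo_dir}/{name}.tar.gz" for name in order]


if __name__ == "__main__":
    # /nfs/bioc is a hypothetical NFS directory holding the tarballs.
    for command in install_commands(install_order(DEPS), "/nfs/bioc"):
        print(command)
```

The printed commands can be saved and run unchanged on every machine, which
is what keeps the installs identical.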

Doing this was difficult since certain dependencies are not listed
on the web for download, such as GO and KEGG, which do not seem to be
on the BioConductor web pages, or on CRAN.  Finding GO/KEGG was made
more difficult since the web directory where getBioC gets the packages
(http://www.bioconductor.org/data/metaData/) doesn't allow me to bring
them up in a browser.  So to get the tar.gz files I had to guess at the
names from the output of getBioC.

Somewhere I did see that getBioC can be set to use a local web repository,
but for the reasons above this does not seem trivial to set up.  I did
search the mailing list, and the closest was someone with a makefile
which used wget to get _everything_, not just a default install.

Thus, can I request:
  * a link on the web page, maybe 'For Administrators of many machines',
  leading to a page that says which packages and dependencies are
  needed for the default install.
  * Either one large download with the packages, or at least a page or
  ftp site listing all packages available and dependencies needed,
  such as GO and KEGG.
Once I have the default install adding additional packages is simple.

This would make life easier for those of us with many machines that
need to have the same software, and may need the exact same software
reinstalled in the future.

I'm attempting to upgrade to R 2.0, which should be relatively simple,
but BioConductor is adding many frustrating hours of work that I shouldn't
have to do every time I want to refresh the package.  Please tell me I'm
missing some simple way to do a large scale deployment of this package!

