[BioC] Need better quality control for reliable installation of bioconductor packages?

Glynn, Earl EFG at Stowers-Institute.org
Sat Dec 11 00:33:13 CET 2004


I'm using Windows 2000 on one machine and Windows XP on another machine.
The XP machine is not attached to the network.  I did a clean install of
the new R 2.0.1 on the network machine.  I downloaded the files once,
burned a CD and used them to install R and Bioconductor on the XP PC not
connected to the network.  This went flawlessly for R, but not for
Bioconductor.

>   But you have not shown us what commands you are using?

But I believe I have.  My description said I was using the Windows
interface (from the Rgui.exe program from R 2.0.1) and I selected:

R:  Packages | Install package(s) from CRAN ... | <select all packages>
Bioconductor:  Packages | Install package(s) from Bioconductor ... |
<select all packages>

One selects all the packages by clicking and dragging over the desired
files. One then presses OK using the Rgui.exe interface.  R installed
everything and the complete installation worked fine as usual.  

My experience with Bioconductor installation is that installing
everything usually produces several failures, with no messages that
explain why. 

>   You can (and possibly should) use the reposTools library and the
> tools that we have built for dealing with the package dependencies
> that exist in and between many Bioconductor packages.

I understand about dependency trees, but shouldn't that be a minimal
consideration if I'm installing everything?  From an end-user
perspective, it's about the same amount of work to install everything as
it is to install a few packages.  Why not install all packages once
rather than installing a few at a time, over and over?  I don't plan to
re-download files, and possibly burn a CD, every time I need to add
R/Bioconductor to another PC. 

>    Did you look to see what the problem is? Try just installing a
> single package, giving the correct path. These are just zip files;
> open them and see if they are complete (it isn't very hard).
> Do you have Gtk properly installed? If not why would you want a
> package that relies on it?

I was originally not trying to install just a single package.  I was
trying to install all the Bioconductor packages in one shot.  It worked
for R, so I assumed that I could install all of the Bioconductor
packages too -- perhaps a flawed assumption.  

>    Do you have Postgres properly installed on your computer? If not,
> why would you expect this package to install or work? 

I do not have Postgres installed, and I understand I cannot execute the
functions in that package.  But I also can't see any documentation
unless I install it.  Why not install everything just to make
documentation available?

> I don't think that blindly trying to install every package,
> regardless of whether you want or need to use it makes very much
> sense. 

I don't understand.  All the R packages downloaded in ZIP format take
~169 MB.  All the Bioconductor packages take ~133 MB.  This is not very
much disk space these days.  We can download them once at our facility
and then install them on multiple PCs (using the network or a CD), which
is faster than redownloading them every time.  

>    You have chosen to install the development version of these
> packages, that means that you are working with unstable software. 

All I did was pick the "defaults" from the Rgui.exe user interface under
Windows and select all packages.  I'm not sure why this would be
requesting the development version, or an unstable set.  You're blaming
the messenger here.  I'm using default settings inside the latest
version of R.

>    Well, we would like to make it more stable, and accurate and
> informative user reports help us to do that. You could help, by
> trying to find out just what the problem is

OK, I did this to try to help (if someone wants the whole console file,
I'd be happy to E-mail it):

Six of the ZIP files are only 2 KB and were probably not correctly made
or transferred:  geneplotter_1.5.0.zip, GOstats_1.1.0.zip,
iSPlot_1.0.3.zip, RdbPgSQL_1.0.9.zip, Rgraphviz_1.5.0.zip,
twilight_1.0.1.zip.  All of these failed to install and stopped the
batch installation until they were avoided.
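A check of this kind can be automated before burning the CD.  What
follows is only a minimal sketch, assuming a current version of R; the
function name check_zips and the 4096-byte threshold are hypothetical
choices for illustration, not part of R or Bioconductor.  It flags any
.zip file that is very small or that does not begin with the usual ZIP
signature bytes, which is what a saved HTML error page would look like:

```r
# Flag files in a directory that are too small or lack the ZIP signature.
# `dir` and `min_bytes` are illustrative; pick a threshold that suits you.
check_zips <- function(dir, min_bytes = 4096) {
  paths <- list.files(dir, pattern = "\\.zip$", full.names = TRUE)
  size <- file.info(paths)$size
  # A normal ZIP archive starts with the bytes 50 4B 03 04 ("PK\003\004").
  zip_magic <- as.raw(c(0x50, 0x4b, 0x03, 0x04))
  has_magic <- vapply(paths, function(p) {
    identical(readBin(p, what = "raw", n = 4L), zip_magic)
  }, logical(1), USE.NAMES = FALSE)
  data.frame(file = basename(paths), bytes = size,
             suspect = size < min_bytes | !has_magic,
             stringsAsFactors = FALSE)
}
```

Calling check_zips() on the download directory and looking at the rows
where suspect is TRUE would list the files worth re-downloading.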

When I tried another install of a single package that failed, e.g., the
"twilight" package: 
  > local({a<- CRAN.packages(CRAN=getOption("BIOC"))
  + install.packages(select.list(a[,1],,TRUE), .libPaths()[1],
      available=a, CRAN=getOption("BIOC"), dependencies=TRUE)})
  trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/PACKAGES'
  Content type `text/plain' length 46409 bytes
  opened URL
  downloaded 45Kb

  dependencies 'golubEsets''snow' are not available

  trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/twilight_1.0.1.zip'
  Content type `text/html' length 1119 bytes
  opened URL
  downloaded 1119 bytes

  Error in file(file, "r") : unable to open connection
  In addition: Warning messages: 
  1: error 1 in extracting from zip file 
  2: cannot open file `twilight/DESCRIPTION'  

So there does seem to be a dependency problem. 

Windows XP says affycomp_1.4.3.zip is 137 KB and is "invalid or
corrupted."  When I tried to download this one again, the length is
being reported as 72,015,716, so perhaps there was some sort of transfer
error on this one.  But the second transfer failed too:

  trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/affycomp_1.4.3.zip'
  Content type `application/zip' length 72015716 bytes
  opened URL
  downloaded 137Kb

  Error in file(file, "r") : unable to open connection
  In addition: Warning messages: 
  1: downloaded length 140288 != reported length 72015716 
  2: error 1 in extracting from zip file 
  3: cannot open file `affycomp/DESCRIPTION' 
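The length mismatch in warning 1 could be turned into a hard check.  The
following is a minimal sketch, assuming a current version of R; the
function names reported_length and download_complete are hypothetical.
It compares the size of the file on disk with the Content-Length value
found in a set of HTTP header lines:

```r
# Extract the Content-Length value from a vector of HTTP header lines.
# Returns NA if the header is absent.
reported_length <- function(headers) {
  hit <- grep("^content-length:", headers, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0L) return(NA_real_)
  # Use the last match in case redirects produced several header blocks.
  as.numeric(trimws(sub("^[^:]*:", "", hit[length(hit)])))
}

# TRUE only if the file on disk has exactly the reported number of bytes.
download_complete <- function(path, headers) {
  expected <- reported_length(headers)
  actual <- file.info(path)$size
  !is.na(expected) && !is.na(actual) && actual == expected
}
```

In current R the header lines could come from curlGetHeaders(url); that
is an assumption about the setup, since nothing like it is used in the
transcript above.  A file for which download_complete() is FALSE should
not be passed on to the installer.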

But were these the same messages I saw when I tried to install
everything?  To find out, I did yet another R 2.0.1 install (using the
local ZIPs for R 2.0.1) but installed all the Bioconductor packages
using the Rgui.exe interface:

Packages | Install package(s) from Bioconductor ... | <select all
packages> | OK
  
Before doing this I issued this command:
   > sink("BioconductorInstall.txt")

to write all this to a file.  However, only a small amount of info was
written to the file (presumably standard output, which is all that
sink() diverts by default) while most of the information continued to
be displayed on the console (presumably standard error).
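By default sink() diverts only standard output, which would explain a
nearly empty file.  A minimal sketch, assuming a current version of R,
of sending both output and messages to one log file follows; the
temporary file name is just for illustration.  Whether the download
progress lines of R 2.0.1 on Windows are captured this way is not
something the sketch establishes:

```r
# Open one connection and divert both streams to it.
log_path <- tempfile(fileext = ".txt")
con <- file(log_path, open = "wt")
sink(con)                     # standard output (print, cat)
sink(con, type = "message")   # messages, warnings and errors

cat("to stdout\n")
message("to stderr")

# Restore the console before reading the file, then close the connection.
sink(type = "message")
sink()
close(con)

log_lines <- readLines(log_path)
```

After this runs, log_lines holds both the cat() line and the message()
line, so warnings raised during a batch install would end up in the same
file as the ordinary output.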

Here is what was in the console file:  (I deleted some of the lines for
some of the packages that installed without any problems)

> local({a<- CRAN.packages(CRAN=getOption("BIOC"))
+ install.packages(select.list(a[,1],,TRUE), .libPaths()[1],
available=a, CRAN=getOption("BIOC"), dependencies=TRUE)})
trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/PACKAGES'
Content type `text/plain' length 46409 bytes
opened URL
downloaded 45Kb

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/AnnBuilder_1.4.21.zip'
Content type `application/zip' length 814873 bytes
opened URL
downloaded 795Kb

[ MANY DELETIONS.  Quick file comparisons are not easy since one is in
bytes and the other is in KB. ] 

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/Rgraphviz_1.5.0.zip'
Content type `text/html' length 1119 bytes
opened URL
downloaded 1119 bytes

[ NOTE HOW SMALL THIS ZIP FILE IS BUT THERE IS NO ERROR MESSAGE.  I
would be suspicious of such a small zip, but who looks at all the files
during a batch load? One only looks for "error" messages.  But the
transfer looked 100% successful. ]

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/Ruuid_1.5.0.zip'
Content type `application/zip' length 75770 bytes
opened URL
downloaded 73Kb

[ 5 MORE DELETIONS ]

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/affycomp_1.4.3.zip'
Content type `application/zip' length 72015716 bytes
opened URL
downloaded 137Kb

[ NOTE THE BIG DIFFERENCE IN FILE SIZES BUT NO ERROR MESSAGE ]

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/affydata_1.4.0.zip'
Content type `application/zip' length 9867466 bytes
opened URL
downloaded 9636Kb

[ MORE DELETIONS ]

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/geneplotter_1.5.0.zip'
Content type `text/html' length 1119 bytes
opened URL
downloaded 1119 bytes

[ NOTE HOW SMALL THIS FILE IS BUT THERE IS NO ERROR MESSAGE ]

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/globaltest_3.0.2.zip'
Content type `application/zip' length 729257 bytes
opened URL
downloaded 712Kb

[ MANY MORE DELETIONS ]

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/widgetInvoke_0.0.9.zip'
Content type `application/zip' length 239583 bytes
opened URL
downloaded 233Kb

trying URL
`http://www.bioconductor.org/bin/windows/contrib/2.0/widgetTools_1.4.7.zip'
Content type `application/zip' length 282615 bytes
opened URL
downloaded 275Kb

Error in file(file, "r") : unable to open connection
In addition: Warning messages: 
1: downloaded length 140288 != reported length 72015716 
2: error 1 in extracting from zip file 
3: cannot open file `GOstats/DESCRIPTION' 
>  sink()

[ NOTE THE FIRST CLUE OF AN ERROR IS REALLY HERE.  ]

[ THE CONTENTS OF BioconductorInstall.txt ARE NOT VERY INFORMATIVE ]

dependencies
'XML''NA''GO''SSOAP''RCurl''SJava''repeated''rmutil''hgu95av2probe'
'hgu95acdf''hgu133aprobe''plasmodiumanophelescdf''KEGG''ecoliLeucine'
'golubEsets''RGtk''gtkDevice''hgu95av2''ALL''hgu95av2cdf''hgu133acdf'
'rae230a''rae230aprobe''hsahomology''xlahomology''zebrafish'
'xenopuslaevis''fibroEset''hu6800''YEAST''hgu133a''hu6800cdf'
'hu6800probe''snow' are not available

package 'AnnBuilder' successfully unpacked and MD5 sums checked
package 'Biobase' successfully unpacked and MD5 sums checked
package 'Biostrings' successfully unpacked and MD5 sums checked
package 'ChromoViz' successfully unpacked and MD5 sums checked
package 'DEDS' successfully unpacked and MD5 sums checked
package 'DNAcopy' successfully unpacked and MD5 sums checked
package 'DynDoc' successfully unpacked and MD5 sums checked
package 'EBarrays' successfully unpacked and MD5 sums checked
package 'GLAD' successfully unpacked and MD5 sums checked


One long line about "dependencies" doesn't tell the average user very
much.
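A per-package report would be easier to read than one long line.  The
following is a minimal sketch, assuming a current version of R; the
function name missing_deps and the package names in the toy index are
hypothetical.  It takes a package index as a character matrix with
"Package" and "Depends" columns (the shape that available.packages()
returns in current R) and lists, for each package, the dependencies
that the index itself does not provide:

```r
# `db` is assumed to be a character matrix with "Package" and "Depends"
# columns; only the Depends field is examined in this sketch.
missing_deps <- function(db) {
  known <- db[, "Package"]
  parse_deps <- function(field) {
    if (is.na(field)) return(character(0))
    deps <- strsplit(field, ",", fixed = TRUE)[[1]]
    deps <- trimws(sub("\\(.*\\)", "", deps))  # drop version requirements
    setdiff(deps[nzchar(deps)], "R")           # "R" itself is not a package
  }
  out <- lapply(db[, "Depends"], function(f) setdiff(parse_deps(f), known))
  names(out) <- known
  out[lengths(out) > 0L]   # keep only packages with something missing
}

# Toy index with made-up package names, for illustration only.
db <- cbind(Package = c("pkgA", "pkgB", "pkgC"),
            Depends = c("R (>= 2.0.0), pkgB", "pkgX, pkgY (>= 1.0)", NA))
missing_deps(db)
```

For the toy index this reports that pkgB needs pkgX and pkgY.  The
sketch knows nothing about packages already installed or available from
another repository, so its report is an upper bound on what is really
missing.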

At this point all the ZIPs (including the corrupted ones) are in a
directory, and one can install a batch of packages until the next
"mystery" one with a problem is encountered -- that stops the process.  
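One way to keep a single bad file from stopping the whole batch is to
install the files one at a time and trap errors.  The following is a
minimal sketch, assuming a current version of R; the function name
install_each and its installer argument are hypothetical.  The default
installer calls install.packages() on a local file with repos = NULL:

```r
# Install each local file in turn; record failures instead of stopping.
# `installer` is a parameter so the loop can be tried without real packages.
install_each <- function(paths,
                         installer = function(p) install.packages(p, repos = NULL)) {
  results <- lapply(paths, function(p) {
    tryCatch({
      installer(p)
      data.frame(file = basename(p), ok = TRUE, message = "",
                 stringsAsFactors = FALSE)
    }, error = function(e) {
      # Only errors are trapped here; warnings pass through to the console.
      data.frame(file = basename(p), ok = FALSE,
                 message = conditionMessage(e), stringsAsFactors = FALSE)
    })
  })
  do.call(rbind, results)
}
```

The returned data frame has one row per file, so the failures and their
messages can be reviewed together at the end.  Some installation
problems surface as warnings rather than errors, and this sketch would
not mark those as failures.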

I suggest the diagnostic messages here are not that helpful, especially
to an average user, who may only install Bioconductor once or twice a
year.

efg
--
Earl F. Glynn    
Scientific Programmer
Bioinformatics Department
Stowers Institute for Medical Research


