[BioC] Error in R code for GOstats Vignette section "Using Shortest Paths"

Robert Gentleman rgentlem at fhcrc.org
Tue May 30 18:44:20 CEST 2006


Hi,
  Thanks for the bug report.

Carleton Garrett wrote:
> Hi
> 
> I'm currently running R version 2.2.0 under Windows XP with 2 GB of RAM.
> 
> I'm working through the GOstats vignette, using the GOstats.Rnw file to 
> obtain the R code (a description of the package is given at the end of 
> this e-mail).
> 
> The first objective of this section of the vignette is to extract all of 
> the probe sets in the hgu95av2 chip that are associated with 
> transcription factor GO identifiers using:
> 
> TF2 <- get("GO:0003700", hgu95av2GO2ALLPROBES)
> 
> FYI - length(TF2) = 834
> 
> The next step gets the LocusLink IDs (Entrez Gene IDs) associated with 
> these probe sets:
> 
> LLs <- getLL(TF2, "hgu95av2")
> 
> FYI - length(LLs) = 834
> 
> The third step gets the vector of probe sets that were selected because 
> they show some level of expression and some variation in expression 
> across samples.  These data are contained in the exprSet esetSub:
> 
> gN = geneNames(esetSub)
> 
> FYI - length(gN) = 2391
> 
> So far so good.  The next objective is to get the probe sets that are 
> common to both TF2 and gN, using the following code:
> 
> hv <- match(gN, TF2, 0)
> 
> hv contains 159 non-zero entries:
>  > length(hv[!(hv ==0 )])
> [1] 159
> 
> HOWEVER, THESE NON ZERO TERMS ARE THE INDEX VALUES THAT LOCATE THE PROBE 
> SETS in TF2 - NOT in gN!!!  The next part of the code is where the error 
> occurs and this error is propagated in the subsequent code for this section.
> 
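The mix-up is easiest to see on a toy example. `match(x, table, nomatch = 0)` returns, for each element of `x`, its position in `table` (or 0 when absent), so the non-zero values of `hv` index into `TF2`, not `gN`. The sketch below uses made-up probe IDs (hypothetical stand-ins for the real `gN` and `TF2`, not hgu95av2 data). Note that R silently drops zero indices in `gN[hv]`, which is why the buggy line runs without any error or warning.

```r
# Hypothetical probe IDs standing in for the real vectors:
#   gN  ~ geneNames(esetSub), the filtered probe sets
#   TF2 ~ the probe sets annotated to GO:0003700
gN  <- c("g1", "g2", "g3", "g4", "g5")
TF2 <- c("t1", "t2", "g2", "t4", "g1")

# For each probe in gN, its position in TF2 (0 if it is not in TF2)
hv <- match(gN, TF2, 0)
print(hv)              # 5 3 0 0 0

# Number of probes common to both vectors
n_common <- sum(hv > 0)

# The problematic step: positions in TF2 are used to index gN.
# Zeros are dropped, so this is gN[c(5, 3)].
oTF2 <- gN[hv]
print(oTF2)            # "g5" "g3"
print(oTF2 %in% TF2)   # FALSE FALSE
```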

  No need to shout; yes, it does seem to be the wrong way around. We will 
fix it and push a fix out in the next few days.
  Thanks again (when reporting bugs, it is often helpful to include the 
output of sessionInfo(), as it lets us make sure we are talking about 
the same thing).
   You will need to update to the most recent version of R/Bioconductor 
to get the benefit of our fixes, as we do not have the resources to 
patch outdated releases.

   thanks again,
    Robert



> oTF2 <- gN[hv]
> 
> As one would expect from the above, length(oTF2) does equal 159.  
> However, VERY FEW of the probe sets in oTF2 belong to the vector of 
> probes selected on the basis of an association with GO:0003700; that 
> is, very few of them (only 14) are actually part of TF2:
> 
>  > length(oTF2[oTF2 %in% TF2])
> [1] 14
> 
> whereas all values of oTF2 should be in TF2.
> 
> If one revises the above code thus:
> 
> hvcorr <- hv[!(hv==0)]
> 
> oTF2corr<- TF2[hvcorr]
> 
> Once again, length(oTF2corr) = 159, but now the probe sets are in both 
> TF2 and gN:
> 
>  > length(TF2[TF2 %in% oTF2corr])
> [1] 159
> 
>  > length(gN[gN %in% oTF2corr])
> [1] 159
> 
> Thus, all subsequent calculations in this section of the vignette that 
> depend on oTF2 are in error.
> 
> This problem has probably been raised before and I am just now 
> rediscovering it.  If so, I would appreciate your pointing me to the 
> thread or location of the correction.

nope, it hasn't
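For reference, here are a few equivalent ways to get the probe sets common to both vectors, again on made-up IDs that are hypothetical stand-ins for `gN` and `TF2`. This is a sketch of standard base R idioms, not necessarily the fix that went into the vignette. Carl's `TF2[hv[hv != 0]]` works because the non-zero entries of `hv` are positions in `TF2`; `gN[hv > 0]` instead uses the match result only as a logical mask on `gN`; `%in%` and `intersect()` express the same set operation directly.

```r
# Hypothetical stand-ins for geneNames(esetSub) and the GO:0003700 probe sets
gN  <- c("g1", "g2", "g3", "g4", "g5")
TF2 <- c("t1", "t2", "g2", "t4", "g1")

# Positions in TF2, 0 where a gN probe is not in TF2
hv <- match(gN, TF2, 0)

# 1. Keep the non-zero positions and use them to index TF2 (Carl's correction)
oTF2corr <- TF2[hv[hv != 0]]

# 2. Use the match only as a logical mask on gN
oTF2_mask <- gN[hv > 0]

# 3. Base R set membership / intersection
oTF2_in  <- gN[gN %in% TF2]
oTF2_int <- intersect(gN, TF2)

# Each gives the common probes in the order they appear in gN
print(oTF2corr)   # "g1" "g2"
```

With unique probe IDs all four agree; `intersect()` additionally drops duplicates, which only matters if a vector contains repeated IDs.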

> 
> Thanks
> 
> Carl Garrett
> 
> 
> 
> 
> ======================================================================
> Description
> Package: GOstats
> Title: Tools for manipulating GO and microarrays.
> Version: 1.4.0
> Date: 20 Jan 2005
> Author: R. Gentleman
> Description: A set of tools for interacting with GO and microarray
>         data. A variety of basic manipulation tools for graphs,
>         hypothesis testing and other simple calculations.
> biocViews: Statistics, Annotation, GO, MultipleComparisons
> Depends: graph, GO, annotate, RBGL, xtable, Biobase, genefilter,
>         multtest
> Suggests: hgu95av2 (>= 1.6.0)
> Maintainer: R. Gentleman <rgentlem at fhcrc.org>
> License: GPL2.0
> Packaged: Wed Oct 12 21:34:06 2005; biocbuild
> Built: R 2.2.0; ; 2005-10-12 21:34:10; windows
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
