[BioC] Curious error with 'subseq' function from BSgenome (IRanges)

Martin Morgan mtmorgan at fhcrc.org
Fri Jun 4 13:06:41 CEST 2010


On 06/04/2010 03:52 AM, J.delasHeras at ed.ac.uk wrote:
> 
> Hi everyone,
> 
> I am using the BSgenome package and annotations to retrieve several
> thousand sequences (22k) corresponding to a promoter microarray.
> 
> Basically I run a loop through the whole list of chromosome names, start,
> and stop coordinates, and retrieve each 1Kb sequence using the 'subseq'
> function.
> 
> When I run it, I get the following error *sometimes*:
> Error in get(name, envir = .classTable) :
>   formal argument "envir" matched by multiple actual arguments

Hi Jose --

This sounds like a bug in R, fixed in the R-2.11.* series, and updating
your R (and packages, see http://bioconductor.org/docs/install/) should
fix this. If not, it would be great to hear...
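
A minimal sketch of that version check, assuming the fix is present from
R 2.11.0 onwards as suggested above. `needs_r_update` is a hypothetical
helper name, not part of R or Bioconductor:

```r
# Hypothetical helper: TRUE when the given R version predates the
# release assumed to contain the fix (2.11.0 by default).
needs_r_update <- function(current = getRversion(), fixed_in = "2.11.0") {
  # package_version objects compare component by component, so
  # "2.10.0" < "2.11.0" holds; a plain string comparison is not reliable.
  package_version(as.character(current)) < package_version(fixed_in)
}

if (needs_r_update()) {
  message("R ", as.character(getRversion()),
          " predates 2.11; consider updating R and your packages.")
}
```

The sessionInfo() further down reports R 2.10.0, for which this check
would return TRUE.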

Martin

> 
> The first time, I retrieved the index at which it had encountered the
> error, and ran the 'subseq' command alone. No problem. In fact, if I
> re-run the whole thing the error may occur at another point. Once it
> even ran the whole thing without a hitch.
> 
> I ended up putting the loop within a 'try' function, so that if there
> was an error, the loop could restart where it left off and
> eventually retrieve the whole list. The number of times there's an error
> varies from run to run, and I see that the error messages also vary.
> 
> I just re-ran the loop again, just for fun. This is the code:
> 
> library(BSgenome.Mmusculus.UCSC.mm8)
> # create vectors to store results in:
> newseq2<-vector(mode="character", length=dim(UInfo)[1])
> newstart2<-vector(mode="numeric", length=dim(UInfo)[1])
> newstop2<-vector(mode="numeric", length=dim(UInfo)[1])
> ambiguous.orientation<-c()
> 
> # UInfo is a data frame containing annotations. I extract chr,start,stop
> # from it
> j<-1
> i<-1
> while(i<=dim(UInfo)[1])
>   {
>   if (i==dim(UInfo)[1]) stop("finished")
>   try(
>   for (i in j:dim(UInfo)[1])
>     {
>     # first extract chromosome name from the "NimbleGenID" included
>     # in the annotation.
>     # It is in the same format as the BSgenome annotation package
>     # for mouse, so it's a straight extraction:
>     chr<-sub(":.+$","",unlist(strsplit(UInfo[i,"NimbleGenID"],split=" "))[1])
>     if (chr=="NA") next
>     # extract start and stop:
>     start<-as.numeric(UInfo[i,"Start"])
>     stop<-as.numeric(UInfo[i,"End"])
>     # extract strand orientation:
>     strand<-UInfo[i,"Frame"]
>     # calculate the coordinates for the 1Kb upstream region:
>     if (strand=="-")
>       {
>       upstart<-stop+1
>       upstop<-min(upstart+1000,length(Mmusculus[[chr]]))
>       }
>     if (strand=="+")
>       {
>       upstart<-max(start-1000,1)
>       upstop<-max(start-1,1)
>       }
>     if (!(strand %in% c("+","-")))
>       {
>       upstart<-upstop<-NA
>       # when orientation is not clearly given, store indices for
>       # further processing:
>       ambiguous.orientation<-c(ambiguous.orientation,i)
>       newseq2[i]<-"NNN"
>       newstart2[i]<-upstart
>       newstop2[i]<-upstop
>       next
>       }
>     #extract sequence:
>     sequence<-subseq(Mmusculus[[chr]],upstart,upstop)
>     sequence<-as.character(sequence)
>     #store results:
>     newstart2[i]<-upstart
>     newstop2[i]<-upstop
>     newseq2[i]<-sequence
>     })
>   # check whether the last index done is the last in the list.
>   # if not, it means there was an abnormal exit.
>   # update "j" to the value of the last index "i", and the
>   # loop will restart from the point it left off earlier:
>   if (i!=dim(UInfo)[1]) j<-i
>   # write a tell-tale file so I can see where the problems occur as they
>   # happen:
>   write.table(1, paste(i,"_"))
>   }
> 
> 
> This time it produced an error 7 times. The errors reported were:
> Error in get(name, envir = .classTable) :
>   formal argument "envir" matched by multiple actual arguments
> Error in assign(".target", method@target, envir = envir) :
>   formal argument "envir" matched by multiple actual arguments
> Error in assign(".defined", method@defined, envir = envir) :
>   formal argument "envir" matched by multiple actual arguments
> Error in assign("disabled", disabled, envir = .validity_options) :
>   formal argument "envir" matched by multiple actual arguments
> Error in assign(".defined", method@defined, envir = envir) :
>   no function to return from, jumping to top level
> Error in shift(restrict(nir, start = solved_start, end = solved_end),  :
>   error in evaluating the argument 'x' in selecting a method for
> function 'shift'
> Error in assign(".Method", method, envir = envir) :
>   formal argument "envir" matched by multiple actual arguments
> Error: finished
> 
> The last one is not really an error, I just used the 'stop' function to
> report the job was done, so it says "error"...
> 
> Clearly there is nothing wrong with the coordinates or other parameters
> in the subseq command, because I can repeat it.
> I find it very strange that the errors will happen at different
> points... or sometimes (rarely) nowhere at all.
> 
> I got the result I was after by embedding the loop in a 'try' command,
> and that inside a 'while' loop... But I wonder why this happened in the
> first place.
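
For reference, the try-inside-while workaround described above can be
written more compactly with tryCatch. This is only a sketch of the retry
idea under stated assumptions; it does not address the underlying bug.
`retry_each`, `max_tries` and `flaky` are hypothetical names used for
illustration, and `flaky` merely stands in for the subseq() call:

```r
# Hypothetical helper: call f(i) for i in 1..n, retrying an index that
# raises an error, up to max_tries attempts per index.
retry_each <- function(n, f, max_tries = 5) {
  results <- vector("list", n)
  failures <- integer(0)  # indices that failed at least once
  for (i in seq_len(n)) {
    for (attempt in seq_len(max_tries)) {
      outcome <- tryCatch(
        list(ok = TRUE, value = f(i)),
        error = function(e) list(ok = FALSE, value = conditionMessage(e))
      )
      if (outcome$ok) break
      failures <- c(failures, i)
    }
    # Give up loudly rather than loop forever on a persistent error.
    if (!outcome$ok) {
      stop("index ", i, " failed ", max_tries, " times: ", outcome$value)
    }
    results[i] <- list(outcome$value)
  }
  list(results = results, failures = unique(failures))
}

# Stand-in for an intermittently failing call: errors once, on the
# second call overall, then succeeds when retried.
calls <- 0
flaky <- function(i) {
  calls <<- calls + 1
  if (calls == 2) stop("transient failure")
  i * 10
}
out <- retry_each(3, flaky)
```

Capping the attempts per index means a genuinely bad coordinate stops the
run with its index in the message, instead of being retried indefinitely.
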
> 
> My session info follows:
> 
> 
>> sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
> 
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm8_1.3.16 BSgenome_1.14.2
> [3] Biostrings_2.14.12                 IRanges_1.4.16
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 tools_2.10.0
> 
> 
> Jose
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


