[BioC] getSeq human genomic sequences

Hervé Pagès hpages at fhcrc.org
Fri Jan 30 21:04:06 CET 2009


Hi Tiando,

Have you checked the 'start' and 'end' you get when i = 5263? They are
probably out of limits.

Try to isolate the problem with something like:

   i <- 5263
   start <- unique(chr6[,6])[i]
   end <- start + 1
   getSeq(Hsapiens, "chr6", start, end)

Do you have 1 <= start <= end <= length(Hsapiens$chr6) ?

I've modified the subseq() function so the error mesg you get is a little
bit more informative (subseq() is defined in the IRanges package and
called by getSeq()). After you update to IRanges 1.0.11 (available within
24 hours), you will get:

   > getSeq(Hsapiens, 'chr6', start=170899994, end=170899995)
   Error in solveUserSEW(length(x), start = start, end = end, width = width) :
     solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start 
(170899994) is > refwidth + 1
   Error in subseq(unmasked(x), start = start, end = end, width = width) :
     Invalid sequence coordinates.
     Are you sure the supplied 'start', 'end' and 'width' arguments
     are defining a region that is within the limits of the sequence?

However, on thing is puzzling me. The error mesg you get shows that
the start is 63219120 and that is of course within the limits of Human
chr6 (this chromosome is 170899992 long in hg18), so maybe something else
is going on...

Cheers,
H.


Tiandao Li wrote:
> Hello,
> 
> I used getSeq to retrieve human genomic sequences from the BSgenome data
> package:
> 
> library(BSgenome.Hsapiens.UCSC.hg18)
> 
> # cpg is GTF file
> chr6 <- cpg[grep("6", cpg[[5]]),] # chrom 6
> length(unique(chr6[,6])) # 10683 genomic positions
> 
> # loop to match "CG" and print 100bp to file
> for (i in 1:10683)
> {
> if (getSeq(Hsapiens, "chr6",
> unique(chr6[,6])[i],(unique(chr6[,6])[i]+1))=="CG")
> {
> write.table(paste(">",
> unique(chr6[,6])[i],sep=""),file="chr6FOR",row.names=F, col.names=F,
> append=TRUE)
> write.table(getSeq(Hsapiens, "chr6",
> (unique(chr6[,6])[i]-50),(unique(chr6[,6])[i]+50)),file="chr6FOR",row.names=F,
> col.names=F, append=TRUE)
> }
> }
> 
> However, I always got the error meg like the following when i = 5263
> 
> Error in solveUserSEW(length(x), start = start, end = end, width = width) :
>   solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start
> (63219120) is > refwidth + 1
> 
> Please let me know if there's something I should be paying attention to.
> 
> Thanks!
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list