[BioC] Problem with nearest() in GenomicRanges

Valerie Obenchain vobencha at fhcrc.org
Tue Jul 31 04:08:04 CEST 2012


Hi Chris,

It's difficult to know what's going on without a reproducible example. 
Can you attach a small subset of your GRanges that still triggers the 
error? If not, is there a place I can get the data for testing?

Valerie

On 07/28/12 12:03, Chris Whelan wrote:
> Hi all,
>
> I'm having an issue with the nearest() method in GenomicRanges. I've
> created GRanges from two data frames, from data for a species genome
> assembly with lots of scaffolds:
>
>> xs<- with(x.data, GRanges(seqnames=Scaffold, ranges=IRanges(start=Start, end=Stop), strand=Strand))
>> xs
> GRanges with 989 ranges and 0 elementMetadata cols:
>          seqnames               ranges strand
>             <Rle>             <IRanges>   <Rle>
>      [1] GL397480 [ 1477346,  1477813]      -
>      [2] GL397446 [  869823,   870295]      +
>      ...      ...                  ...    ...
>    [988] GL397464   [1138154, 1138427]      -
>    [989] GL397464   [1138154, 1138427]      -
>    ---
>    seqlengths:
>     ADFV01135143 ADFV01136404     GL397262 ...     GL397832     GL397836
>               NA           NA           NA ...           NA           NA
>
>> cpg.islands<- with(cpg.island.data, GRanges(seqnames=V1, ranges=IRanges(start=V4, end=V5), strand='*'))
>> cpg.islands
> GRanges with 88968 ranges and 0 elementMetadata cols:
>                seqnames           ranges strand
>                   <Rle>         <IRanges>   <Rle>
>        [1]     GL397261 [  4409,   5062]      *
>        [2]     GL397261 [ 66765,  67010]      *
>        ...          ...              ...    ...
>    [88967] ADFV01162706     [1834, 2052]      *
>    [88968] ADFV01162751     [ 226,  884]      *
>    ---
>    seqlengths:
>     ADFV01134425 ADFV01134436 ADFV01134442 ...     GL405215     GL405216
>               NA           NA           NA ...           NA           NA
>
> And I'd like to use nearest to find the closest island to each x:
>
>> nearest(xs, cpg.islands)
> Error in Rle(values = callGeneric(runValue(e1)[which1],
> runValue(e2)[which2]),  :
>    error in evaluating the argument 'values' in selecting a method for
> function 'Rle': Error in Ops.factor(runValue(e1)[which1],
> runValue(e2)[which2]) :
>    level sets of factors are different
>> traceback()
> 12: Rle(values = callGeneric(runValue(e1)[which1], runValue(e2)[which2]),
>          lengths = diffWithInitialZero(ends), check = FALSE)
> 11: seqnames(x) != seqnames(y)
> 10: seqnames(x) != seqnames(y)
> 9: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
> 8: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
> 7: .local(x, y, ...)
> 6: distance(x[midx], subject[p[midx]])
> 5: distance(x[midx], subject[p[midx]])
> 4: .nearest(x, subject, ignore.strand, ...)
> 3: .local(x, subject, ...)
> 2: nearest(xs, cpg.islands)
> 1: nearest(xs, cpg.islands)
>
> This same data works fine with other methods like findOverlaps():
>
>> findOverlaps(xs, cpg.islands)
> Hits of length 225
> queryLength: 989
> subjectLength: 88968
>      queryHits subjectHits
>       <integer>    <integer>
>   1           2       75585
>   2           6       36857
>   ...       ...         ...
>   224       963       84872
>   225       964       84872
>
> Any idea what I might be doing wrong? Here's my sessionInfo():
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=C                 LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
>   [5] LC_MONETARY=C              LC_MESSAGES=C
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] locfit_1.5-8          GenomicFeatures_1.8.2 AnnotationDbi_1.18.1
>   [4] Biobase_2.16.0        multicore_0.1-7       rtracklayer_1.16.3
>   [7] GenomicRanges_1.8.9   IRanges_1.14.4        BiocGenerics_0.2.0
> [10] reshape2_1.2.1        ggplot2_0.9.1         BiocInstaller_1.4.7
>
> loaded via a namespace (and not attached):
>   [1] BSgenome_1.24.0    Biostrings_2.24.1  DBI_0.2-5          MASS_7.3-19
>   [5] RColorBrewer_1.0-5 RCurl_1.91-1       RSQLite_0.11.1     Rsamtools_1.8.5
>   [9] XML_3.9-4          biomaRt_2.12.0     bitops_1.0-4.1     colorspace_1.1-1
> [13] compiler_2.15.1    dichromat_1.2-4    digest_0.5.2       grid_2.15.1
> [17] labeling_0.1       lattice_0.20-6     memoise_0.1        munsell_0.3
> [21] plyr_1.7.1         proto_0.3-9.2      scales_0.2.1       stats4_2.15.1
> [25] stringr_0.6.1      tools_2.15.1       zlibbioc_1.2.0
>
> Thanks in advance,
>
> Chris
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list