[BioC] [devteam-bioc] GenomicRanges seqlengths problem

Valerie Obenchain vobencha at fhcrc.org
Thu May 1 18:52:45 CEST 2014


Hi,

The seqnames are in a different orders in the 'chrs' and 'sl' objects. 
Looking at the first 3 names in each,

 > head(seqlengths(chrs), 3)
  lm_SuperContig_0_v2  lm_SuperContig_1_v2 lm_SuperContig_10_v2
                   NA                   NA                   NA

 > head(sl, 3)
lm_SuperContig_0_v2 lm_SuperContig_1_v2 lm_SuperContig_2_v2
             4258568             3378610             2939989

When a GRanges is created, seqnames are sorted in ascii order. You can 
see the full list with seqinfo():

 > seqinfo(chrs)
Seqinfo of length 34
seqnames             seqlengths isCircular genome
lm_SuperContig_0_v2        <NA>       <NA>   <NA>
lm_SuperContig_1_v2        <NA>       <NA>   <NA>
lm_SuperContig_10_v2       <NA>       <NA>   <NA>
lm_SuperContig_11_v2       <NA>       <NA>   <NA>
lm_SuperContig_12_v2       <NA>       <NA>   <NA>
...                         ...        ...    ...

The seqnames in the replacement vector must match the order in the 
'Seqinfo' object.

To re-order:

sl_new <- sl[match(levels(factor(names(sl))), names(sl))]

Then add the new lengths:

>> seqlengths(chrs) <- sl_new
>> chrs
> GRanges with 34 ranges and 0 metadata columns:
>                    seqnames       ranges strand
>                       <Rle>    <IRanges>  <Rle>
>    [1]  lm_SuperContig_0_v2 [1, 4258568]      *
>    [2]  lm_SuperContig_1_v2 [1, 3378610]      *
>    [3]  lm_SuperContig_2_v2 [1, 2939989]      *
>    [4]  lm_SuperContig_3_v2 [1, 2348246]      *
>    [5]  lm_SuperContig_4_v2 [1, 1918205]      *
>    ...                  ...          ...    ...
>   [30] lm_SuperContig_29_v2  [1, 200940]      *
>   [31] lm_SuperContig_30_v2  [1, 154863]      *
>   [32] lm_SuperContig_31_v2  [1, 143268]      *
>   [33] lm_SuperContig_32_v2  [1,  87679]      *
>   [34] lm_SuperContig_34_v2  [1,  58596]      *
>   ---
>   seqlengths:
>     lm_SuperContig_0_v2  lm_SuperContig_1_v2 ...  lm_SuperContig_9_v2
>                 4258568              3378610 ...              1772623


Valerie


On 05/01/2014 07:20 AM, Maintainer wrote:
>
> Hello,
>
> I have a problem with supplying GRanges object with seqlengths.
> I have a files.
> chrs.txt - contains information about chromosomes
>
> chrs.txt
> chr,start,end,len
> lm_SuperContig_0_v2,1,4258568,4258568
> lm_SuperContig_1_v2,1,3378610,3378610
> lm_SuperContig_2_v2,1,2939989,2939989
> lm_SuperContig_3_v2,1,2348246,2348246
> lm_SuperContig_4_v2,1,1918205,1918205
> lm_SuperContig_6_v2,1,1888674,1888674
> lm_SuperContig_5_v2,1,1869450,1869450
> lm_SuperContig_8_v2,1,1809296,1809296
> lm_SuperContig_9_v2,1,1772623,1772623
> lm_SuperContig_7_v2,1,1769547,1769547
> lm_SuperContig_10_v2,1,1758670,1758670
> lm_SuperContig_13_v2,1,1634580,1634580
> lm_SuperContig_12_v2,1,1631710,1631710
> lm_SuperContig_11_v2,1,1590160,1590160
> lm_SuperContig_15_v2,1,1560629,1560629
> lm_SuperContig_14_v2,1,1533332,1533332
> lm_SuperContig_17_v2,1,1445693,1445693
> lm_SuperContig_16_v2,1,1397653,1397653
> lm_SuperContig_18_v2,1,1351976,1351976
> lm_SuperContig_19_v2,1,1186800,1186800
> lm_SuperContig_20_v2,1,1087932,1087932
> lm_SuperContig_21_v2,1,1020521,1020521
> lm_SuperContig_22_v2,1,731443,731443
> lm_SuperContig_23_v2,1,521426,521426
> lm_SuperContig_24_v2,1,475869,475869
> lm_SuperContig_25_v2,1,318058,318058
> lm_SuperContig_26_v2,1,261540,261540
> lm_SuperContig_27_v2,1,250629,250629
> lm_SuperContig_28_v2,1,236098,236098
> lm_SuperContig_29_v2,1,200940,200940
> lm_SuperContig_30_v2,1,154863,154863
> lm_SuperContig_31_v2,1,143268,143268
> lm_SuperContig_32_v2,1,87679,87679
> lm_SuperContig_34_v2,1,58596,58596
>
> I try to do the following:
> # creating GRanges object for chromosomes
> dataChr <- read.table("chrs.txt",header=T,sep=",")
> chrs <- with(dataChr, GRanges(chr, IRanges(start, end)))
> sl <- setNames(dataChr$len, as.character(dataChr$chr))
> seqlengths(chrs) <- sl
>
> And I get the following error:
>
> Error in .normargSeqlengths(value, seqnames(x)) :
>    when the supplied 'seqlengths' vector is named, the names must match the seqnames
>
> Any chance you could help me with what is going on?
>
> Best wishes,
> Agnieszka
>
>
>   -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XVector_0.4.0        ggbio_1.12.3         ggplot2_0.9.3.1      GenomicRanges_1.16.2 GenomeInfoDb_1.0.2   IRanges_1.22.4       BiocGenerics_0.10.0
> [8] BiocInstaller_1.14.2
>
> loaded via a namespace (and not attached):
>   [1] AnnotationDbi_1.26.0     BatchJobs_1.2            BBmisc_1.6               Biobase_2.24.0           BiocParallel_0.6.0       biomaRt_2.20.0
>   [7] Biostrings_2.32.0        biovizBase_1.12.1        bitops_1.0-6             brew_1.0-6               BSgenome_1.32.0          cluster_1.15.2
> [13] codetools_0.2-8          colorspace_1.2-4         DBI_0.2-7                dichromat_2.0-0          digest_0.6.4             fail_1.2
> [19] foreach_1.4.2            Formula_1.1-1            GenomicAlignments_1.0.0  GenomicFeatures_1.16.0   grid_3.1.0               gridExtra_0.9.1
> [25] gtable_0.1.2             Hmisc_3.14-4             iterators_1.0.7          labeling_0.2             lattice_0.20-29          latticeExtra_0.6-26
> [31] MASS_7.3-31              munsell_0.4.2            plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5       Rcpp_0.11.1
> [37] RCurl_1.95-4.1           reshape2_1.4             Rsamtools_1.16.0         RSQLite_0.11.4           rtracklayer_1.24.0       scales_0.2.4
> [43] sendmailR_1.1-2          splines_3.1.0            stats4_3.1.0             stringr_0.6.2            survival_2.37-7          tools_3.1.0
> [49] VariantAnnotation_1.10.0 XML_3.98-1.1             zlibbioc_1.10.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> ________________________________________________________________________
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>


-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fhcrc.org
Phone: (206) 667-3158



More information about the Bioconductor mailing list