[BioC] Merging RangedData object with names on the IRanges part

Martin Morgan mtmorgan at fhcrc.org
Thu Aug 20 18:05:05 CEST 2009


Ulrike Goebel wrote:
> Dear list,
> 
> I would like to do the following:
> Read an output file of BWA (SAM format) in "chunks" and incrementally
> build a RangedData object from
> the chunks (by 'rbind'). Ultimately that should be used to get the
> number of reads per annotated transcript/region, but this is not the
> question here.

Probably you've already worked through this, but I was wondering
whether reading these files in chunks is necessary. It seems that, even
for very large files, you could use scan with its 'what' argument
(with NULL entries for the fields to skip) to selectively extract the
fairly minimal information required to construct the ranges.
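
A minimal sketch of the idea, using base R only. The three records are
made up to match the example quoted below, textConnection() stands in
for the real BWA output file, and the column layout assumes the usual
tab-separated SAM fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, MRNM,
MPOS, ISIZE, SEQ, QUAL, then optional tags). Taking the end as
POS + read length - 1 is a simplifying assumption that ignores the
CIGAR string, and a real file may start with '@' header lines that
would have to be skipped first (for example with scan's 'skip'
argument).

```r
# Made-up SAM-like records: 11 mandatory fields plus one optional tag.
seq36  <- paste(rep("A", 36), collapse = "")
qual36 <- paste(rep("I", 36), collapse = "")
sam <- paste(c("a", "b", "c"),              # QNAME
             c(0, 16, 0),                   # FLAG
             c("Chr1", "Chr1", "Chr3"),     # RNAME
             c(7828367, 7828552, 4121953),  # POS
             c(1, 2, 1),                    # MAPQ
             "36M", "*", 0, 0,              # CIGAR, MRNM, MPOS, ISIZE
             seq36, qual36,                 # SEQ, QUAL
             "NM:i:0",                      # optional tag
             sep = "\t")

# NULL entries in 'what' skip the corresponding field; flush = TRUE
# discards everything after the last requested field (QUAL and tags).
con <- textConnection(sam)
cols <- scan(con,
             what = list(qname = "", flag = 0L, rname = "", pos = 0L,
                         mapq = 0L, NULL, NULL, NULL, NULL, seq = ""),
             sep = "\t", quote = "", flush = TRUE, quiet = TRUE)
close(con)

# Start/end coordinates, assuming ungapped alignments (CIGAR ignored).
start <- cols$pos
end   <- cols$pos + nchar(cols$seq) - 1L
```

The resulting start, end, qname, rname, mapq and flag vectors are the
pieces that the RangedData(IRanges(start=, end=, names=), space=, ...)
call in the quoted example takes, so the whole file could in principle
be turned into one object without any rbind.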

Martin

> Assume as an example:
> t1 <- RangedData(IRanges(start=c(7828367, 7828552,4121953),
> end=c(7828402, 7828587, 4121988)), space=c("Chr1", "Chr1", "Chr3"),
> mapq=c(1,2,1),flag=c(3,4,5))
> 
> I can merge two copies of this by 'rbind(t1,t1)'.
> 
> But:
> t2 <- RangedData(IRanges(start=c(7828367, 7828552,4121953),
> end=c(7828402, 7828587, 4121988), names=c("a", "b", "c")),
> space=c("Chr1", "Chr1", "Chr3"), mapq=c(1,2,1),flag=c(3,4,5))
> (Here, I would like to keep the read names along with their positions in
> the IRanges object).
> 
>> rbind(t2,t2)
> Error in validObject(.Object) :
>  invalid class "RangedData" object: the names of the ranges must equal
> the rownames
> 
> Am I doing something completely wrong here? Or is it confusing two
> different meanings of 'names'?
> 
> 
> BTW, I really like IRanges!
> 
> Ulrike
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-01 r49053)
> x86_64-unknown-linux-gnu
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] ChIPR_1.1.3          MASS_7.3-0           spatstat_1.16-1
> [4] deldir_0.0-8         gpclib_1.4-4         mgcv_1.5-5
> [7] convert_1.21.1       marray_1.23.0        matchprobes_1.17.0
> [10] AnnotationDbi_1.7.11 Biostrings_2.13.29   TeachingDemos_2.4
> [13] Ringo_1.9.8          Matrix_0.999375-30   lattice_0.17-25
> [16] limma_2.19.2         RColorBrewer_1.0-2   Biobase_2.5.5
> [19] IRanges_1.3.56
> 
> loaded via a namespace (and not attached):
> [1] affy_1.23.4          affyio_1.13.3        annotate_1.23.1
> [4] DBI_0.2-4            genefilter_1.25.7    nlme_3.1-92
> [7] preprocessCore_1.7.4 RSQLite_0.7-1        splines_2.10.0
> [10] survival_2.35-4      tools_2.10.0         xtable_1.5-5
> 
> 
> 
>