[BioC] Sorting a GAlignments object by QNAME

Martin Morgan mtmorgan at fhcrc.org
Sun Sep 29 22:21:03 CEST 2013


On 09/29/2013 12:32 PM, rubi [guest] wrote:
>
> Is there a way to sort the records in a GAlignments object by QNAME? The
> object is created with the readGAlignmentsFromBam function, where the BAM
> file and its corresponding index file must be sorted by RNAME and POS.
>
> Unless I'm missing something, the only way I can see to do this is to read
> the BAM file into a data.table and sort that.

Files that are unsorted, or sorted by qname, can be read in. The part that is
likely tripping you up is the need to specify character() for the index
argument, perhaps together with yieldSize and obeyQname:

   bf = open(BamFile(fl, character(), yieldSize=1000000, obeyQname=TRUE))

If fl were sorted by qname (see ?sortBam, byQname=TRUE), then this would
guarantee 1000000 qnames per chunk:

     repeat {
         ## read the next chunk of up to yieldSize qnames
         aln = readGAlignmentsFromBam(bf)
         if (length(aln) == 0)
             break    # nothing left to read
         ## do work
     }
     close(bf)
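
Putting the pieces above together, a minimal end-to-end sketch might look like the following. It makes several assumptions that are not part of the original reply: the example file ex1.bam shipped with Rsamtools stands in for fl, the yieldSize is reduced to suit that small file, and the reader is readGAlignments() from the GenomicAlignments package, which is the name under which readGAlignmentsFromBam is available in later Bioconductor releases. The counter n_aln is only a placeholder for real work.

```r
## Sketch only: assumes Rsamtools and GenomicAlignments are installed.
library(Rsamtools)
library(GenomicAlignments)

## Assumption: a small position-sorted example BAM shipped with Rsamtools,
## used here as a stand-in for 'fl'.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Sort by qname; sortBam() appends ".bam" to 'destination' and returns
## the name of the sorted file.
by_qname <- sortBam(fl, tempfile(), byQname = TRUE)

## character() as the index means no .bai index file is used.
## obeyQname = TRUE makes yieldSize count qnames rather than records.
bf <- open(BamFile(by_qname, character(), yieldSize = 1000L,
                   obeyQname = TRUE))

n_aln <- 0L                       # placeholder for real work
repeat {
    aln <- readGAlignments(bf, use.names = TRUE)
    if (length(aln) == 0L)
        break                     # nothing left to read
    n_aln <- n_aln + length(aln)
}
close(bf)
```

Here use.names = TRUE keeps the qname as the names() of each chunk, so records sharing a qname can be grouped within the chunk.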

Since you've got the devel version, see also ?readGAlignmentsListFromBam, which
reads mated reads from a file sorted by RNAME and POS in an iteration like the
one above, with more modest memory requirements than reading in the entire file.
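
As a rough sketch of that mate-aware iteration, assuming a later Bioconductor release in which the function is called readGAlignmentsList() (GenomicAlignments package) and mate pairing is requested with asMates=TRUE on the BamFile: the example file, the chunk size, and the counter n_groups are illustrative choices, not taken from the reply above.

```r
## Sketch only: assumes Rsamtools and GenomicAlignments are installed.
library(Rsamtools)
library(GenomicAlignments)

## Assumption: position-sorted, paired-end example BAM shipped with Rsamtools.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## asMates = TRUE asks for records to be paired with their mates;
## yieldSize limits how much is read per iteration.
bf <- open(BamFile(fl, yieldSize = 500L, asMates = TRUE))

n_groups <- 0L                    # placeholder for real work
repeat {
    galist <- readGAlignmentsList(bf)
    if (length(galist) == 0L)
        break                     # nothing left to read
    n_groups <- n_groups + length(galist)
}
close(bf)
```

Each list element of galist holds the alignments that were grouped together, so per-pair work can be done one chunk at a time.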

Martin

>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale: [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5]
> LC_TIME=English_United States.1252
>
> attached base packages: [1] parallel  stats     graphics  grDevices utils
> datasets  methods   base
>
> other attached packages: [1] doParallel_1.0.3      iterators_1.0.6
> foreach_1.4.1         data.table_1.8.10     Rsamtools_1.13.44
> Biostrings_2.29.19    GenomicRanges_1.13.45 XVector_0.1.4 [9] IRanges_1.19.38
> BiocGenerics_0.7.5
>
> loaded via a namespace (and not attached): [1] bitops_1.0-6
> codetools_0.2-8 stats4_3.0.2    tools_3.0.2     zlibbioc_1.7.0


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
