[BioC] makeGappedAlignmentPairs for broken pairs

Ido Tamir tamir at imp.ac.at
Fri Jan 17 10:50:06 CET 2014


Dear Valerie,
thank you very much for your reply. I can't test the development packages at the moment, I'm still on 2.12.

> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE))
a)
Will this load only reads with unmapped mates or will this allow both proper paired and unpaired which is what I really want.
reading the files in twice (once for proper pairs, once for single) would be possible, but I prefer reading in once. 
b)
will findMateAlignment be extended to allow these unmapped mate reads to be represented in a GAlignmentPairs class?
c)
Do you know when this (2.14) will be released?

thank you very much,
ido


On Jan 16, 2014, at 9:09 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:

> Hi Ido,
> 
> library(GenomicAlignments)
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> 
> You can set different flags in ScanBamParam to control what is read in.
> See ?ScanBamParam for details. Here is a 'param' that reads only records with unmapped mates.
> 
> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE))
> 
> Two classes can hold paired-end reads, one is GAlignmentPairs and the other is GAlignmentsList. They run the same mate-pairing code but return different objects.
> 
> ## GAlignmentPairs:
> 
> Oops. Looks like there is a bug in readGAlignmentPairs() when this flag is set. We will fix that.
> > gapairs <- readGAlignmentPairs(fl, param=param)
> Error in .combineBamFlagFilters(bamFlag(param, asInteger = TRUE), flag0) :
>  BAM flag filters to combine are incompatible
> 
> ## GAlignmentsList:
> 
> galist <- readGAlignmentsList(fl, param=param)
>>> galist
>> GAlignmentsList of length 127:
>> [[1]]
>> GAlignments with 1 alignment and 0 metadata columns:
>>      seqnames strand cigar qwidth start  end width ngap
>>  [1]     seq2      +   35M     35  1520 1554    35    0
>> 
>> [[2]]
>> GAlignments with 1 alignment and 0 metadata columns:
>>      seqnames strand cigar qwidth start end width ngap
>>  [1]     seq1      +   36M     36     6  41    36    0
> 
> There is a metadata column on the list that has the mate status. In this case they are all 'unmated'.
> > mcols(galist)
> DataFrame with 127 rows and 1 column
>    mate_status
>       <factor>
> 1       unmated
> 2       unmated
> 3       unmated
> 4       unmated
> 5       unmated
> ...         ...
> 
> This code is available in the devel branch.
> 
> Valerie
> 
> 
> > sessionInfo()
> R Under development (unstable) (2014-01-06 r64682)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> ...
> 
> other attached packages:
> [1] GenomicAlignments_0.99.11 VariantAnnotation_1.9.28
> [3] Rsamtools_1.15.18         Biostrings_2.31.7
> [5] GenomicRanges_1.15.19     XVector_0.3.6
> [7] IRanges_1.21.19           BiocGenerics_0.9.2
> 
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.25.9   Biobase_2.23.3         biomaRt_2.19.1
> [4] bitops_1.0-6           BSgenome_1.31.7        DBI_0.2-7
> [7] GenomicFeatures_1.15.4 RCurl_1.95-4.1         RSQLite_0.11.4
> [10] rtracklayer_1.23.6     stats4_3.1.0           tools_3.1.0
> [13] XML_3.98-1.1           zlibbioc_1.9.0
> 
> 
> 
> 
> On 01/16/2014 02:57 AM, Ido Tamir wrote:
>> Hi,
>> is there a roadmap for allowing paired end reads where only one is mapped to be reported?
>> Basically I would like to have readGappedAlignmentPairs report them also.
>> One could reread the whole file and take only those reads where the  makeGappedAlignmentPairs test
>> was unsuccessfull.
>> 
>> thank you very much,
>> ido
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
> 
> 
> -- 
> Valerie Obenchain
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: vobencha at fhcrc.org
> Phone:  (206) 667-3158
> Fax:    (206) 667-1319



More information about the Bioconductor mailing list