[BioC] Reading paired-end data into GRangesList

Martin Morgan mtmorgan at fhcrc.org
Fri Oct 28 18:33:47 CEST 2011


On 10/28/2011 08:56 AM, Hubert Rehrauer wrote:
> Hi Paul
>
> thanks. Yes, the readBamGappedAlignmenbts will not allow me to pair the
> reads after loading because the read-names will be discarded.

Rsamtools in the next (as in next week!) release supports adding 
arbitrary fields to the GappedAlignment, via ScanBamParam(what=...). In 
the current release it's still straight-forward to scanBam() with 
approriate ScanBamParam, construct a GappedAlignments(), and add 
additional info values(galn)[["rname"]] = bam[[1]][["rname"]]

Martin
>
> I will try, the cigarToIRangesListByRName. I had overlooked this option,
> but I guess it's gonna be slow.
>
> The issue about length normalization is not really problem, since the
> reason why I have gaps on the genome is that the reads have been mapped
> to the transcriptome, without gaps, and then projected to the genome. So
> all gaps are due to introns.
>
> regards,
> hubert
>
>
> Paul Leo wrote:
>> Hi Huber,
>> Think I would just use readBamGappedAlignments from this you will lose
>> the paired end info. Yes you will add the Ns as coverage but the
>> difference here is very small in most cases. Note also you probably want
>> to use tag counts rather that bg coverage anyway...
>>
>> There is another issue ( I think) ; let say for example some NNs were
>> due to an "insertion" in this "samples genome" compared to the
>> reference . When you go to normalise the signal counts per exon or
>> counts per bp whatever ... will you use the exon length/ genome length
>> for that individual or the reference exon length? You will use the
>> reference obviously so its a bit grey what the true answer is...
>>
>> However I believe you can get the exact coverage from the CIGAR if you
>> wish see...
>> http://stuff.mit.edu/afs/athena/software/r/current/lib/R/library/GenomicRanges/html/cigar-utils.htm
>>
>>
>> irl <- cigarToIRangesListByRName(bam$cigar, bam$rname, bam$pos)
>> irl <- irl[elementLengths(irl) != 0]
>> reads <- as(irl,"GRanges")
>> reads1 <- as(irl,"RangedData")
>> gl <- coverage(reads1)
>>
>> probably a bit slower... however probably a bit old now.........
>>
>> The GenomicRanges package has some new documentation on
>> "countGenomicOverlaps" which may sort this out for you as it's designed
>> to make input for edgeR Deseq etc...
>>
>> Cheers
>> Paul
>>
>>
>>
>> -----Original Message-----
>> From: Hubert Rehrauer <Hubert.Rehrauer at fgcz.ethz.ch>
>> To: bioconductor at r-project.org <bioconductor at r-project.org>
>> Subject: [BioC] Reading paired-end data into GRangesList
>> Date: Wed, 26 Oct 2011 23:51:47 +1000
>>
>> Hi
>>
>> I want to load paired-end data from Bam-Files into R in order to do
>> expression counting. The complicating thing is, that the first read
>> was aligned using a gapped alignment (i.e. the cigar string contains Ns).
>>
>> How can this be done? I thought this would be a quite common task but
>> I did not find any function that would do this. Neither scanBam nor
>> readBamGappedAlignments are directly helpful with this.
>>
>> For me the most obvious thing would be to hold the alignment of such a
>> read as a GRangesList. In order to get there I would use scanBam to
>> load the first read. Parse the cigar string to identify the gaps,
>> build a GRangesList and then add the alignment of the second read to
>> the List. Do you have any better ideas?
>>
>>
>> best regards,
>> hubert
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list