[BioC] Reading paired-end data into GRangesList

Fri Oct 28 18:16:58 CEST 2011

readBamGappedAlignments will retain read names if the use.names arg is
set to TRUE.  In general, I find grglist more high-level and
convenient to use than cigarToIRangesListByRName.

-Cory

On Fri, Oct 28, 2011 at 8:56 AM, Hubert Rehrauer
<Hubert.Rehrauer at fgcz.ethz.ch> wrote:
> Hi Paul
>
> thanks. Yes, the readBamGappedAlignmenbts will not allow me to pair the
> reads after loading because the read-names will be discarded.
>
> I will try, the cigarToIRangesListByRName. I had overlooked this option, but
> I guess it's gonna be slow.
>
> The issue about length normalization is not really problem, since the reason
> why I have gaps on the genome is that the reads have been mapped to the
> transcriptome, without gaps, and then projected to the genome. So all gaps
> are due to introns.
>
> regards,
> hubert
>
>
> Paul Leo wrote:
>>
>> Hi Huber,
>> Think I would just use readBamGappedAlignments from this you will lose
>> the paired end info. Yes you will add the Ns as coverage but  the
>> difference here is very small in most cases. Note also you probably want
>> to use tag counts rather that bg coverage anyway...
>>
>> There is another issue ( I think) ; let say for example some NNs were
>> due to an "insertion" in this "samples genome" compared to the
>> reference . When you go to normalise the signal counts per exon or
>> counts per bp whatever ... will you use the exon length/ genome length
>> for that individual or the reference exon length?  You will use the
>> reference obviously so its a bit grey what the true answer is...
>>
>> However I believe you can get the exact coverage from the CIGAR if you
>> wish see...
>>
>> http://stuff.mit.edu/afs/athena/software/r/current/lib/R/library/GenomicRanges/html/cigar-utils.htm
>>
>> irl <- cigarToIRangesListByRName(bam$cigar, bam$rname, bam$pos)
>> irl <- irl[elementLengths(irl) != 0]
>> reads <- as(irl,"GRanges")
>> reads1 <- as(irl,"RangedData")
>> gl <- coverage(reads1)
>>
>> probably a bit slower... however probably  a bit old now.........
>>
>> The GenomicRanges package has some new documentation on
>> "countGenomicOverlaps" which may sort this out for you as it's designed
>> to make input for edgeR Deseq etc...
>>
>> Cheers
>> Paul
>>
>>
>>
>> -----Original Message-----
>> From: Hubert Rehrauer <Hubert.Rehrauer at fgcz.ethz.ch>
>> To: bioconductor at r-project.org <bioconductor at r-project.org>
>> Subject: [BioC] Reading paired-end data into GRangesList
>> Date: Wed, 26 Oct 2011 23:51:47 +1000
>>
>> Hi
>>
>> I want to load paired-end data from Bam-Files into R in order to do
>> expression counting. The complicating thing is, that the first read was
>> aligned using a gapped alignment (i.e. the cigar string contains Ns).
>>
>> How can this be done? I thought this would be a quite common task but I
>> did not find any function that would do this. Neither scanBam nor
>> readBamGappedAlignments are directly helpful with this.
>>
>> For me the most obvious thing would be to hold the alignment of such a
>> read as a GRangesList. In order to get there I would use scanBam to load the
>> first read. Parse the  cigar string to identify the gaps, build a
>> GRangesList and then add the alignment of the second read to the List. Do
>> you have any better ideas?
>>
>>
>> best regards,
>> hubert
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>