[BioC] Reading paired-end data into GRangesList

Hubert Rehrauer Hubert.Rehrauer at fgcz.ethz.ch
Fri Oct 28 19:57:48 CEST 2011


many thanks, that will do the job for now,

hubert

Cory Barr wrote:
> readBamGappedAlignments will retain read names if the use.names arg is
> set to TRUE.  In general, I find grglist more high-level and
> convenient to use than cigarToIRangesListByRName.
>
> -Cory
>
> On Fri, Oct 28, 2011 at 8:56 AM, Hubert Rehrauer
> <Hubert.Rehrauer at fgcz.ethz.ch> wrote:
>   
>> Hi Paul
>>
>> thanks. Yes, the readBamGappedAlignmenbts will not allow me to pair the
>> reads after loading because the read-names will be discarded.
>>
>> I will try, the cigarToIRangesListByRName. I had overlooked this option, but
>> I guess it's gonna be slow.
>>
>> The issue about length normalization is not really problem, since the reason
>> why I have gaps on the genome is that the reads have been mapped to the
>> transcriptome, without gaps, and then projected to the genome. So all gaps
>> are due to introns.
>>
>> regards,
>> hubert
>>
>>
>> Paul Leo wrote:
>>     
>>> Hi Huber,
>>> Think I would just use readBamGappedAlignments from this you will lose
>>> the paired end info. Yes you will add the Ns as coverage but  the
>>> difference here is very small in most cases. Note also you probably want
>>> to use tag counts rather that bg coverage anyway...
>>>
>>> There is another issue ( I think) ; let say for example some NNs were
>>> due to an "insertion" in this "samples genome" compared to the
>>> reference . When you go to normalise the signal counts per exon or
>>> counts per bp whatever ... will you use the exon length/ genome length
>>> for that individual or the reference exon length?  You will use the
>>> reference obviously so its a bit grey what the true answer is...
>>>
>>> However I believe you can get the exact coverage from the CIGAR if you
>>> wish see...
>>>
>>> http://stuff.mit.edu/afs/athena/software/r/current/lib/R/library/GenomicRanges/html/cigar-utils.htm
>>>
>>> irl <- cigarToIRangesListByRName(bam$cigar, bam$rname, bam$pos)
>>> irl <- irl[elementLengths(irl) != 0]
>>> reads <- as(irl,"GRanges")
>>> reads1 <- as(irl,"RangedData")
>>> gl <- coverage(reads1)
>>>
>>> probably a bit slower... however probably  a bit old now.........
>>>
>>> The GenomicRanges package has some new documentation on
>>> "countGenomicOverlaps" which may sort this out for you as it's designed
>>> to make input for edgeR Deseq etc...
>>>
>>> Cheers
>>> Paul
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Hubert Rehrauer <Hubert.Rehrauer at fgcz.ethz.ch>
>>> To: bioconductor at r-project.org <bioconductor at r-project.org>
>>> Subject: [BioC] Reading paired-end data into GRangesList
>>> Date: Wed, 26 Oct 2011 23:51:47 +1000
>>>
>>> Hi
>>>
>>> I want to load paired-end data from Bam-Files into R in order to do
>>> expression counting. The complicating thing is, that the first read was
>>> aligned using a gapped alignment (i.e. the cigar string contains Ns).
>>>
>>> How can this be done? I thought this would be a quite common task but I
>>> did not find any function that would do this. Neither scanBam nor
>>> readBamGappedAlignments are directly helpful with this.
>>>
>>> For me the most obvious thing would be to hold the alignment of such a
>>> read as a GRangesList. In order to get there I would use scanBam to load the
>>> first read. Parse the  cigar string to identify the gaps, build a
>>> GRangesList and then add the alignment of the second read to the List. Do
>>> you have any better ideas?
>>>
>>>
>>> best regards,
>>> hubert
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>       
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>



More information about the Bioconductor mailing list