[BioC] wishlist for readGappedAlignments

Martin Morgan mtmorgan at fhcrc.org
Tue Aug 9 22:50:06 CEST 2011


On 08/09/2011 11:58 AM, Cory Barr wrote:
> I would find being able to pass the "what" component of a ScanBamParam
> object to readBamGappedAlignments very helpful.  Like Tengfei, I often read
> in a BAM file from readBamGappedAlignments and also scanBam then combine the
> information.

As a start, GappedAlignments() in 1.5.23 has a new ... argument used to 
populate elementMetadata. So e.g.,

bam <- scanBam(<...>)
with(bam[[1]], GappedAlignments(<...>, qual=qual))

Martin

> Being able to maintain information on a read's mate via
> readBamGappedAlignments would also get much use from me.  Currently, to do
> this I combine information from scanBam, parse out the end number from the
> BAM flag, and then regroup a GRangesList to include its mate.  Doing this
> efficiently by passing an argument to grglist would be great.
>
> -Cory
>
> On Tue, Aug 9, 2011 at 11:33 AM, Tengfei Yin <yintengfei at gmail.com> wrote:
>
>> Dear all,
>>
>> I use GenomicRanges and Rsamtools a lot in my work. They are extremely
>> helpful and neat packages for dealing with NGS data, so thanks a lot to
>> the people who contribute to all those nice packages in BioC. I have some
>> feature requests for GappedAlignments. They may already exist, or it may
>> not be good practice to do things this way, so please feel free to let me
>> know.
>>
>> I like features of both scanBam and readBamGappedAlignments, but
>> sometimes I need to write my own script to combine information from those
>> two functions and make a "general" GRanges to work with. So I am wondering
>> whether there is a neat way to do this, or a plan to implement similar
>> features?
>>
>>    - Including more element metadata with GappedAlignments
>>       - There is "which" in readBamGappedAlignments; can I have something
>>       like "param" or "what" to get more info from the BAM file and
>>       associate it with the gapped reads?
>>       - When coercing from GappedAlignments to GRanges, or calling
>>       granges() on a GappedAlignments object, only the minimal
>>       information is returned; "qwidth", "cigar", and "ngap" are not
>>       included as elementMetadata.
>>    - Including more pairing information for paired-end RNA-seq
>>       - So I could know the mate information for a given gapped read, and
>>       either plot it as a paired-end read or do some computation on it.
>>       - Setting flags for each entry, so I can filter entries based on the
>>       flags, something like scanBamFlag?
>>       - grglist to transform the data in different ways
>>
>> A general data structure that combines all this information and these
>> features would be nice. I realize it's hard to combine all the
>> information and keep it flexible at the same time, e.g. you need to deal
>> with how to bind element metadata for paired entries (probably showing
>> seq1/seq2 to indicate which sequence it belongs to?) and how to handle
>> multiple hits.
>>
>> Right now, I am making my own "giant" GRanges object that includes all
>> the information I want, but that's too specific to my work. That's why I
>> am wondering if there is any plan to combine those neat features and
>> provide a more flexible data structure.
>>
>> Thanks!
>>
>> Tengfei
>>
>>
>> --
>> Tengfei Yin
>> MCDB PhD student
>> 1620 Howe Hall, 2274,
>> Iowa State University
>> Ames, IA,50011-2274
>> Homepage: www.tengfei.name
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


