[BioC] Amplicon and exon level read counts and GC content

Yu Chuan Tai yuchuan at stat.berkeley.edu
Wed May 30 21:27:45 CEST 2012


Hi Wei,

Thanks again for your clarifications! I think it would be great if your 
package can support BAM files in the future, since the file size is 
smaller.

Best,
Yu Chuan

On Wed, 30 May 2012, Wei Shi wrote:

> You may use the following command to convert BAM to SAM if you got Samtools installed on a unix computer:
>
> samtools view aligned.BAM > aligned.SAM
>
> It will be nice if Rsamtools can support this conversion as Samtools supports this conversion. The reason why featureCounts in Rsubread does not support BAM files is because we run this function straight after running the align() function for the alignment. But I think we may need to add support for this ...
>
> Wei
>
> On May 30, 2012, at 4:18 PM, Martin Morgan wrote:
>
>> On 05/29/2012 11:12 PM, Yu Chuan Tai wrote:
>>> Hi Martin,
>>>
>>> Thanks for your quick reply and great help! I will try what you
>>> suggested below and let you know if I still have any question. A quick
>>> question. Can RSamTools convert BAM to SAM?
>>
>> No; it will go the other way around (SAM --> BAM, see ?asBam) which is generally the right thing to do (smaller files, more easily 'queried' for regions of interest, etc, especially after indexing ?indexBam). Martin
>>
>>>
>>> Best,
>>> Yu Chuan
>>>
>>> On Tue, 29 May 2012, Martin Morgan wrote:
>>>
>>>> On 05/29/2012 10:44 PM, Martin Morgan wrote:
>>>>> On 05/29/2012 10:17 PM, Yu Chuan Tai wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have some questions about DNA-Seq and RNA-Seq analyses. In Amplicon
>>>>>> sequencing, is there any package/function which can extract
>>>>>> amplicon-level read counts and GC content from an aligned file of BAM
>>>>>> format? The same question to exon-level read counts and GC content.
>>>>>> More
>>>>>> generally, given a genomic interval, how could I extract the read count
>>>>>> and GC content for that interval?
>>>>>
>>>>> The Rsamtools package has scanBam for read input, also
>>>>> GenomicRanges::readGappedAlignments and GenomicRanges::summarizeOverlaps
>>>>> for higher-level read counting. The requirement is generally a
>>>>> 'GRanges', which can be queried from TxDb packages (e.g.,
>>>>> TxDb.Hsapiens.UCSC.hg19.knownGene) or from gff (via rtracklayer) or many
>>>>> other sources. There are vignettes in GenomicRanges, GenomicFeatures,
>>>>> rtracklayer, and Rsamtools packages; see
>>>>>
>>>>> http://bioconductor.org/packages/release/BiocViews.html#___Software
>>>>
>>>> More specifically, after
>>>>
>>>> library(Rsamtools)
>>>> example(scanBam) # defines 'fl', a path to a bam file
>>>>
>>>> for a _single_ genomic range
>>>>
>>>> param = ScanBamParam(what="seq",
>>>> which=GRanges("seq1", IRanges(100, 500)))
>>>> dna = scanBam(fl, param=param)[[1]][["seq"]]
>>>> length(dna) # 365 reads overlap region
>>>> alphabetFrequency(dna, collapse=TRUE, baseOnly=TRUE) # 2838 + 3003 GC
>>>>
>>>> though you'd likely want to specify several regions (vector arguments
>>>> to GRanges) and think about flags (scanBamFlag() and the flag argument
>>>> to ScanBamParam), read mapping quality, reads overlapping more than
>>>> one region, etc. (summarizeOverlaps implements several counting
>>>> strategies, but it is 'easy' to implement arbitrary approaches).
>>>>
>>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>> Thanks for any input!
>>>>>>
>>>>>> Best,
>>>>>> Yu Chuan
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Computational Biology
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: M1-B861
>>>> Telephone: 206 667-2793
>>>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:5}}



More information about the Bioconductor mailing list