[BioC] Filtering BAM files by start position for VariantTools

Valerie Obenchain vobencha at fhcrc.org
Thu Jul 11 19:22:12 CEST 2013


Sorry, the file here should be the filtered 'dest', not 'fl':

>      tally <- tallyVariants(fl, param)
>
>

Valerie

> Valerie
>
>
> On 07/09/2013 02:06 PM, Taylor, Sean D wrote:
>> I am trying to read a specific set of records from a bam file for use
>> in the VariantTools package. I'm trying to construct a which argument
>> (a GRanges object) that will pull in a set of records from all reads
>> that only start at a specified position. (i.e. all reads that start at
>> position 100). So far I have only been able to specify reads that
>> overlap position 100, but have not been able to find a way to define
>> the start site.
>>
>> #Example code:
>>> bams <- LungCancerLines::LungCancerBamFiles()
>>> bam <- bams$H1993
>>> which<-GRanges(seqnames=c('TP53'), IRanges(1110426, width=1), '+')
>>> tally.param <- VariantTallyParam(gmapR::TP53Genome(),
>> + readlen = 100L,
>> + high_base_quality = 23L,
>> + which = which)
>>> raw.variants <- tallyVariants(bam, tally.param)
>>
>> This code shows all the variants at position 1110426, but not all the
>> variants from the reads that start at position 1110426.
>>
>> Ultimately, I am trying to do this for all start positions in my data
>> set, so I would want something that looks like this pseudocode:
>>> raw.variants<-lapply (start(bam), function (x){
>>    which<-GRanges(seqnames=c('chrM'), '+', start=x)
>>    tally.param<-VariantTallyParam(gmap, readlen=100L, which=which)
>>    tallyVariants(bamfile, tally.param)
>> })
>>
>> Thanks,
>> Sean
>>
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8
>>   [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
>> LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
>> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8
>> LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid      parallel  stats     graphics  grDevices utils
>> datasets  methods   base
>>
>> other attached packages:
>> [1] RSQLite_0.11.4          DBI_0.2-7               BiocInstaller_1.10.2
>>   [4] LungCancerLines_0.0.8   GenomicFeatures_1.12.1
>> AnnotationDbi_1.22.5
>>   [7] Biobase_2.20.0          gmapR_1.2.0             latticeExtra_0.6-24
>> [10] lattice_0.20-15         RColorBrewer_1.0-5      genoPlotR_0.8
>> [13] ade4_1.5-2              VariantTools_1.2.2
>> VariantAnnotation_1.6.6
>> [16] Rsamtools_1.12.3        Biostrings_2.28.0       GenomicRanges_1.12.4
>> [19] IRanges_1.18.1          BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.16.0                          bitops_1.0-5
>>   [3] BSgenome_1.28.0
>> BSgenome.Hsapiens.UCSC.hg19_1.3.19
>>   [5] graph_1.38.2                            Matrix_1.0-12
>>   [7] org.Hs.eg.db_2.9.0                      RBGL_1.36.2
>>   [9] RCurl_1.95-4.1                          rtracklayer_1.20.2
>> [11] stats4_3.0.1                            tools_3.0.1
>> [13] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 XML_3.96-1.1
>> [15] zlibbioc_1.6.0
>>
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list