[BioC] pulling out specific lines from a bam file

Martin Morgan mtmorgan at fhcrc.org
Sun Jun 30 05:28:58 CEST 2013


On 06/29/2013 05:29 PM, Eric Foss [guest] wrote:
>
> I would like to enter a list of SNPs and a bam file and have Bioconductor return all the lines from the bam file, in human readable form, that contain the SNP in question. Here is my attempt:
>
> [code]library(GenomicRanges)
>
> genes <- GRanges(seqnames = c("13", "1", "12"),
> 				 ranges = IRanges(
> 				 	start = c(111942495, 114516752, 116434901),
> 				 	width = 1),
> 				 strand = rep("*", 3),
> 				 seqlengths = c("1" = 249250621, "13" = 115169878, "12" = 1

33851895))
>
> alnFile <- "first_10k_PASRTP_RNA_790790_exome6500SNPtolerant_062913_1_clean_sorted.bam"
>
> aln <- readGappedAlignments(alnFile)

here you can specify what you'd like to read in, along the lines of

   param = ScanBamParam(what="seq")
   readGappedAlignments(alnFile, param=param)

see scanBamWhat() for other things one might read in.

Martin

>
> bamGRanges <- GRanges(seqnames = seqnames(aln),
> 				 ranges = ranges(aln),
> 				 strand = strand(aln),
> 				 seqlengths = seqlengths(aln))
>
> answers <- findOverlaps(query = genes, subject = bamGRanges)
>
> final <- aln[subjectHits(answers)]
> [/code]
>
> my "final" variable consists of almost what I want, but not quite. It has only some of the fields from the bam file. It doesn't, for example, have the sequence. Can someone point me in the right direction?
>
> Thank you.
>
> Eric
>
>
>   -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Rsamtools_1.12.3     Biostrings_2.28.0    GenomicRanges_1.12.4 IRanges_1.18.1
> [5] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5   stats4_3.0.1   tools_3.0.1    zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list