[BioC] How to access custom SAM tags (Rsamtools?)

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 29 15:32:48 CET 2012


On 11/29/2012 06:23 AM, Kemal Akman wrote:
> Hello,
>
> I'm interested in accessing extended SAM tags in aligned short read
> sequence files using a R/Bioconductor library. Rsamtools seems
> potentially suited for this purpose, but I couldn't find the arguments
> to return custom SAM tags, such as "XX:Z", but there doesn't seem to
> be a corresponding "what" value in ScanBamParam()?

Hi,

There is a 'tag' argument to ScanBamParam; it's illustrated on the help page 
?ScanBamParam and in

   example(ScanBamParam)

which leads in part to...

ScnBmP> ## tags; NM: edit distance; H1: 1-difference hits
ScnBmP> p4 <- ScanBamParam(tag=c("NM", "H1"), what="flag")

ScnBmP> bam4 <- scanBam(fl, param=p4)

ScnBmP> str(bam4[[1]][["tag"]])
List of 2
  $ NM: int [1:3307] 0 0 0 5 0 1 0 1 0 1 ...
  $ H1: int [1:3307] 0 0 0 0 0 1 0 1 0 0 ...

It's often convenient to readBamGappedAlignments with extra fields, so

   head(readBamGappedAlignments(fl, param=p4))

gives

 > head(readBamGappedAlignments(fl, param=p4))
GappedAlignments with 6 alignments and 3 metadata columns:
       seqnames strand       cigar    qwidth     start       end     width
          <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
   [1]     seq1      +         36M        36         1        36        36
   [2]     seq1      +         35M        35         3        37        35
   [3]     seq1      +         35M        35         5        39        35
   [4]     seq1      +         36M        36         6        41        36
   [5]     seq1      +         35M        35         9        43        35
   [6]     seq1      +         35M        35        13        47        35
            ngap |      flag        NM        H1
       <integer> | <integer> <integer> <integer>
   [1]         0 |        73         0         0
   [2]         0 |        73         0         0
   [3]         0 |       137         0         0
   [4]         0 |       137         5         0
   [5]         0 |       137         0         0
   [6]         0 |        73         1         1
   ---
   seqlengths:
    seq1 seq2
    1575 1584

Martin

>
> I'd be especially interested in such a feature to parse methylation
> strings from Bismark, as well as custom SAM tags from other tools.
>
> Any suggestions on Rsamtools or alternative methods to achieve this in
> Bioconductor would be much appreciated.
>
> Example SAM data:
> $ head -2 sample.sam
> SRR306424.2547_PRESLEY:4:4:62:558_length=76     16      chr22
> 30675421        255     36M     *       0       0
> CACACACATCCACATAACACCATAACCAACCCCCGA
> ;,?A9:>B at B?@BBAC at C/BBC at CBCCC8BCBACBB    NM:i:4  XX:Z:15GG6G11G
> XM:Z:...............hh......h..........Zx       XR:Z:CT XG:Z:GA
> SRR306424.5227_PRESLEY:4:4:113:1768_length=76   0       chr5
> 123101409       255     75M     *       0       0
> TGATTTTTATATAAGGTGTAAGTAAGAGGTTTAGTTTTGATTTTTTGTATATGGATAATTAGTTTTTTTAGTATT
>      B>?@BBABBCBCCBCB>B at B@B?B at B@BB>BB at B?BBBB at BCBBABBAABABB@@B??ABAA>BABBBA=><@A=
>      NM:i:13 XX:Z:22C8C12C4C8CC4C1CCC2C1CC
> XM:Z:......................h........x............x....h........hx....h.hhx..h.hh
>         XR:Z:CT XG:Z:CT
>
>
> Best regards,
>
> Kemal Akman
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list