[BioC] Rsamtools problem

Martin Morgan mtmorgan at fhcrc.org
Tue Mar 29 15:48:23 CEST 2011


On 03/29/2011 03:27 AM, Vincenzo Capece wrote:
> dear all,
> i am a beginner and i need an information about bam header.
> The header of my bam file is:
>
> @SQ     SN:chr1 LN:249250621
> @SQ     SN:chr2 LN:243199373
> @SQ     SN:chr3 LN:198022430
> @SQ     SN:chr4 LN:191154276
> @SQ     SN:chr5 LN:180915260
> @SQ     SN:chr6 LN:171115067
> @SQ     SN:chr7 LN:159138663
> @SQ     SN:chrX LN:155270560
> @SQ     SN:chr8 LN:146364022
> @SQ     SN:chr9 LN:141213431
> @SQ     SN:chr10        LN:135534747
> @SQ     SN:chr11        LN:135006516
> @SQ     SN:chr12        LN:133851895
> @SQ     SN:chr13        LN:115169878
> @SQ     SN:chr14        LN:107349540
> @SQ     SN:chr15        LN:102531392
> @SQ     SN:chr16        LN:90354753
> @SQ     SN:chr17        LN:81195210
> @SQ     SN:chr18        LN:78077248
> @SQ     SN:chr20        LN:63025520
> @SQ     SN:chrY LN:59373566
> @SQ     SN:chr19        LN:59128983
> @SQ     SN:chr22        LN:51304566
> @SQ     SN:chr21        LN:48129895
> @SQ     SN:chr6_ssto_hap7       LN:4928567
> @SQ     SN:chr6_mcf_hap5        LN:4833398
> @SQ     SN:chr6_cox_hap2        LN:4795371
> @SQ     SN:chr6_mann_hap4       LN:4683263
> @SQ     SN:chr6_apd_hap1        LN:4622290
> @SQ     SN:chr6_qbl_hap6        LN:4611984
> @SQ     SN:chr6_dbb_hap3        LN:4610396
> @SQ     SN:chr17_ctg5_hap1      LN:1680828
> @SQ     SN:chr4_ctg9_hap1       LN:590426
> @SQ     SN:chr1_gl000192_random LN:547496
> @SQ     SN:chrUn_gl000225       LN:211173
> @SQ     SN:chr4_gl000194_random LN:191469
> @SQ     SN:chr4_gl000193_random LN:189789
> @SQ     SN:chr9_gl000200_random LN:187035
> @SQ     SN:chrUn_gl000222       LN:186861
> @SQ     SN:chrUn_gl000212       LN:186858
> @SQ     SN:chr7_gl000195_random LN:182896
> @SQ     SN:chrUn_gl000223       LN:180455
> @SQ     SN:chrUn_gl000224       LN:179693
> @SQ     SN:chrUn_gl000219       LN:179198
> @SQ     SN:chr17_gl000205_random        LN:174588
> @SQ     SN:chrUn_gl000215       LN:172545
> @SQ     SN:chrUn_gl000216       LN:172294
> @SQ     SN:chrUn_gl000217       LN:172149
> @SQ     SN:chr9_gl000199_random LN:169874
> @SQ     SN:chrUn_gl000211       LN:166566
> @SQ     SN:chrUn_gl000213       LN:164239
> @SQ     SN:chrUn_gl000220       LN:161802
> @SQ     SN:chrUn_gl000218       LN:161147
> @SQ     SN:chr19_gl000209_random        LN:159169
> @SQ     SN:chrUn_gl000221       LN:155397
> @SQ     SN:chrUn_gl000214       LN:137718
> @SQ     SN:chrUn_gl000228       LN:129120
> @SQ     SN:chrUn_gl000227       LN:128374
> @SQ     SN:chr1_gl000191_random LN:106433
> @SQ     SN:chr19_gl000208_random        LN:92689
> @SQ     SN:chr9_gl000198_random LN:90085
> @SQ     SN:chr17_gl000204_random        LN:81310
> @SQ     SN:chrUn_gl000233       LN:45941
> @SQ     SN:chrUn_gl000237       LN:45867
> @SQ     SN:chrUn_gl000230       LN:43691
> @SQ     SN:chrUn_gl000242       LN:43523
> @SQ     SN:chrUn_gl000243       LN:43341
> @SQ     SN:chrUn_gl000241       LN:42152
> @SQ     SN:chrUn_gl000236       LN:41934
> @SQ     SN:chrUn_gl000240       LN:41933
> @SQ     SN:chr17_gl000206_random        LN:41001
> @SQ     SN:chrUn_gl000232       LN:40652
> @SQ     SN:chrUn_gl000234       LN:40531
> @SQ     SN:chr11_gl000202_random        LN:40103
> @SQ     SN:chrUn_gl000238       LN:39939
> @SQ     SN:chrUn_gl000244       LN:39929
> @SQ     SN:chrUn_gl000248       LN:39786
> @SQ     SN:chr8_gl000196_random LN:38914
> @SQ     SN:chrUn_gl000249       LN:38502
> @SQ     SN:chrUn_gl000246       LN:38154
> @SQ     SN:chr17_gl000203_random        LN:37498
> @SQ     SN:chr8_gl000197_random LN:37175
> @SQ     SN:chrUn_gl000245       LN:36651
> @SQ     SN:chrUn_gl000247       LN:36422
> @SQ     SN:chr9_gl000201_random LN:36148
> @SQ     SN:chrUn_gl000235       LN:34474
> @SQ     SN:chrUn_gl000239       LN:33824
> @SQ     SN:chr21_gl000210_random        LN:27682
> @SQ     SN:chrUn_gl000231       LN:27386
> @SQ     SN:chrUn_gl000229       LN:19913
> @SQ     SN:chrM LN:16571
> @SQ     SN:chrUn_gl000226       LN:15008
> @SQ     SN:chr18_gl000207_random        LN:4262
> @PG     ID:bfast        VN:0.6.4e
>
> I want to use Dataframe function in this way:
> DataFrame with 6 rows and 5 columns
> rname strand pos qwidth
> <integer>  <integer>  <integer>  <integer>
> 1 1 1 970 35
> 2 1 1 971 35
> 3 1 1 972 35
> 4 1 1 973 35
> 5 1 1 974 35
> 6 1 2 975 35
> seq
> <DNAStringSet>
> 1 TATTAGGAAATGCTTTACTGTCATAACTATGAAGA
> 2 ATTAGGAAATGCTTTACTGTCATAACTATGAAGAG
> 3 TTAGGAAATGCTTTACTGTCATAACTATGAAGAGA
> 4 TAGGAAATGCTTTACTGTCATAACTATGAAGAGAC
> 5 AGGAAATGCTTTACTGTCATAACTATGAAGAGACT
> 6 GGAAATGCTTTACTGTCATAACTATGAAGAGACTA
>
> If i launch:
> df<- do.call("DataFrame", bam)
> my result is:
>
> DataFrame with 0 rows and 5 columns
>
> And my bam file is:
>
> $`chr1:1-249250621`
> $`chr1:1-249250621`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr1:1-249250621`$strand
> factor(0)
> Levels: + - *
>
> $`chr1:1-249250621`$pos
> integer(0)
>
> $`chr1:1-249250621`$qwidth
> integer(0)
>
> $`chr1:1-249250621`$seq
>    A DNAStringSet instance of length 0
>
>
> $`chr2:1-243199373`
> $`chr2:1-243199373`$rname
> factor(0)
> 93 Levels: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chrX chr8 chr9 chr10 ...
> chr18_gl000207_random
>
> $`chr2:1-243199373`$strand
> factor(0)
> Levels: + - *
>
> $`chr2:1-243199373`$pos
> integer(0)
>
> $`chr2:1-243199373`$qwidth
> integer(0)
>
> $`chr2:1-243199373`$seq
>    A DNAStringSet instance of length 0
>
> What is my error?
> Where in header BAM file i can find the chr lenghts?

Hi --

after example(scanBam), there is variable 'fl' pointing to a bam file. 
Let's use that.

   t = scanBamHeader(fl)[[1]][["targets"]]

provides a vector of sequences and lengths from the header file. These 
are just the values reported in the BAM file; they do not mean that 
reads from those sequences are present.

To find out how many reads are in the file, maybe

   param = ScanBamParam(which=GRanges(names(t), IRanges(1, t)))
   countBam(fl, param=param)

to query a particular chromosome for the reads, maybe

   chr = "seq2"
   param = ScanBamParam(
              what=c("rname", "strand", "pos", "qwidth", "seq"),
              which=GRanges(chr, IRanges(1, t[chr])))
   bam = scanBam(fl, param=param)
   DataFrame(bam[[1]])

Martin


>
> Thanks a lot.
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list