[BioC] BAM files to Genomic Ranges object

Hervé Pagès hpages at fhcrc.org
Wed Jan 9 23:00:09 CET 2013


Hi JL,

On 01/09/2013 06:08 AM, jluis.lavin at unavarra.es wrote:
> Hello everybody,
>
> I've been told about a very nice R package called Repitools. It has many
> interesting features but the use of some of them remains a little unclear
> to me yet.
> There are some subjects about this tool I'll have to ask about, in this
> list, but first of all I need to convert some BAM files into GRanges
> objects.
> To do so I read about BAM2GenomicRanges function. Reading the package
> documentation, I tried to convert 3 BAM files into GRanges. I used the
> following instructions:
>
> *(where "/path/to/ANALYSIS/folder/" is the directory where my BAM files are)
>
> bgr <- BAM2GRanges('/path/to/ANALYSIS/folder/',
> what = c("rname", "strand", "pos", "qwidth"),

oops, using the qwidth for getting the widths of the genomic ranges is
in general incorrect, unless one assumes that all the cigars are trivial
(Ms only). But that's really a strong assumption and it will almost
never be true, because AFAIK most aligners introduce indels by default
during the alignment process. Tophat in particular, is one of them.

Note that loading a BAM file into a GRanges object can easily be
done with readGappedAlignments() (from the GenomicRanges package),
followed by coercion of the returned GappedAlignments object to
a GRanges object:

   gal <- readGappedAlignments("/path/to/your/file.bam")
   as(gal, "GRanges")

It will infer the correct width of the genomic ranges based on the
cigar.

> flag = scanBamFlag(isUnmappedQuery = FALSE, isDuplicate = FALSE),
> verbose = TRUE)
>
> And I get this error message:
>
> Reading BAM file AW-07.
> [bam_header_read] bgzf_check_EOF: Invalid argument
> [bam_header_read] invalid BAM binary header (this is not a BAM file).
> Error in open.BamFile(BamFile(file, index), "rb") :
>    SAM/BAM header missing or empty
>    file: '/path/to/ANALYSIS/folder/'
>
> The BAM files I'm using here where produced by a tophat alignment and used
> in R previously for other analysis with this R code:
>
> library(Rsamtools)
> library(GenomicFeatures)
>
> bamlist=list()
> src_files=list.files(pattern="*.bam")
>
> for(filename in src_files)
> {
> 	tmp=readBamGappedAlignments(filename)
> 	bamlist[[length(bamlist)+1]]=GRanges(seqnames=rname(tmp),
> 	ranges=IRanges(start=start(tmp),end=end(tmp)),
> 	strand=rep("*",length(tmp)))
> }
>
>
> names(bamlist)=src_files

That code looks correct, but I would rather do:

   bam_files <- list.files(pattern="*.bam")
   gr_list <- lapply(bam_files,
                     function(bam_file)
                       as(readGappedAlignments(bam_file), "GRanges"))
   names(gr_list) <- bam_files

Cheers,
H.

>
> My question is, Why are now considered as invalid BAM files when they
> previously worked as correct BAM files? Am I mistaken about the meaning of
> the error message?Did I miss something in the code?
>
> Thanks in advance for your kind help.
>
> JL
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list