[BioC] Best way to capture both NCBI and UCSC labeled alignments?

Martin Morgan mtmorgan at fhcrc.org
Thu Dec 19 01:22:12 CET 2013


On 12/18/2013 03:04 PM, chris warth wrote:
> I am using readGAlignmentsFromBam() to extract alignments that overlap a
> set of genomic ranges.   The ranges include seqnames derived from UCSC
> nomenclature, e.g. chr1, chr2, chrY, etc.
>
> However, some of my BAM files (from TCGA) use NCBI nomenclature for their
> chromosomes, e.g. 1, 2, Y, etc.   When I try to extract alignments from these
> files I get an error message,
>
>
> readGAlignmentsFromBam(bamfile, param=param)
> Error in value[[3L]](cond) : 'scanBam' failed:
>    record: 0
>    error: 0
>    file: /home/TCGA/LAML/RNA-seq/TCGA-AB-2847-03A-01T-0736-13_rnaseq.bam
>    index: /home/TCGA/LAML/RNA-seq/TCGA-AB-2847-03A-01T-0736-13_rnaseq.bam
> In addition: Warning message:
> In doTryCatch(return(expr), name, parentenv, handler) :
>    space 'chrY' not in BAM header
>
>
> I am handling this by wrapping the call to readGAlignmentsFromBam() in a
> try-catch.  If an error is caught, I directly modify the seqnames in the
> genomic ranges before trying the call to readGAlignmentsFromBam() again.
> This seems highly kludgy.
>
> Is there any way to allow looser matching of seqnames when extracting
> alignments?  Is there a better way to handle this situation?

There is no looser matching, but

 
seqlevels(BamFile("/home/TCGA/LAML/RNA-seq/TCGA-AB-2847-03A-01T-0736-13_rnaseq.bam"))

tells you the seqlevels in the BAM file header, so you can check which
naming convention a file uses up front instead of catching errors.
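
As a rough sketch of how that check could drive the renaming: the helper
below, matchBamNames(), is hypothetical (it is not part of Rsamtools or
GenomicRanges) and assumes the two conventions differ only by a "chr"
prefix, so names such as chrM vs. MT are not translated. Literal vectors
stand in for the BAM header here.

```r
# Hypothetical helper: rewrite 'rangeNames' so they follow whichever
# convention 'bamNames' uses.
# Assumption: the conventions differ only by a leading "chr".
matchBamNames <- function(rangeNames, bamNames) {
    bamUsesChr <- any(grepl("^chr", bamNames))
    bare <- sub("^chr", "", rangeNames)   # strip any existing prefix
    if (bamUsesChr) paste0("chr", bare) else bare
}

# In practice bamNames would come from the BAM header, e.g.
#   bamNames <- seqlevels(BamFile("/path/to/file.bam"))
ucscRanges <- c("chr1", "chr2", "chrY")
matchBamNames(ucscRanges, bamNames = c("1", "2", "Y"))           # NCBI-style BAM
matchBamNames(ucscRanges, bamNames = c("chr1", "chr2", "chrY"))  # UCSC-style BAM
```

The returned names would then be used as the seqlevels of the ranges
before building the param for that particular file.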

Martin

>
> Thanks in advance,
>
> -csw


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


