[BioC] rsamtools scanBcf segfault

Martin Morgan mtmorgan at fhcrc.org
Thu Jul 28 15:36:30 CEST 2011


On 07/28/2011 12:08 AM, Rob Syme wrote:
>> Thanks Rob for the nice reproducible example. The problem is when you open
>> the BcfFile; 'mode' should be 'rb' to indicate that you want to read a
>> binary file. Or leave it unspecified and the heuristic will do the right
>> thing.

Thanks that'll be corrected.

>
> Thanks Martin, the manual at
> http://www.bioconductor.org/packages/2.8/bioc/manuals/Rsamtools/man/Rsamtools.pdf
> includes a minor error that threw me off. The "Usage" section is
> correct (and I should have read this more closely), but in the
> "Arguments" section, the mode argument is described as:
> "A character(1) vector; mode="rw" indicates a binary (BCF) file,
> mode="r" a text (VCF) file"
> when it should probably be
> "A character(1) vector; mode="rb" indicates a binary (BCF) file,
> mode="r" a text (VCF) file"
>
>
> I have another error - and I've had a closer read of the manual this time ;)
> I have generated a filtered vcf using freebayes, but the following
> commands throw an error:
>
> wget http://gist.github.com/raw/1111101/f8629dfa20b15893684c78a586f321dbc65c6b6c/subset.vcf
> grep -v "^#" subset.vcf | cut -f 1 | sort -u>  subset.dict
> bcftools view -bSD subset.dict subset.vcf>  subset.bcf
> bcftools index subset.bcf
> wget http://gist.github.com/raw/1111101/5684d61541b39fb8c02e765d5f7891d9d122244e/scanBcf_error.R
> Rscript scanBcf_error.R
>
> # This gives the error:
> Error: scanBcf: failed to find fmt encoded as '21057'
>    path:<<redacted>>/subset.bcf
> Execution halted

The problem is that the current implementation isn't really general, it 
implements the formats supported (computed on, rather than just echoed) 
by bcftools -- PL, DP, GQ, SP, GT, GL -- whereas vcf is really much more 
flexible than that. This is cryptically documented on ?scanBcf, but I'd 
like to make this more flexible and will work on this.

> # Calling scanBcfHeader also gives an error:
> scanBcfHeader(bcf)
> Error in mapply(f, ..., SIMPLIFY = FALSE) :
>    'names' attribute [5] must be the same length as the vector [4]

This has been addressed previously in the devel branch.

Thanks Rob for the reports.

Martin

>
>
> I can't see anything wrong with the VCF, am I missing something obvious again?


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list