[BioC] seqlevels in VCF objects

Murat Tasan mmuurr at gmail.com
Tue Jan 15 00:40:46 CET 2013


hi all - i've encountered an annoying problem, and i'd like to avoid
reading/writing the many GBs required for the blunt-force solution...

the 1000 Genomes project provides a collection of VCF files containing
the genotypes for all found variants.
after reading in the VCF files (via vcf <- readVcf(...)), i have a
VCF object, but the info(vcf) object reveals the chromosome names
(i.e. 'seqlevels') are "1", "2", ..., "X", "Y".
Bioconductor's TxDb.Hsapiens.UCSC.hg19.knownGene object, however, uses
the UCSC standard prefix for chromosome names: "chr1", "chr2", etc.

in trying to subset(...) or predictCoding(...) the VCF data against
the genome objects (including BSgenome.Hsapiens.UCSC.hg19) this causes
an obvious failure.

i tried re-setting the seqlevels of the VCF 'info' object like so
(thinking the seqnames factor just indexes back on the seqlevels as a
key):

seqlevels(vcf@info) <- sprintf("chr%s", seqlevels(vcf@info))

but this doesn't seem to have any effect.
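
(for reference, the factor behaviour i was assuming looks like this in
base R. it's only a minimal sketch on a made-up toy factor, 'chrom' is
a hypothetical name, and it is not the VCF object itself:)

```r
# toy stand-in for a seqnames-like column; 'chrom' is a hypothetical name
chrom <- factor(c("1", "2", "X", "2"), levels = c("1", "2", "X", "Y"))

# renaming the levels relabels every element at once; the underlying
# integer codes are left untouched
levels(chrom) <- sprintf("chr%s", levels(chrom))

print(as.character(chrom))
print(levels(chrom))
```

(so on a plain factor a single assignment to the levels is enough, which
is why i expected the same of the seqlevels.)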

any idea on how to make this bulk change of seqnames for data in VCF objects?

cheers,

-m


