[BioC] seqlevels in VCF objects

Valerie Obenchain vobencha at fhcrc.org
Tue Jan 15 03:25:06 CET 2013


On 01/14/13 15:40, Murat Tasan wrote:
> hi all - i've encountered an annoying problem, and i'd like to avoid
> read/writing the many GBs required for the blunt-force solution...
>
> the 1000 Genomes project provides a collection of VCF files providing
> the genotypes for all found variants.
> after reading in the VCF files (via vcf<- readVcf(...)), i have an
> VCF object, but the info(vcf) object reveals the chromosome names
> (i.e. 'seqlevels') are "1", "2", ..., "X", "Y".
> Bioconductor's TxDb.Hsapiens.UCSC.hg19.knownGene object, however, uses
> the UCSC standard prefix for chromosome names: "chr1", "chr2", etc.
>
> in trying to subset(...) or predictCoding(...) the VCF data against
> the genome objects (including BSgenome.Hsapiens.UCSC.hg19) this causes
> an obvious failure.
>
> i tried re-setting the seqlevels of the VCF 'info' object like so
> (thinking the seqnames factor just indexes back on the seqlevels as a
> key):
>
> seqlevels(vcf at info)<- sprintf("chr%s", seqlevels(vcf at info))
>
> but this doesn't seem to have any effect.
>
> any idea on how to make this bulk change of seqnames for data in VCF objects?


There are examples of  how to do this in the vignette and on the ?VCF 
and ?predictCoding man pages.
In ?VCF there is a section in the examples dedicated to "Renaming 
seqlevels and subsetting".
Renaming all seqlevels in a VCF can be done with renameSeqlevels() or 
seqlevels() and subsetting a VCF on
seqlevels can be done with keepSeqlevels().

 >  fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
 >  vcf <- readVcf(fl, "hg19")
 > seqlevels(vcf)
[1] "20"
 > seqlevels(vcf) <- "chr20"
 > seqlevels(vcf)
[1] "chr20"


Valerie

>
> cheers,
>
> -m
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list