[BioC] chromosome name match among vcf, txdb, BSgenome

Hervé Pagès hpages at fhcrc.org
Thu Oct 4 22:18:59 CEST 2012


Hi Rebecca,

On 10/04/2012 12:10 PM, sun wrote:
> Hi All,
>
> I am going to use "coding <- predictCoding(vcf, txdb, seqSource=Athaliana)"
> to detect coding SNPs. The problem is that the chromosome names are not
> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is
> "Chr*", in txdb, the chr name is "Chr*", but in BSgenome, the chr name is
> "chr*".
>
> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome
> names) of the VCF object to match that of the txdb annotation. But how can
> I adjust the chr name of BSgenome or TranscriptDB?

In BioC 2.11 (released yesterday), you can rename the chromosomes of a
TranscriptDb object, so you could rename the chromosomes of your
VCF and TranscriptDb objects to match the names of the BSgenome object.

E.g. for the TranscriptDb object:

   seqlevels(txdb) <- sub("^Chr", "chr", seqlevels(txdb))
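The VCF needs the same treatment. Below is a minimal sketch of that side, assuming renameSeqlevels() takes a named character vector whose names are the old chromosome names and whose values are the new ones. The helper makeSeqlevelMap() is hypothetical (not part of any Bioconductor package), and the chromosome names are example values following the "Chr*" to "chr*" scheme described in the question; only base R is used, so the mapping can be checked before touching the real objects.

```r
## Hypothetical helper (not a Bioconductor function): builds a named
## character vector mapping old chromosome names (the names) to new
## chromosome names (the values).
makeSeqlevelMap <- function(old, pattern, replacement) {
    new <- sub(pattern, replacement, old)
    names(new) <- old
    new
}

## Example names following the "Chr*" (VCF) vs "chr*" (BSgenome) scheme.
vcfLevels <- c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "ChrC", "ChrM")

## Map every leading "Chr" to "chr".
seqMap <- makeSeqlevelMap(vcfLevels, "^Chr", "chr")
print(seqMap)
```

With VariantAnnotation loaded, something along the lines of

   vcf <- renameSeqlevels(vcf, seqMap)

should then bring the VCF in line with the BSgenome before calling predictCoding(vcf, txdb, seqSource=Athaliana). Checking seqlevels(vcf) afterwards is a cheap way to confirm the rename took effect.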

Note that renaming the chromosomes of a TranscriptDb object is a new
feature and is not fully implemented yet. For example, if you use
select() on the object you'll still get the original names (those
stored in the db), and if you try to specify a chromosome name through
the 'vals' argument of the transcripts(), exons() and cds() extractors,
you still need to use the original names. This will be addressed soon.
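Until then, one way to cope is to keep a lookup from the renamed chromosome names back to the names stored in the db. This is a minimal base-R sketch: origLevels, newToOrig and toDbNames() are hypothetical names, and the chromosome names are example values using the same "Chr*" to "chr*" rename.

```r
## Example names as stored in the TranscriptDb's db (before renaming).
origLevels <- c("Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "ChrC", "ChrM")

## Lookup keyed by the new (renamed) names, returning the db names.
newToOrig <- setNames(origLevels, sub("^Chr", "chr", origLevels))

## Hypothetical helper: translate renamed chromosome names back to the
## db names; names not in the lookup come back as NA.
toDbNames <- function(x) unname(newToOrig[x])

toDbNames(c("chr1", "chrM"))  # "Chr1" "ChrM"
```

The translated names could then be passed where the original names are still required, e.g. something like transcripts(txdb, vals=list(tx_chrom=toDbNames("chr1"))), assuming the 'vals' list is keyed by tx_chrom as in current GenomicFeatures.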

Our plan is to also support renaming of the chromosomes of BSgenome
and SNPlocs objects very soon.

Also, an additional level of convenience will be provided via the
seqnameStyle() getter and setter, so you'll be able to quickly rename
with something like:

   seqnameStyle(x) <- "UCSC"

or

   seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome)

This will work on almost any 'x' object that contains chromosome
names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF,
BSgenome, SNPlocs, etc.).

Cheers,
H.


>
> Thanks,
>
> Rebecca

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
