[BioC] predictCoding() return empty Granges

sun fireflysrb at gmail.com
Fri Oct 5 20:31:01 CEST 2012


Hi Valerie,

I did make VCF object and TranscriptDb with same chromosome name in my last
test. I am still in trouble when I use locateVariants() and predictCoding()
in newest released Bioconductor 2.11. I can use locateVariants()
successfully in Bioconductor 2.10. Due to I can not adjust chr name of txdb
to match the chr name of BSgenome, so I can not run  predictCoding() in
Bioconductor 2.10 for my Arabidopsis TAIR10 dataset.

I attached an small vcf file "var.subset.vcf"  and the Arabi.sqlite to you,
which I used in my test. The following is the script I used in Bioconductor
2.11.

library(VariantAnnotation)
library(GenomicFeatures)

vcf <- readVcf("var.subset.vcf","TAIR10")
######################################################
#### Arabi.sqlite was generated by makeTranscriptDbFromGFF().
#### library(GenomicFeatures)
####  ArabiTranDB <- makeTranscriptDbFromGFF(file="TAIR10.gff3",
####                format="gff3",
####                dataSource="TAIR 10",
####                species="Arabodpsis")

####  saveDb(ArabiTranDB,file="Arabi.sqlite")
######################################################

txdb <-loadDb("Arabi.sqlite")
intersect(seqlevels(vcf), seqlevels(txdb))
[1] "Chr1"

allvar <- locateVariants(vcf, txdb, AllVariants())
*Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "locateVariants", for
signature "GRanges", "GRanges", "PromoterVariants"

*rd <- rowData(vcf)*
*allvar <- locateVariants(rd, txdb, AllVariants())*
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "locateVariants", for
signature "GRanges", "GRanges", "PromoterVariants"

*### try to find coding SNPs*
*## Get TAIR BSgenome. This BSgenome is for TAIR9, but TAIR10 is the same
assembly as TAIR9, see Hervé's reply.
library(BSgenome.Athaliana.TAIR.04232008)

## adjust chr name of txdb to match chr name of BSgenome
seqlevels(txdb) <- sub("^C", "c", seqlevels(txdb))
## adjust chr name of vcf to match chr name of BSgenome
vcf <- renameSeqlevels(vcf, c("Chr1"="chr1"))
## confirm if the chr name mached
intersect(seqlevels(vcf), seqlevels(txdb))
[1] "chr1"
coding1 <- predictCoding(vcf, txdb, Athaliana)
> coding1

GRanges with 0 ranges and 1 metadata column:
   seqnames    ranges strand |      GENEID
      <Rle> <IRanges>  <Rle> | <character>
  ---
  seqlengths:

*In this test, I removed variants in ChrM to get a subset of vcf, that is
why I didn't get the  *Warning message: In .setActiveSubjectSeq(query,
subject) : circular sequence(s) in query 'chrM' ignored* that I got in my
last test. However, predictCoding() still returned an empty GRanges object
to me.

*
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] BSgenome.Athaliana.TAIR.04232008_1.3.18
BSgenome_1.26.0
GenomicFeatures_1.10.0
AnnotationDbi_1.20.0
 [5] Biobase_2.18.0
VariantAnnotation_1.4.0
Rsamtools_1.10.1
Biostrings_2.26.0
 [9] GenomicRanges_1.10.0
IRanges_1.16.2
BiocGenerics_0.4.0
vimcom_0.9-2
[13] setwidth_1.0-0
colorout_0.9-9

loaded via a namespace (and not attached):
 [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2
XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1     parallel_2.15.1
rtracklayer_1.18.0
 [9] stats4_2.15.1      tools_2.15.1       zlibbioc_1.4.0    *


*Thanks for the suggestion.

Rebecca


On Thu, Oct 4, 2012 at 5:01 PM, Valerie Obenchain <vobencha at fhcrc.org>wrote:

> **
> Hi Sun,
>
>
>
> On 10/04/2012 02:44 PM, sun wrote:
>
> After I successfully adjusted the chromosomes of VCF and TranscriptDb
> objects to match the names of the BSgenome object, I ran predictCoding(),
> it returned nothing except a warning message, see below,
>
>
>  coding <- predictCoding(vcf, txdb, seqSource=Athaliana)
>
>  *Warning message:
> In .setActiveSubjectSeq(query, subject) :
>   circular sequence(s) in query 'chrM' ignored*
>
>  coding
>
>  GRanges with 0 ranges and 1 metadata column:
>    seqnames    ranges strand |      GENEID
>       <Rle> <IRanges>  <Rle> | <character>
>   ---
>   seqlengths:
>
> Here are the values of  VCF, TranscriptDb and BSgenome Object,
>
>
>  vcf
>
>  class: VCF
> dim: 551201 2
> genome: TAIR10
> exptData(1): header
> fixed(4): REF ALT QUAL FILTER
> info(18): DP DP4 ... PR VDB
> geno(6): GT GQ ... SP PL
> rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920  *Qustion:
> After used renameSeqlevels() to change chr name of vcf (Chr to chr), the
> rownames here are still old chr name. not sure if this will be the problem
> for predictCoding(). *
>
>
> Yes, having different chromosome names in the VCF file and the txdb will
> be a problem. Any time we want to match / find overlaps in ranges it is
> important that the seqnames (i.e., chromosomes) match. There is an example
> of how to use renameSeqlevels() on a VCF file on the predictCoding() man
> page.
>
> library(VariantAnnotation)
> ?predictCoding
>
> ## Read variants from a VCF file
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
>
> ## Rename seqlevels in the VCF object to match those in the TxDb
> vcf<- renameSeqlevels(vcf, c("22"="chr22"))
> ## Confirm common seqlevels
> intersect(seqlevels(vcf), seqlevels(txdb))
>
> Be sure to confirm that you have common seqlevels between your VCF and
> annotation. If you are having trouble please send a small reproducible
> example.
>
> Valerie
>
>
>  rowData values names(1): paramRangeID
> colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam
> colData names(1): Samples
>
>
>  txdb
>
>  TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: TAIR 10
> | Genus and Species: Arabodpsis
> | miRBase build ID: NA
> | transcript_nrow: 35386
> | exon_nrow: 207465
> | cds_nrow: 197160
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012)
> | GenomicFeatures version at creation time: 1.9.37
> | RSQLite version at creation time: 0.11.1
> | DBSCHEMAVERSION: 1.0
>
>
>  Athaliana
>
>  Arabidopsis genome
> |
> | organism: Arabidopsis thaliana (Arabidopsis)
> | provider: TAIR
> | provider version: 04232008
> | release date: NA
> | release name: dumped from ADB: Mar/14/08
> |
> | sequences (see '?seqnames'):
> |   chr1  chr2  chr3  chr4  chr5  chrC  chrM
> |
> | (use the '$' or '[[' operator to access a given sequence)
>
>
>  sessionInfo()
>
>  R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] BSgenome.Athaliana.TAIR.04232008_1.3.18
> BSgenome_1.26.0
> VariantAnnotation_1.4.0
>  [4] Rsamtools_1.10.1
> Biostrings_2.26.0
> GenomicFeatures_1.10.0
>  [7] AnnotationDbi_1.20.0
> Biobase_2.18.0
> GenomicRanges_1.10.0
> [10] IRanges_1.16.2
> BiocGenerics_0.4.0
> vimcom_0.9-2
> [13] setwidth_1.0-0
> colorout_0.9-9
>
> loaded via a namespace (and not attached):
>  [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2
> XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1
>  [7] parallel_2.15.1    rtracklayer_1.18.0 stats4_2.15.1
> tools_2.15.1       zlibbioc_1.4.0
>
> any ideas? thank you.
>
> Rebecca
>
>
> On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès <hpages at fhcrc.org> <hpages at fhcrc.org> wrote:
>
>
>  Hi Rebecca,
>
>
> On 10/04/2012 12:10 PM, sun wrote:
>
>
>  Hi All,
>
> I am going to use "coding <- predictCoding(vcf, txdb,
> seqSource=Athaliana)"
> to detect coding SNPs. The problem is that the chromosome names are not
> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is
> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr name is
> "chr*" .
>
> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome
> names) of the VCF object to match that of the txdb annotation. But how can
> I adjust the chr name of BSgenome or TranscriptDB?
>
>
>  In BioC 2.11 (released yesterday), you can rename the chromosomes of a
> TranscriptDb object, so you could rename the chromosomes of your
> VCF and TranscriptDb objects to match the names of the BSgenome object.
>
> E.g. for the TranscriptDb object:
>
>   seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb))
>
> Note that renaming the chromosomes of a TranscriptDb object is a new
> feature and is not fully implemented yet. For example, if you use
> select() on the object you'll still get the original names (those
> stored in the db), and if you try to specify a chromosome name thru
> the 'vals' arg of the transcripts(), exons() and cds() extractors,
> you still need to use the original names. This will be addressed soon.
>
> Our plan is to also support renaming of the chromosomes of BSgenome
> and SNPlocs objects very soon.
>
> Also, an additional level of convenience will be provided via the
> seqnameStyle() getter and setter, so you'll be able to quickly rename
> with something like:
>
>   seqnameStyle(x) <- "UCSC"
>
> or
>
>   seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome)
>
> This will work on almost any 'x' object that contains chromosome
> names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF,
> BSgenome, SNPlocs, etc...)
>
> Cheers,
> H.
>
>
>
>
>
>  Thanks,
>
> Rebecca
>
>         [[alternative HTML version deleted]]
>
> ______________________________**_________________
> Bioconductor mailing listBioconductor at r-project.orghttps://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives: http://news.gmane.org/gmane.**
> science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>  --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
>  	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing listBioconductor at r-project.orghttps://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>


More information about the Bioconductor mailing list