[BioC] predictCoding() return empty Granges

Hervé Pagès hpages at fhcrc.org
Fri Oct 5 21:56:42 CEST 2012


Hi Rebecca,

On 10/05/2012 11:31 AM, sun wrote:
> Hi Valerie,
>
> I did make VCF object and TranscriptDb with same chromosome name in my last
> test. I am still in trouble when I use locateVariants() and predictCoding()
> in newest released Bioconductor 2.11. I can use locateVariants()
> successfully in Bioconductor 2.10. Due to I can not adjust chr name of txdb
> to match the chr name of BSgenome, so I can not run  predictCoding() in
> Bioconductor 2.10 for my Arabidopsis TAIR10 dataset.
>
> I attached an small vcf file "var.subset.vcf"  and the Arabi.sqlite to you,
> which I used in my test. The following is the script I used in Bioconductor
> 2.11.
>
> library(VariantAnnotation)
> library(GenomicFeatures)
>
> vcf <- readVcf("var.subset.vcf","TAIR10")e
> ######################################################
> #### Arabi.sqlite was generated by makeTranscriptDbFromGFF().
> #### library(GenomicFeatures)
> ####  ArabiTranDB <- makeTranscriptDbFromGFF(file="TAIR10.gff3",
> ####                format="gff3",
> ####                dataSource="TAIR 10",
> ####                species="Arabodpsis")
>
> ####  saveDb(ArabiTranDB,file="Arabi.sqlite")
> ######################################################
>
> txdb <-loadDb("Arabi.sqlite")
> intersect(seqlevels(vcf), seqlevels(txdb))
> [1] "Chr1"
>
> allvar <- locateVariants(vcf, txdb, AllVariants())
> *Error in function (classes, fdef, mtable)  :
>    unable to find an inherited method for function "locateVariants", for
> signature "GRanges", "GRanges", "PromoterVariants"
>
> *rd <- rowData(vcf)*
> *allvar <- locateVariants(rd, txdb, AllVariants())*
> Error in function (classes, fdef, mtable)  :
>    unable to find an inherited method for function "locateVariants", for
> signature "GRanges", "GRanges", "PromoterVariants"
>
> *### try to find coding SNPs*
> *## Get TAIR BSgenome. This BSgenome is for TAIR9, but TAIR10 is the same
> assembly as TAIR9, see Hervé's reply.
> library(BSgenome.Athaliana.TAIR.04232008)

FWIW, you're still using the wrong BSgenome package. Please re-read
carefully my 2 previous emails on this.

Thanks,
H.

>
> ## adjust chr name of txdb to match chr name of BSgenome
> seqlevels(txdb) <- sub("^C", "c", seqlevels(txdb))
> ## adjust chr name of vcf to match chr name of BSgenome
> vcf <- renameSeqlevels(vcf, c("Chr1"="chr1"))
> ## confirm if the chr name mached
> intersect(seqlevels(vcf), seqlevels(txdb))
> [1] "chr1"
> coding1 <- predictCoding(vcf, txdb, Athaliana)
>> coding1
>
> GRanges with 0 ranges and 1 metadata column:
>     seqnames    ranges strand |      GENEID
>        <Rle> <IRanges>  <Rle> | <character>
>    ---
>    seqlengths:
>
> *In this test, I removed variants in ChrM to get a subset of vcf, that is
> why I didn't get the  *Warning message: In .setActiveSubjectSeq(query,
> subject) : circular sequence(s) in query 'chrM' ignored* that I got in my
> last test. However, predictCoding() still returned an empty GRanges object
> to me.
>
> *
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] BSgenome.Athaliana.TAIR.04232008_1.3.18
> BSgenome_1.26.0
> GenomicFeatures_1.10.0
> AnnotationDbi_1.20.0
>   [5] Biobase_2.18.0
> VariantAnnotation_1.4.0
> Rsamtools_1.10.1
> Biostrings_2.26.0
>   [9] GenomicRanges_1.10.0
> IRanges_1.16.2
> BiocGenerics_0.4.0
> vimcom_0.9-2
> [13] setwidth_1.0-0
> colorout_0.9-9
>
> loaded via a namespace (and not attached):
>   [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2
> XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1     parallel_2.15.1
> rtracklayer_1.18.0
>   [9] stats4_2.15.1      tools_2.15.1       zlibbioc_1.4.0    *
>
>
> *Thanks for the suggestion.
>
> Rebecca
>
>
> On Thu, Oct 4, 2012 at 5:01 PM, Valerie Obenchain <vobencha at fhcrc.org>wrote:
>
>> **
>> Hi Sun,
>>
>>
>>
>> On 10/04/2012 02:44 PM, sun wrote:
>>
>> After I successfully adjusted the chromosomes of VCF and TranscriptDb
>> objects to match the names of the BSgenome object, I ran predictCoding(),
>> it returned nothing except a warning message, see below,
>>
>>
>>   coding <- predictCoding(vcf, txdb, seqSource=Athaliana)
>>
>>   *Warning message:
>> In .setActiveSubjectSeq(query, subject) :
>>    circular sequence(s) in query 'chrM' ignored*
>>
>>   coding
>>
>>   GRanges with 0 ranges and 1 metadata column:
>>     seqnames    ranges strand |      GENEID
>>        <Rle> <IRanges>  <Rle> | <character>
>>    ---
>>    seqlengths:
>>
>> Here are the values of  VCF, TranscriptDb and BSgenome Object,
>>
>>
>>   vcf
>>
>>   class: VCF
>> dim: 551201 2
>> genome: TAIR10
>> exptData(1): header
>> fixed(4): REF ALT QUAL FILTER
>> info(18): DP DP4 ... PR VDB
>> geno(6): GT GQ ... SP PL
>> rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919 ChrM:366920  *Qustion:
>> After used renameSeqlevels() to change chr name of vcf (Chr to chr), the
>> rownames here are still old chr name. not sure if this will be the problem
>> for predictCoding(). *
>>
>>
>> Yes, having different chromosome names in the VCF file and the txdb will
>> be a problem. Any time we want to match / find overlaps in ranges it is
>> important that the seqnames (i.e., chromosomes) match. There is an example
>> of how to use renameSeqlevels() on a VCF file on the predictCoding() man
>> page.
>>
>> library(VariantAnnotation)
>> ?predictCoding
>>
>> ## Read variants from a VCF file
>> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
>> vcf <- readVcf(fl, "hg19")
>>
>> ## Rename seqlevels in the VCF object to match those in the TxDb
>> vcf<- renameSeqlevels(vcf, c("22"="chr22"))
>> ## Confirm common seqlevels
>> intersect(seqlevels(vcf), seqlevels(txdb))
>>
>> Be sure to confirm that you have common seqlevels between your VCF and
>> annotation. If you are having trouble please send a small reproducible
>> example.
>>
>> Valerie
>>
>>
>>   rowData values names(1): paramRangeID
>> colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam
>> colData names(1): Samples
>>
>>
>>   txdb
>>
>>   TranscriptDb object:
>> | Db type: TranscriptDb
>> | Supporting package: GenomicFeatures
>> | Data source: TAIR 10
>> | Genus and Species: Arabodpsis
>> | miRBase build ID: NA
>> | transcript_nrow: 35386
>> | exon_nrow: 207465
>> | cds_nrow: 197160
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28 Sep 2012)
>> | GenomicFeatures version at creation time: 1.9.37
>> | RSQLite version at creation time: 0.11.1
>> | DBSCHEMAVERSION: 1.0
>>
>>
>>   Athaliana
>>
>>   Arabidopsis genome
>> |
>> | organism: Arabidopsis thaliana (Arabidopsis)
>> | provider: TAIR
>> | provider version: 04232008
>> | release date: NA
>> | release name: dumped from ADB: Mar/14/08
>> |
>> | sequences (see '?seqnames'):
>> |   chr1  chr2  chr3  chr4  chr5  chrC  chrM
>> |
>> | (use the '$' or '[[' operator to access a given sequence)
>>
>>
>>   sessionInfo()
>>
>>   R version 2.15.1 (2012-06-22)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>   [1] BSgenome.Athaliana.TAIR.04232008_1.3.18
>> BSgenome_1.26.0
>> VariantAnnotation_1.4.0
>>   [4] Rsamtools_1.10.1
>> Biostrings_2.26.0
>> GenomicFeatures_1.10.0
>>   [7] AnnotationDbi_1.20.0
>> Biobase_2.18.0
>> GenomicRanges_1.10.0
>> [10] IRanges_1.16.2
>> BiocGenerics_0.4.0
>> vimcom_0.9-2
>> [13] setwidth_1.0-0
>> colorout_0.9-9
>>
>> loaded via a namespace (and not attached):
>>   [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2
>> XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1
>>   [7] parallel_2.15.1    rtracklayer_1.18.0 stats4_2.15.1
>> tools_2.15.1       zlibbioc_1.4.0
>>
>> any ideas? thank you.
>>
>> Rebecca
>>
>>
>> On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès <hpages at fhcrc.org> <hpages at fhcrc.org> wrote:
>>
>>
>>   Hi Rebecca,
>>
>>
>> On 10/04/2012 12:10 PM, sun wrote:
>>
>>
>>   Hi All,
>>
>> I am going to use "coding <- predictCoding(vcf, txdb,
>> seqSource=Athaliana)"
>> to detect coding SNPs. The problem is that the chromosome names are not
>> consistent among VCF, txdb and BSgenome. In vcf, the chromosome name is
>> "Chr*", in txdb, the chr name is "Chr", but in BSgenome, the chr name is
>> "chr*" .
>>
>> I know I can use renameSeqlevels() to adjust the seqlevels (chromosome
>> names) of the VCF object to match that of the txdb annotation. But how can
>> I adjust the chr name of BSgenome or TranscriptDB?
>>
>>
>>   In BioC 2.11 (released yesterday), you can rename the chromosomes of a
>> TranscriptDb object, so you could rename the chromosomes of your
>> VCF and TranscriptDb objects to match the names of the BSgenome object.
>>
>> E.g. for the TranscriptDb object:
>>
>>    seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb))
>>
>> Note that renaming the chromosomes of a TranscriptDb object is a new
>> feature and is not fully implemented yet. For example, if you use
>> select() on the object you'll still get the original names (those
>> stored in the db), and if you try to specify a chromosome name thru
>> the 'vals' arg of the transcripts(), exons() and cds() extractors,
>> you still need to use the original names. This will be addressed soon.
>>
>> Our plan is to also support renaming of the chromosomes of BSgenome
>> and SNPlocs objects very soon.
>>
>> Also, an additional level of convenience will be provided via the
>> seqnameStyle() getter and setter, so you'll be able to quickly rename
>> with something like:
>>
>>    seqnameStyle(x) <- "UCSC"
>>
>> or
>>
>>    seqnameStyle(vcf) <- seqnameStyle(txdb) <- seqnameStyle(genome)
>>
>> This will work on almost any 'x' object that contains chromosome
>> names (GRanges, GRangesList, GappedAlignments, TranscriptDb, VCF,
>> BSgenome, SNPlocs, etc...)
>>
>> Cheers,
>> H.
>>
>>
>>
>>
>>
>>   Thanks,
>>
>> Rebecca
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing listBioconductor at r-project.orghttps://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>   --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>>   	[[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing listBioconductor at r-project.orghttps://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list