[BioC] variantAnnotation: alternative GENETIC_CODE, and circular chromosomes?

Taylor, Sean D sdtaylor at fhcrc.org
Sun Feb 9 01:24:31 CET 2014


Valerie,

I am excited about these changes and can't wait to use them.

For clarification, to use genetic.code, I would pass in an argument like this:

predictCoding(..., genetic.code=mycode)

where 'mycode' is a named vector of same format as GENETIC_CODE used in translate().

I haven't been using devel in while. Any idea when this might make its way into release?

Thanks,
Sean

> -----Original Message-----
> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
> Sent: Tuesday, February 04, 2014 11:38 AM
> To: Pages, Herve; Young, Janet; bioconductor at r-project.org; Taylor, Sean D
> Subject: Re: [BioC] variantAnnotation: alternative GENETIC_CODE, and
> circular chromosomes?
> 
> Hi Janet,
> 
> Last week we enabled findOverlaps(..., type='within') to work on circular
> chromosomes. It was this restriction that prevented
> locateVariants() and predictCoding() from handling ChrM.
> VariantAnnotation 1.9.34 in devel has the most recent changes.
> 
> The output of locateVariants() includes ChrM because the function is
> reporting where the range falls wrt the gene (coding, utr, intron, etc.).
> predictCoding() however only reports coding variants. If the annotation you
> supply does not have any coding regions for ChrM or if the ranges don't fall in
> the coding regions then none will be reported.
> To confirm your annotation has coding regions for ChrM:
> 
> cds <- cdsBy(txdb)
> cds[seqnames(cds) %in% "chrM"]  ## or whatever the proper name is
> 
> #1:
> I've added support for the 'genetic.code' and 'if.fuzzy.codon' args to
> translate(). To use a different genetic code just pass the named arg
> ('genetic.code') to predictCoding().
> 
> #2:
> We've tried to handle the issue of ChrM through making findOverlaps()
> behave appropriatly with circular chromosomes. The 'type' argument has
> several options (start, end, any, within, equal). This allows quite a bit of
> flexibility. When the annotation has an ORF that spans the start/end
> findOverlaps() will still behave appropriately according to 'type'.
> 
> ## annotation with seqlength 9
> genes <- GRanges(seqnames=rep.int("A", 4),
>                   IRanges(start=c(2, 4, 6, 8), width=3))
> seqinfo(genes) <- Seqinfo(seqnames="A", seqlengths=9, isCircular=TRUE)
> 
> ## both ranges span the start/end
> ranges <- GRanges(seqnames=rep.int("A", 2),
>                    IRanges(9, width=c(2, 4)))
> 
> >> findOverlaps(ranges, genes, type="any")
> > Hits of length 3
> > queryLength: 2
> > subjectLength: 4
> >   queryHits subjectHits
> >    <integer>   <integer>
> >  1         1           4
> >  2         2           1
> >  3         2           4
> 
> >> findOverlaps(ranges, genes, type="within")
> > Hits of length 1
> > queryLength: 2
> > subjectLength: 4
> >   queryHits subjectHits
> >    <integer>   <integer>
> >  1         1           4
> 
> With the combination of findOverlaps() now working on circular
> chromosomes for all values of 'type' and Herve adding the new genetic codes
> to Biostrings there should be no need to ignore ChrM. If you run into trouble
> or if anything look strange please let us know.
> 
> Thanks.
> Valerie
> 
> 
> 
> On 02/04/2014 04:08 AM, Hervé Pagès wrote:
> > Hi Janet,
> >
> > On 02/03/2014 07:47 PM, Janet Young wrote:
> >> Hi there,  (I think it'll probably be Valerie looking at this
> >> question
> >> - hi Valerie),
> >>
> >> I'm just beginning to look at using VariantAnnotation to annotate
> >> some SNPs I've called on some yeast data (sacCer3).  I can see this
> >> will be a really useful package for me - thanks!
> >>
> >> I can see that chrM (mitochiondrial) SNPs are currently not included
> >> in the output of predictCoding, and then using locateVariants, all of
> >> chrM SNPs get annotated as intergenic/NA (with a warning, that we
> >> ignore circular chromosomes).  I can understand why that is -
> >> circular chromosomes, and a different genetic code make it trickier.  Fair
> enough.
> >>
> >> I'm wondering what the prospects are regarding chrM SNPs in the
> >> future
> >> - any plans to include those later?
> >>
> >> I'm also wondering whether I can use some hacks to get chrM SNPs
> >> annotated. Two questions/potential issues related to that I wanted to
> >> ask you guys about:
> >>
> >> 1. are alternative codon tables already supported anywhere in
> >> Bioconductor?   Using "?GENETIC_CODE"  it looks like this is defined
> >> in Biostrings, and it looks like only the standard nuclear code is
> >> defined.  Are the various alternative genetic codes defined anywhere?
> >> For this project, I'm interested in the yeast mitochondrial code, and
> >> for another I'm interested in the fly mitochondrial code.  It'd be
> >> great if we could have all the codes available (I've got another
> >> project looking at ciliate nuclear sequences, for example - not
> >> working with translations yet, but maybe later...)
> >>
> >> With a little work, I'll be able to save flat files from NCBI
> >> (http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), and read
> >> those in and transform them to a character vector that looks like
> >> GENETIC_CODE. But I realise it might be something useful to have
> >> encoded more centrally, so thought I'd ask.
> >
> > What a timely question! I'll let Val answer the questions about
> > support of mitochiondrial DNA in predictCoding() but I can answer that
> > particular one. Last week I added a bunch of non standard genetic
> > codes to Biostrings (2.31.12). To get the genetic code for Yeast
> > Mitochondrial, do:
> >
> >    > getGeneticCode("SGC2")
> >    TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG
> CTT
> > CTC CTA CTG
> >    "F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W" "T"
> > "T" "T" "T"
> >    CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG
> ACT
> > ACC ACA ACG
> >    "P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "M" "M" "T"
> > "T" "T" "T"
> >    AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA
> GCG GAT
> > GAC GAA GAG
> >    "N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D"
> > "D" "E" "E"
> >    GGT GGC GGA GGG
> >    "G" "G" "G" "G"
> >
> > Its format is the same as for GENETIC_CODE. See ?GENETIC_CODE for the
> > details.
> >
> > I also added the 'genetic.code' arg to translate() so you can supply
> > an alternate genetic code to use for translation. See ?translate for
> > the details.
> >
> > Please let me know if you find any issues, have questions, or want to
> > suggest improvements to these new features.
> >
> > Thanks,
> > H.
> >
> >>
> >> 2.  What issues should I think about for the circular chromosomes?
> >> I'm thinking of a slightly hacky solution where I  ignore any
> >> annotated ORFs that wrap around from the end of the chromosome to
> the
> >> beginning, and then just treating it as a linear chromosome.
> >> Actually, in my case (using sacCer3) there are no ORFs spanning the
> >> break in the circular chromosome, so I don't think I'll miss any
> >> annotations.   Turns out the same is true for human (hg19 knownGene
> >> annotations), so maybe the circular chromosome issue isn't such a big
> >> issue after all?
> >>
> >> It seems like that should work, but any thoughts from you - you've
> >> thought about these questions a lot more than I have?
> >>
> >> Looking forward to hearing any thoughts you have.   I know sometimes
> >> people just ignore the chrM SNPs, but it'd be nice to take a slightly
> >> more comprehensive approach if possible.
> >>
> >> thanks in advance for any input you have,
> >>
> >> Janet
> >>
> >>
> >> -------------------------------------------------------------------
> >>
> >> Dr. Janet Young
> >>
> >> Malik lab
> >> http://research.fhcrc.org/malik/en.html
> >>
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Avenue N., A2-025,
> >> P.O. Box 19024, Seattle, WA 98109-1024, USA.
> >>
> >> tel: (206) 667 4512
> >> email: jayoung  ...at...  fhcrc.org
> >>
> >> -------------------------------------------------------------------
> >>
> >>
> >>
> >>     [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> 
> 
> --
> Valerie Obenchain
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: vobencha at fhcrc.org
> Phone:  (206) 667-3158
> Fax:    (206) 667-1319



More information about the Bioconductor mailing list