[BioC] variantAnnotation: alternative GENETIC_CODE, and circular chromosomes?

Janet Young jayoung at fhcrc.org
Tue Feb 4 20:39:52 CET 2014


Hi Herve,

That's great!  That'll be useful.   Funny timing.

Given that predictCoding uses the translate function, hopefully it will be quite easy for you guys to pass the new genetic.code arg through to that translate call somehow. I guess many SNP collections (CollapsedVCF or whatever) will contain BOTH autosomal and mitochondrial SNPs, so they would need a mix of genetic codes.

I'm sure in the long run you'd aim to have the function deal with that mix. But perhaps for now I will be able to do something like this (pseudocode):

1. split my SNPs into autosomal and mitochondrial SNPs
2. predictions1 <- predictCoding(autosomalSNPs, txDB, seqSource=Scerevisiae)
3. GENETIC_CODE <- getGeneticCode("SGC2")
4. fake predictCoding into ignoring the fact that chrM is circular
5. predictions2 <- predictCoding(mitochondrialSNPs, txDB, seqSource=Scerevisiae)
6. GENETIC_CODE <- getGeneticCode("SGC0")
7. combine predictions1 and predictions2

Does that make sense?
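
One caveat about steps 3 and 6, offered as a general point about R scoping rather than anything confirmed for predictCoding: a function defined in a package looks up names in its own namespace (and its imports) before the global environment, so assigning GENETIC_CODE in the workspace would most likely not change what translate() sees. The sketch below mimics that behaviour with a plain environment; `pkg`, `CODE` and `f` are made-up names used only for illustration.

```r
# Stand-in for a package namespace: 'pkg' holds its own copy of CODE,
# and f() is defined there, much as translate() lives in Biostrings.
pkg <- new.env()
assign("CODE", "standard", envir = pkg)
f <- function() CODE
environment(f) <- pkg

# Global assignment, analogous to step 3 of the pseudocode.
CODE <- "yeast mito"

# f() finds CODE in 'pkg' first, so the global value is never consulted.
f()
```

If the same holds for predictCoding, the alternative code would have to be handed over explicitly, for example through the genetic.code argument of translate().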

Looking forward to Val's input on the circular chromosome issue. I'm not 100% sure how to do step 4 above right now.  I thought this might work: 
    isCircular(Scerevisiae) <- rep(FALSE, length( isCircular(Scerevisiae) ) )
but I get an error.
    Error in `seqinfo<-`(`*tmp*`, value = <S4 object of class "Seqinfo">) : 
        'new2old' must be specified when replacing the 'seqinfo' of a BSgenome object
I'll think on it some more, too.

thanks


On Feb 4, 2014, at 4:08 AM, Hervé Pagès wrote:

> Hi Janet,
> 
> On 02/03/2014 07:47 PM, Janet Young wrote:
>> Hi there,  (I think it'll probably be Valerie looking at this question - hi Valerie),
>> 
>> I'm just beginning to look at using VariantAnnotation to annotate some SNPs I've called on some yeast data (sacCer3).  I can see this will be a really useful package for me - thanks!
>> 
>> I can see that chrM (mitochondrial) SNPs are currently not included in the output of predictCoding, and that with locateVariants all of the chrM SNPs get annotated as intergenic/NA (with a warning that circular chromosomes are ignored). I can understand why that is: circular chromosomes and a different genetic code make it trickier. Fair enough.
>> 
>> I'm wondering what the prospects are regarding chrM SNPs in the future - any plans to include those later?
>> 
>> I'm also wondering whether I can use some hacks to get chrM SNPs annotated. Two questions/potential issues related to that I wanted to ask you guys about:
>> 
>> 1. are alternative codon tables already supported anywhere in Bioconductor?   Using "?GENETIC_CODE"  it looks like this is defined in Biostrings, and it looks like only the standard nuclear code is defined.  Are the various alternative genetic codes defined anywhere?  For this project, I'm interested in the yeast mitochondrial code, and for another I'm interested in the fly mitochondrial code.  It'd be great if we could have all the codes available (I've got another project looking at ciliate nuclear sequences, for example - not working with translations yet, but maybe later...)
>> 
>> With a little work, I'll be able to save flat files from NCBI (http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), and read those in and transform them to a character vector that looks like GENETIC_CODE. But I realise it might be something useful to have encoded more centrally, so thought I'd ask.
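
A minimal sketch of that transformation, assuming the NCBI layout of one amino-acid string plus three base strings. The strings are typed in here rather than read from a file, the yeast mitochondrial amino-acid string is copied from the getGeneticCode("SGC2") output quoted later in this thread, and `make_genetic_code` is a hypothetical helper name.

```r
# Build a GENETIC_CODE-style named character vector (codon -> amino acid)
# from NCBI-style strings. make_genetic_code() is a hypothetical helper.
make_genetic_code <- function(aas, base1, base2, base3) {
  split_chars <- function(s) strsplit(s, "", fixed = TRUE)[[1]]
  codons <- paste0(split_chars(base1), split_chars(base2), split_chars(base3))
  stopifnot(nchar(aas) == length(codons))
  setNames(split_chars(aas), codons)
}

# Base strings in NCBI order: first base changes slowest, third fastest.
bases <- c("T", "C", "A", "G")
base1 <- paste(rep(bases, each = 16), collapse = "")
base2 <- paste(rep(rep(bases, each = 4), times = 4), collapse = "")
base3 <- paste(rep(bases, times = 16), collapse = "")

# Yeast mitochondrial amino acids, in the same codon order.
yeast_mito_aas <-
  "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

yeast_mito <- make_genetic_code(yeast_mito_aas, base1, base2, base3)

# Codons that differ from the standard code.
yeast_mito[c("ATA", "TGA", "CTT")]
```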
> 
> What a timely question! I'll let Val answer the questions about
> support of mitochondrial DNA in predictCoding() but I can answer
> that particular one. Last week I added a bunch of non standard genetic
> codes to Biostrings (2.31.12). To get the genetic code for Yeast
> Mitochondrial, do:
> 
>  > getGeneticCode("SGC2")
>  TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
>  "F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W" "T" "T" "T" "T"
>  CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
>  "P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "M" "M" "T" "T" "T" "T"
>  AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
>  "N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
>  GGT GGC GGA GGG
>  "G" "G" "G" "G"
> 
> Its format is the same as for GENETIC_CODE. See ?GENETIC_CODE for
> the details.
> 
> I also added the 'genetic.code' arg to translate() so you can supply
> an alternate genetic code to use for translation. See ?translate for
> the details.
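
For illustration, a short sketch of the two features used together, assuming Biostrings 2.31.12 or later is installed. The 9-nt sequence is made up; ATA and TGA are codons whose meaning differs between the standard table and the SGC2 table shown above.

```r
library(Biostrings)

# Toy coding sequence (made up): ATG ATA TGA
cds <- DNAString("ATGATATGA")

# Default behaviour: the standard genetic code (GENETIC_CODE)
standard_aa <- translate(cds)

# Yeast mitochondrial code, passed explicitly via the genetic.code arg
mito_aa <- translate(cds, genetic.code = getGeneticCode("SGC2"))

standard_aa
mito_aa
```

Passing the code as an argument keeps each call self-describing, which should matter when nuclear and mitochondrial sequences are translated in the same session.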
> 
> Please let me know if you find any issues, have questions, or want
> to suggest improvements to these new features.
> 
> Thanks,
> H.
> 
>> 
>> 2. What issues should I think about for the circular chromosomes? I'm thinking of a slightly hacky solution where I ignore any annotated ORFs that wrap around from the end of the chromosome to the beginning, and then just treat it as a linear chromosome. Actually, in my case (using sacCer3) there are no ORFs spanning the break in the circular chromosome, so I don't think I'll miss any annotations. Turns out the same is true for human (hg19 knownGene annotations), so maybe the circular chromosome issue isn't such a big issue after all?
>> 
>> It seems like that should work, but do you have any thoughts? You've thought about these questions a lot more than I have.
>> 
>> Looking forward to hearing any thoughts you have.   I know sometimes people just ignore the chrM SNPs, but it'd be nice to take a slightly more comprehensive approach if possible.
>> 
>> thanks in advance for any input you have,
>> 
>> Janet
>> 
>> 
>> -------------------------------------------------------------------
>> 
>> Dr. Janet Young
>> 
>> Malik lab
>> http://research.fhcrc.org/malik/en.html
>> 
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Avenue N., A2-025,
>> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>> 
>> tel: (206) 667 4512
>> email: jayoung  ...at...  fhcrc.org
>> 
>> -------------------------------------------------------------------
>> 
>> 
>> 
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
