[BioC] function for translation of ORFs

Thomas Girke thomas.girke at ucr.edu
Tue Nov 11 22:52:09 CET 2008


Here is a basic translateDNA function that I wrote some time ago for a course. When you 
source the corresponding script, it shows some short instructions on how to use it. 
The given test sample imports all ORFs of the Halobacterium genome from NCBI's FTP site and 
translates them in all six reading frames. If you need only one frame translated, 
you can specify it with the 'frame' argument, like this: translateDNA(myseq="ATGCAT", frame=c(1), 
pepCode="single"). 

Just try this in R: 

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/translateDNA.R")
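
For readers who cannot reach the script, the six-frame idea described above can be
sketched in base R. This is not the translateDNA function itself: the helper names
(translate_frame, six_frames) are hypothetical, and the frame numbering (1-3 on the
given strand, 4-6 on its reverse complement) and the use of "X" for codons with
ambiguous bases are assumptions made for illustration. The codon table is the standard
genetic code (NCBI translation table 1).

```r
# Standard genetic code, codons in TCAG order with the third base varying fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
amino <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
codon_table <- setNames(amino, codons)

# Reverse complement of a single upper-case DNA string.
rev_comp <- function(dna) {
  chartr("ACGT", "TGCA", paste(rev(strsplit(dna, "")[[1]]), collapse = ""))
}

# Hypothetical helper. Assumed convention: frames 1-3 read the given strand,
# frames 4-6 read the reverse complement.
translate_frame <- function(dna, frame = 1) {
  dna <- toupper(dna)
  if (frame > 3) {
    dna <- rev_comp(dna)
    frame <- frame - 3
  }
  n_codons <- (nchar(dna) - frame + 1) %/% 3
  if (n_codons < 1) return("")
  starts <- frame + 3 * (seq_len(n_codons) - 1)
  aa <- codon_table[substring(dna, starts, starts + 2)]
  aa[is.na(aa)] <- "X"  # assumption: unknown or ambiguous codons become "X"
  paste(aa, collapse = "")
}

# Translate one sequence in all six frames; returns a character vector of length 6.
six_frames <- function(dna) {
  vapply(1:6, function(f) translate_frame(dna, f), character(1))
}
```

With this sketch, translate_frame("ATGCAT", frame = 1) gives "MH" in single-letter
code, and six_frames("ATGGCCTAA") returns one peptide string per frame.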

Best,

Thomas


On Tue, Nov 11, 2008 at 01:06:46PM -0800, Robert Gentleman wrote:
> I don't think that there is a specific one, but if your ORF is called y, say,
> then, using some bits from the Biostrings package, but mainly pure R, you can do
> this:
> 
>  a1 <- toupper(y)
>  a2 <- substring(a1, seq(1, nchar(a1), by=3), seq(3, nchar(a1), by=3))
>  aa <- paste(RNA_GENETIC_CODE[a2], collapse="")
> 
>  If your sequence is not RNA (but rather DNA), you can use dna2rna to first
> "transcribe" it. There is a transcribe function, but be careful as you need to
> know the orientation of the original sequence (usually it is reported as if
> already transcribed - so reverse complemented, but if not there are functions in
> Biostrings to do that.)
> 
>  Note that this vectorizes, so if you have lots of sequences put them all in one
> character vector, and it should be reasonably fast.
> 
>  best wishes
>    Robert
> 
> 
> Ana Conesa wrote:
> > Dear list,
> > 
> > Can someone indicate a R function for translating an open reading frame
> > into a protein sequence?
> > 
> > Thanks
> > 
> > Ana
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > 
> 
> -- 
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
> 
> 
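
The orientation caveat in the quoted reply can be illustrated with a small base-R
sketch. The function names (rev_comp, to_mrna) and the 'strand' argument are
hypothetical and are not part of Biostrings. The assumption is that a sequence is
given either as the coding strand (only T -> U is needed) or as the template strand
(reverse complement first, then T -> U), in both cases written 5' to 3'.

```r
# Reverse complement of a single DNA string.
rev_comp <- function(dna) {
  chartr("ACGT", "TGCA",
         paste(rev(strsplit(toupper(dna), "")[[1]]), collapse = ""))
}

# Hypothetical helper: derive the mRNA sequence from either strand.
to_mrna <- function(dna, strand = c("coding", "template")) {
  strand <- match.arg(strand)
  dna <- toupper(dna)
  # A template-strand sequence must be reverse complemented first.
  if (strand == "template") dna <- rev_comp(dna)
  chartr("T", "U", dna)
}

to_mrna("ATGGCCTAA")                       # coding strand as reported
to_mrna("TTAGGCCAT", strand = "template")  # same gene, template strand
```

Both calls yield the same mRNA, "AUGGCCUAA", which can then be split into codons
and looked up in an RNA codon table as in the quoted snippet.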

-- 
Thomas Girke
Assistant Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Center for Plant Cell Biology (CEPCEB)
Institute for Integrative Genome Biology (IIGB)
Department of Botany and Plant Sciences
1008 Noel T. Keen Hall
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437


