[BioC] Extracting protein sequence associated to UCSC transcript id

rcaloger raffaele.calogero at unito.it
Wed Jan 8 19:25:22 CET 2014


I found a way to extract the protein sequences by querying the UCSC web page.
However, there should be a more elegant way to do it.
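library(RCurl)

trs <- c("uc003mfv.3", "uc001ajb.1", "uc011asd.2")
# hgsid is a UCSC browser session id taken from my own session; it may expire
base_url <- "http://genome-euro.ucsc.edu/cgi-bin/hgGene?hgsid=195297095&hgg_do_getProteinSeq=1&hgg_gene="
myquery <- list()
for (i in seq_along(trs)) {
    # fetch the protein sequence page for one transcript id
    myquery[[i]] <- getURL(paste0(base_url, trs[i]))
    Sys.sleep(30)  # wait 30 seconds between requests
}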
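
Each element of myquery is the raw text of the response, so the sequence still has to be pulled out of it. A minimal sketch in base R, assuming the response contains a single FASTA record (a ">" header line followed by sequence lines), possibly wrapped in HTML tags; I have not checked the exact layout of the page, and parse_protein_fasta is just an illustrative name:

```r
# Hypothetical helper: extract the amino-acid sequence from one response string.
# Assumption: the text holds one FASTA record, optionally surrounded by HTML tags.
parse_protein_fasta <- function(txt) {
  txt   <- gsub("<[^>]*>", "", txt)               # strip any HTML tags
  lines <- trimws(strsplit(txt, "\r?\n")[[1]])     # split into trimmed lines
  lines <- lines[nzchar(lines)]                    # drop empty lines
  hdr   <- grep("^>", lines)                       # locate FASTA header line(s)
  if (length(hdr) == 0) return(NA_character_)      # no FASTA record found
  rest  <- lines[-seq_len(hdr[1])]                 # lines after the first header
  rest  <- rest[grepl("^[A-Z*]+$", rest)]          # keep residue-only lines
  paste(rest, collapse = "")                       # join into one sequence string
}
```

With the loop above this could be applied as vapply(myquery, parse_protein_fasta, character(1)), which should give one string per transcript, or NA where no FASTA record was found.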

It is interesting that Bioconductor has no database linking
transcripts to proteins.
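
For comparison, the CDS-based route that Michael describes in the quoted messages below might look roughly like this. It is only a sketch under several assumptions: the hg19 knownGene TxDb and BSgenome packages are installed, the UCSC IDs (including the version suffix) match the transcript names in that TxDb, and extractTranscriptSeqs() is used, which is the name under which later GenomicFeatures releases provide what extractTranscriptsFromGenome() did. The function name protein_for_tx is made up for illustration, and I have not run it on the IDs above. Because cdsBy() returns only the coding exons, the translation should start at the annotated start codon of each isoform.

```r
# Hypothetical helper (illustrative name): translate the annotated CDS of
# UCSC knownGene transcripts. Assumes the hg19 TxDb and BSgenome packages.
protein_for_tx <- function(tx_ids) {
  txdb   <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
  genome <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
  # CDS exons grouped by transcript, named by UCSC transcript id
  cds <- GenomicFeatures::cdsBy(txdb, by = "tx", use.names = TRUE)
  # keep only ids present in the annotation (non-coding ones have no CDS)
  found <- intersect(tx_ids, names(cds))
  # splice the CDS exons of each transcript, then translate to protein
  cds_seqs <- GenomicFeatures::extractTranscriptSeqs(genome, cds[found])
  Biostrings::translate(cds_seqs)
}
```

Called as protein_for_tx(trs), this should return an AAStringSet named by transcript ID; IDs missing from the annotation are silently dropped, so it is worth comparing the names of the result with trs.
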
Cheers
Raf

On 08/01/14 17:10, Michael Lawrence wrote:
> In theory, you should be able to get the cds regions using e.g. the
> Homo.sapiens package, but it seems kind of tough to retrieve those for UCSC
> Known Gene identifiers (assuming that is what you have). Marc Carlson could
> probably help more.
>
> Michael
>
>
>
>
>
> On Wed, Jan 8, 2014 at 6:31 AM, rcaloger <raffaele.calogero at unito.it> wrote:
>
>> Dear Michael,
>> thanks for the kind suggestion but unfortunately it does not solve my
>> problem because, using the approach you are suggesting, I do not have
>> access to the position of the start codon for the different isoforms.
>> Cheers
>> Raf
>>
>>
>> On 07/01/14 16:44, Michael Lawrence wrote:
>>
>>> If you had the transcript coordinates (as GRangesList, perhaps from an
>>> exonsBy() on a TranscriptDb) you could use extractTranscriptsFromGenome()
>>> and translate, see the GenomicFeatures vignette for an example.
>>>
>>> Michael
>>>
>>>
>>> On Tue, Jan 7, 2014 at 6:54 AM, rcaloger <raffaele.calogero at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> In order to validate fusion products I need to be sure that the peptides
>>>> encoded by the two fused proteins are in the same frame.
>>>> I now have a function that can confirm that protein1 and protein2
>>>> have sequences located in the same frame.
>>>> However, I got stuck retrieving those protein sequences from UCSC. I did
>>>> not find a quick way to retrieve the protein sequence associated with a
>>>> UCSC ID.
>>>> The protein sequence is indeed present in the UCSC genome browser, but I
>>>> do not know how to grab it.
>>>> Any suggestions?
>>>> Cheers
>>>> Raffaele
>>>>
>>>> --
>>>>
>>>> ----------------------------------------
>>>> Prof. Raffaele A. Calogero
>>>> Bioinformatics and Genomics Unit
>>>> MBC Centro di Biotecnologie Molecolari
>>>> Via Nizza 52, Torino 10126
>>>> tel.   ++39 0116706457
>>>> Fax    ++39 0112366457
>>>> Mobile ++39 3333827080
>>>> email: raffaele.calogero at unito.it
>>>>          raffaele[dot]calogero[at]gmail[dot]com
>>>> www:   http://www.bioinformatica.unito.it
>>>>

