[BioC] intron sequences from biomaRt

Herve Pages hpages at fhcrc.org
Wed Jun 20 20:51:27 CEST 2007


Dario Greco wrote:
> hi Steffen,
> 
> thank you once again for your answer.
> actually, your advice of subtracting the exons from the unspliced 
> transcripts worked out.
> i have actually done it within R using the biomaRt and Biostrings facilities.

Hi Dario,

There is a new function 'mask' in Biostrings 2.5.10 that may be useful for your
problem. For example, let's say that you've already managed to get the starts/ends for
the 30 exons belonging to gene "FBgn0025803" (chromosome 3R) of the fly:

  exons_start, exons_end: integer vectors of length 30

Note that the start and end for gene "FBgn0025803" are 'min(exons_start)' and 'max(exons_end)'.

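  library(BSgenome.Dmelanogaster.FlyBase.r51)
  # One view per exon on chromosome 3R
  exons <- views(Dmelanogaster[["3R"]], exons_start, exons_end)
  exons
  # mask() gives the regions of 3R that are not covered by any exon
  introns <- mask(exons)
  # Drop the first and last regions: they lie outside the gene
  # (before min(exons_start) and after max(exons_end))
  introns <- introns[-c(1, length(introns))]
  introns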

Note that the introns obtained by this method are the portions of the gene that don't
belong to any of the exons. This is different from what you would get if you were
extracting the introns transcript by transcript, i.e. by looking at each individual splicing.
Also note that, in the example above, we get only 24 introns rather than the 29 that
30 non-overlapping exons would give: this is because there are overlaps among the 30 exons.
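
To make the effect of overlapping exons concrete, here is a minimal base R sketch
of the same idea (regions of the gene not covered by any exon). It does not use the
Biostrings API: the helper name 'intron_ranges' and the toy coordinates are
hypothetical and are not the real FBgn0025803 data.

```r
# Hypothetical toy coordinates for 5 exons of one gene (NOT real FlyBase data).
# Exons 1-2 overlap, and exon 4 is nested inside exon 3.
exons_start <- c(100L, 150L, 400L, 420L, 700L)
exons_end   <- c(200L, 250L, 500L, 450L, 800L)

# Hypothetical helper: return the gaps between exons after overlapping
# (or directly adjacent) exons have been merged.
intron_ranges <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  o <- order(start, end)
  start <- start[o]
  end <- end[o]
  n <- length(start)
  # Rightmost position covered by any exon seen so far
  reach <- cummax(end)
  # A gap opens before an exon only if it starts beyond everything seen so far
  is_gap <- start[-1] > reach[-n] + 1L
  data.frame(start = reach[-n][is_gap] + 1L,
             end   = start[-1][is_gap] - 1L)
}

introns <- intron_ranges(exons_start, exons_end)
print(introns)
```

With these toy values the 5 exons collapse into 3 covered regions, so only 2
introns come out (251-399 and 501-699) instead of the 4 that 5 non-overlapping
exons would give.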

See ?mask for other examples.
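
For comparison, the exon-subtraction approach discussed in the quoted messages
below (locate each exon in the unspliced transcript and keep what lies in between)
can be sketched in base R as follows. This is only an illustration under
assumptions: the sequences are made up, the helper name 'find_exons' is
hypothetical, and the exons are assumed to be given in transcript order.

```r
# Made-up toy sequences. The introns are written in lower case only to make
# the result easy to read; real sequences would not be distinguished this way.
unspliced <- "ATGGCCgtaagtttcagGATTACAgtgagtcccagTGGTAA"
exon_seqs <- c("ATGGCC", "GATTACA", "TGGTAA")

# Hypothetical helper: locate each exon in the unspliced transcript,
# searching left to right so that each exon is found after the previous one.
find_exons <- function(unspliced, exon_seqs) {
  starts <- integer(length(exon_seqs))
  offset <- 0L
  for (i in seq_along(exon_seqs)) {
    rest <- substring(unspliced, offset + 1L)
    hit <- regexpr(exon_seqs[i], rest, fixed = TRUE)
    if (hit == -1L) stop("exon ", i, " not found in transcript")
    starts[i] <- offset + as.integer(hit)
    offset <- starts[i] + nchar(exon_seqs[i]) - 1L
  }
  data.frame(start = starts, end = starts + nchar(exon_seqs) - 1L)
}

pos <- find_exons(unspliced, exon_seqs)
n <- nrow(pos)
# Introns are the stretches between consecutive exons
intron_seqs <- substring(unspliced, pos$end[-n] + 1L, pos$start[-1] - 1L)
print(intron_seqs)
```

Matching by sequence can hit the wrong place when an exon is short or repeated,
so working from exon coordinates is likely to be more robust when they are
available.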

Cheers,
H.

> 
> at the moment, my code works fine but it is quite ugly and needs 
> optimization. however, if there is any interest, i will be happy to 
> share (after some needed cosmetics ;-) )
> 
> thanks again,
> d
> 
> 
> 
> Durinck, Steffen (NIH/NCI) [F] wrote:
>> Hi Dario,
>>
>> If the BioMart web server implemented this, it would become readily available in R, but there are currently no plans for them to do this, and we do not currently intend to implement this ourselves in biomaRt either (unless you want to contribute this function ;) ).  Would my previous suggestion of subtracting the exonic sequences from the unspliced transcript sequences not work?  If not, you'll indeed have to use the Ensembl Core Perl API and create a script to retrieve the intronic sequences.
>>
>> Best regards,
>> Steffen
>>
>> -----Original Message-----
>> From: Dario Greco [mailto:dario.greco at helsinki.fi]
>> Sent: Mon 6/18/2007 7:14 AM
>> To: Durinck, Steffen (NIH/NCI) [F]
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] intron sequences from biomaRt
>>  
>> hi Steffen,
>>
>> thank you very much for your email. i asked the ensembl helpdesk and 
>> this is what they replied:
>>
>> "...The only way to retrieve intronic sequences in batch mode is by using
>> the Ensembl Core Perl API:
>> http://www.ensembl.org/info/software/core/index.html
>> I hope this answers your question. Please let us know if you have any
>> other questions or problems..."
>>
>> is there any plan to implement/use this from R? or shall i start from scratch with bioperl?
>>
>> thanks again,
>> yours
>> d
>>
>>
>>
>>
>> Durinck, Steffen (NIH/NCI) [F] wrote:
>>   
>>> Dear Dario,
>>>
>>> No, there is currently no possibility to select intronic sequences directly.  You could request this at helpdesk at ensembl.org and see if they want to add this feature in future versions of Ensembl.  For now the only way to do it with biomaRt would be to retrieve the transcript or gene sequences and the exon sequences, and then find the intronic sequences by splitting the transcript sequences on the different exons.
>>>
>>> Hope this helps,
>>> Steffen
>>>
>>>
>>> -----Original Message-----
>>> From: Dario Greco [mailto:dario.greco at helsinki.fi]
>>> Sent: Fri 6/15/2007 9:13 AM
>>> To: bioconductor at stat.math.ethz.ch
>>> Subject: [BioC] intron sequences from biomaRt
>>>  
>>> dear list,
>>>
>>> i need to get the intron sequences for a group of entrez gene ids.
>>> is there any way to do it using biomaRt? apparently there is no option in the 
>>> getSequence() function.
>>>
>>> > sessionInfo()
>>> R version 2.5.0 (2007-04-23)
>>> i686-redhat-linux-gnu
>>>  biomaRt    RCurl      XML
>>> "1.10.0"  "0.8-1"  "1.7-3"
>>>
>>> any suggestions?
>>> thanks for your help.
>>>
>>> yours
>>> d
>>>


