[BioC] BSgenome or org.Hs.eg.db to find gene length

Marc Carlson mcarlson at fhcrc.org
Fri Oct 12 19:12:35 CEST 2012


Hi Fatemehsadat,

Let's keep this on the list.  We almost always want to keep the thread 
public so that others can benefit from our conversations.  And also, I 
am not really sure how to answer your question (it's not a simple 
question), and others may have suggestions.  You can't get their input 
if you only speak with me.

Really though, how to choose depends on context that you have not 
provided us here.  What is it that you want to 
know?  I mentioned some strategies in my earlier post.  For some cases 
the longest transcript may be what you want, for others you may want the 
maximum range that a transcript can cover, for other cases, you may want 
to "buffer" that region by adding to it.  For yet other cases you may 
not care about the range at all and may only want to call unique on the 
result.  But I can't give even an opinion without knowing more about 
what you are trying to do.
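
As a rough illustration only, the strategies above could be sketched in base R on the two A1BG rows quoted below. This is a minimal sketch, not a recommendation: the data frame `tx`, the result names, and the 1000 bp buffer are hypothetical choices made for the example, and length is taken as TXEND - TXSTART to match the table in the quoted message.

```r
# Toy input: the two A1BG rows from the quoted message.
# 'tx' and every other name below are hypothetical, for illustration only.
tx <- data.frame(
  SYMBOL  = c("A1BG", "A1BG"),
  TXSTART = c(58858172, 58859832),
  TXEND   = c(58864865, 58874214),
  stringsAsFactors = FALSE
)
tx$length <- tx$TXEND - tx$TXSTART   # same convention as the quoted table

# Strategy 1: keep the longest transcript length per symbol.
longest <- aggregate(length ~ SYMBOL, data = tx, FUN = max)

# Strategy 2: the maximum range covered by any transcript of the symbol.
span <- merge(
  aggregate(TXSTART ~ SYMBOL, data = tx, FUN = min),
  aggregate(TXEND ~ SYMBOL, data = tx, FUN = max)
)
span$length <- span$TXEND - span$TXSTART

# Strategy 3: "buffer" that range on both sides (the amount is an arbitrary assumption).
buffer <- 1000
buffered <- transform(span,
                      TXSTART = pmax(TXSTART - buffer, 0),
                      TXEND   = TXEND + buffer)
buffered$length <- buffered$TXEND - buffered$TXSTART

# Strategy 4: ignore the ranges and keep only the unique symbols.
symbols <- unique(tx$SYMBOL)
```

Each result has one row (or one entry) per gene symbol, so any of them could serve as a de-duplicated annotation table; which one is appropriate still depends on the downstream analysis.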


   Marc


On 10/12/2012 06:15 AM, Fatemehsadat Seyednasrollah wrote:
> Hi,
> Thank you so much. The package was great to use, given the diversity of features available. Now I was wondering whether I can use the result of my query as an annotation file for other R packages as well.
> I just wanted to know your opinion on how to decide which isoform I should choose for my annotation file.
> Suppose I need an annotation file with gene symbols as row names. For the first symbol, for example, I have:
>
>    SYMBOL  TXSTART    TXEND length
> 1   A1BG 58858172 58864865   6693
> 2   A1BG 58859832 58874214  14382
>
> and there are many other duplicated gene symbols. How do you decide which isoform to choose in order to get an annotation file with unique gene symbols?
>
> Thank you again.
> ________________________________________
> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Marc Carlson [mcarlson at fhcrc.org]
> Sent: Friday, October 12, 2012 1:18 AM
> To: Michael Lawrence
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] BSgenome or org.Hs.eg.db to find gene length
>
> Oh sorry I missed that little detail about using gene symbols.
>
> Here is how you would do it when you need to query by gene symbol:
>
> library(Homo.sapiens)
> cols(Homo.sapiens) ## shows cols you could use
> keytypes(Homo.sapiens) ## shows keytypes
> k <- keys(Homo.sapiens, keytype="SYMBOL")  ## discovers all available keys of this kind
> result <- select(Homo.sapiens, k, cols=c("TXNAME","TXSTART","TXEND",
>                  "TXSTRAND"), keytype="SYMBOL")
>
>
> The plan to support transcriptsBy etc for OrganismDbi is still just a
> plan.  But we don't intend for it to remain a "plan" forever.
>
>
>     Marc
>
>
>
>
>
> On 10/11/2012 01:58 PM, Michael Lawrence wrote:
>> It's definitely a step in the right direction. A small next step would
>> be supporting queries based on gene symbols, as the OP had asked
>> about. Sure, one could do a transcriptsBy() on the TxDb package and
>> subset, but that means it has to be by="gene", and it's slower. Also,
>> has there been any progress towards supporting transcriptsBy on the
>> OrganismDbi package?
>>
>> Michael
>>
>> On Thu, Oct 11, 2012 at 1:46 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>
>>      Yes,
>>
>>      Sorry about the lack of memos.  ;)  OrganismDbi is a new package
>>      that allows you to make meta packages from annotation packages
>>      that implement a select() method.  Homo.sapiens is one we made for
>>      humans.  It combines the human org package, the hg19 txdb known
>>      gene package and the GO.db package.  The package does not actually
>>      "contain" all of that data though.  It just retrieves it as
>>      requested and returns it to users as if there was a single place
>>      it was all coming from.
>>
>>        Marc
>>
>>
>>
>>
>>      On 10/11/2012 12:33 PM, Steve Lianoglou wrote:
>>
>>          On Thu, Oct 11, 2012 at 2:54 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>
>>              OrganismDbi -- too many of us are used to doing things the
>>              confusing way --
>>              using OrganismDbi packages like Homo.sapiens will be
>>              better long-term
>>
>>          Cool ... I like being less confused.
>>
>>          Thanks for the pointer,
>>          -steve
>>
>>
>>      _______________________________________________
>>      Bioconductor mailing list
>>      Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
>>      https://stat.ethz.ch/mailman/listinfo/bioconductor
>>      Search the archives:
>>      http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>