[BioC] How to map probeset_id to gene_symbols or other annotation information?

Mon Aug 10 00:48:02 CEST 2009

On Sun, Aug 9, 2009 at 4:46 PM, Kasper Daniel Hansen
<khansen at stat.berkeley.edu> wrote:
>
> On Aug 9, 2009, at 13:06 , Peng Yu wrote:
>
>> On Sun, Aug 9, 2009 at 12:03 PM, Sean Davis<seandavi at gmail.com> wrote:
>>>
>>> Hi, Peng.
>>>
>>> I don't mean to sound rude, but everyone on this list is quite busy.  You
>>> will need to make time to do some of your own research, unfortunately.  As
>>> an exercise and an answer to your question, check out the Table of Contents
>>> of the R Data Import/Export.  If there is still a question about what
>>> section is most appropriate, feel free to post back to the list the code you
>>> have tried, any error messages, and the output of sessionInfo().  And, yes,
>>> you will benefit from at least skimming the entire manual--you will learn
>>> quite a bit.
>>
>> Hi Sean,
>>
>> I have been skimming the manual. One thing I am not sure about is
>> whether I should spend a few days learning all the material you
>> mentioned, when I could use another language that I am more
>> familiar with and solve the problem quickly. I would like to answer my
>> question today if possible. However, I completely understand that I
>> should read all the manuals that you mentioned in the long run.
>>
>> I have thought of using Perl to solve my problem. But I think that it
>> is still better to figure out a way to do so in R as well. The code in
>> Perl would not be long, so I think the code in R would not be long,
>> either. It doesn't seem that it would take an experienced R user a
>> long time to figure out the R commands to map all the probeset_id
>> values to gene names or Ensembl IDs, does it?
>>
>> I know that I could use
>> read.csv("MoGene-1_0-st-v1.na29.mm9.probeset.csv") to read the file,
>> which gives a data frame. But how do I extract the useful columns from
>> the data frame? How do I construct a mapping from the entries in one
>> column to the entries in another column? I should use
>> read.table("genes.txt") to read "genes.txt", right? How do I replace its
>> first column with the appropriate gene names or Ensembl IDs using the
>> mapping?
>>
>> It seems that MoGene-1_0-st-v1.na29.mm9.probeset.csv should have
>> enough annotation information for my problem. Why do I need
>> "mogene10stprobeset.db"?
>
> Peng,
>
> Let me quote Wolfgang Huber: "the purpose of this mailing list is not for
> other people to do your homework for you".  I don't think anyone is very
> inclined to help you if you don't spend some time yourself reading about
> the language.  Some of the questions you ask above are stuff you ought to
> know after spending 10 minutes with "An Introduction to R".
>
> I believe in using the right tools for the job, and if you think you can do
> your stuff in a few hours using Perl, I think you should use Perl.  If you
> want access to some of the power and time-saving features of R, you need to
> devote some time to learning it.  But you cannot expect to do even simple
> stuff in a new language without spending some initial time on it.

Hi Kasper,

I am not asking for somebody to do my homework for me. One thing that
frustrates me about reading R documentation is that the useful
information is often scattered across different places, which makes it
hard for a new user to piece together. One example is
mogene10stprobeset.db, whose documentation doesn't mention
AnnotationDbi. I feel that learning from examples, complemented by
reading the R documentation, is a more efficient approach.

BTW, do you know why "mogene10stprobeset.db" is needed if I already
have MoGene-1_0-st-v1.na29.mm9.probeset.csv?

Regards,
Peng
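
The mapping asked about in the quoted question (read the annotation CSV,
pull out two columns, build a lookup, and replace the first column of
"genes.txt") might be sketched in base R roughly as follows. This is only
a minimal sketch under assumptions: the two files are replaced by small
temporary stand-ins with made-up values, the column names "probeset_id"
and "gene_symbol" are hypothetical and may not match the real Affymetrix
header, and the real CSV is assumed to start with "#" comment lines.

```r
# Toy stand-in for the annotation CSV (hypothetical columns and values).
annot_file <- tempfile(fileext = ".csv")
writeLines(c(
  "#%hypothetical header comment",
  "probeset_id,gene_symbol",
  "10001,Actb",
  "10002,Gapdh",
  "10003,Tp53"
), annot_file)

# Toy stand-in for genes.txt: probeset ID in column 1, a value in column 2.
genes_file <- tempfile(fileext = ".txt")
writeLines(c(
  "10002 5.1",
  "10001 7.3",
  "10999 2.0"
), genes_file)

# comment.char = "#" skips the leading comment lines (assumed to exist).
annot <- read.csv(annot_file, comment.char = "#", stringsAsFactors = FALSE)
genes <- read.table(genes_file, stringsAsFactors = FALSE)

# Extract the two useful columns and build a named vector as a lookup table:
# names are probeset IDs (as strings), values are gene symbols.
id2symbol <- setNames(annot$gene_symbol, annot$probeset_id)

# Look up every ID in the first column; IDs without a match come back as NA.
mapped <- unname(id2symbol[as.character(genes[[1]])])

# Replace the first column, keeping the original ID where no symbol was found.
genes[[1]] <- ifelse(is.na(mapped), as.character(genes[[1]]), mapped)

print(genes)
```

With the real files, only the two file paths and the two column names
would need to change. Unmatched IDs are kept as they are here rather than
dropped, which is a choice made for this sketch, not a requirement.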