[BioC] How map probeset_id to gene_symbols or other annotation information?

Sun Aug 9 23:46:03 CEST 2009

On Aug 9, 2009, at 13:06 , Peng Yu wrote:

> On Sun, Aug 9, 2009 at 12:03 PM, Sean Davis<seandavi at gmail.com> wrote:
>>
>> Hi, Peng.
>>
>> I don't mean to sound rude, but everyone on this list is quite busy.
>> You will need to make time to do some of your own research,
>> unfortunately.  As an exercise and an answer to your question, check
>> out the Table of Contents of the R Data Import/Export.  If there is
>> still a question about what section is most appropriate, feel free to
>> post back to the list the code you have tried, any error messages,
>> and the output of sessionInfo().  And, yes, you will benefit from at
>> least skimming the entire manual--you will learn quite a bit.
>
> Hi Sean,
>
> I have been skimming the manual. One thing I am not sure about is
> whether I should spend a few days learning all the material you
> mentioned, while I could use some other language that I am more
> familiar with and solve the problem quickly. I would like to solve my
> question today if possible. However, I completely understand that I
> should read all the manuals that you mentioned in the long run.
>
> I have thought of using perl to solve my problem. But I think that it
> is still better to figure out a way to do so in R as well. The code in
> perl would not be long, so I think the code in R would not be long,
> either. It doesn't seem that it would take an experienced R user a
> long time to figure out the R commands to map all the probeset_id to
> gene names or Ensembl IDs, does it?
>
> I know that I could use
> read.csv("MoGene-1_0-st-v1.na29.mm9.probeset.csv") to read the file,
> which gives a data frame. But how do I extract the useful columns
> from the data frame? How do I construct a mapping from the entries in
> one column to the entries in another column? I should use
> read.table("genes.txt") to read "genes.txt", right? How do I replace
> its first column with the appropriate gene names or Ensembl IDs using
> the mapping?
>
> It seems that MoGene-1_0-st-v1.na29.mm9.probeset.csv should have
> enough annotation information for my problem. Why do I need
> "mogene10stprobeset.db"?
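
For reference, the steps asked about above (extract columns, build a
mapping, replace the first column) can be sketched in base R roughly as
follows. This is only a minimal sketch under stated assumptions: the
column names probeset_id and gene_symbol, the placeholder gene names,
and the layout of genes.txt (probeset IDs in the first column) are all
hypothetical, and small inline tables stand in for the real files. The
real annotation CSV may use different column names and may need extra
arguments such as comment.char = "#" if it starts with comment lines.

```r
# Toy stand-in for the annotation CSV; column names and values are assumptions.
annot <- read.csv(text = "probeset_id,gene_symbol,chromosome
10338001,Gene1,chr1
10338002,Gene2,chr2
10338003,Gene3,chr3", stringsAsFactors = FALSE)

# Toy stand-in for genes.txt: first column is assumed to hold probeset IDs.
genes <- read.table(text = "10338003 5.1
10338001 7.3
10399999 2.0", stringsAsFactors = FALSE)

# Extract the useful columns by name.
map_df <- annot[, c("probeset_id", "gene_symbol")]

# Build a lookup: a character vector of symbols named by probeset ID.
id2symbol <- setNames(map_df$gene_symbol, map_df$probeset_id)

# Look up every ID from the first column; IDs without a match give NA.
ids <- as.character(genes[[1]])
symbols <- unname(id2symbol[ids])

# Replace the first column, keeping the original ID where no symbol was found.
genes[[1]] <- ifelse(is.na(symbols), ids, symbols)
print(genes)
```

With real files, the two text = "..." arguments would be replaced by the
file names, and merge() or match() would be reasonable alternatives to
the named-vector lookup.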

Peng,

Let me quote Wolfgang Huber: "the purpose of this mailing list is not  
for other people to do your homework for you".  I don't think anyone  
is very inclined to help you if you don't spend some time yourself  
reading about the language.  Some of the questions you ask above are  
things you ought to know after spending 10 minutes with "An  
Introduction to R".

I believe in using the right tools for the job, and if you think you  
can do your stuff in a few hours using Perl, I think you should use  
Perl.  If you want access to some of the power and time-saving  
features of R, you need to devote some time to learning it.  But you  
cannot expect to do even simple stuff in a new language without  
spending some initial time on it.

Kasper