[BioC] How map probeset_id to gene_symbols or other annotation information?

Sun Aug 9 22:06:52 CEST 2009

On Sun, Aug 9, 2009 at 12:03 PM, Sean Davis<seandavi at gmail.com> wrote:
>
>
> On Sun, Aug 9, 2009 at 11:53 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>
>> On Sun, Aug 9, 2009 at 7:01 AM, Sean Davis<seandavi at gmail.com> wrote:
>> >
>> >
>> > On Sat, Aug 8, 2009 at 6:31 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have run the following 'run.R' script, which generated the file
>> >> 'gene.txt'. My question is how to map the probeset_ids to the gene
>> >> names or other information that is available in
>> >>
>> >>
>> >> http://www.affymetrix.com/analysis/downloads/na29/wtgene/MoGene-1_0-st-v1.na29.mm9.probeset.csv.zip?
>> >> What package should I use to read the '.csv' file?
>> >
>> > Hi, Peng.  You will probably want to do some reading before posting.
>> > There is an entire manual on input/output with R available from CRAN
>> > that you should read.
>> >
>> > http://watson.nci.nih.gov/cran_mirror/manuals.html
>>
>> I don't have time to read the whole manual. Which manual in particular
>> should I focus on? "R Data Import/Export"? Would you please point me to
>> the section that is most important for my task?
>
> Hi, Peng.
>
> I don't mean to sound rude, but everyone on this list is quite busy.  You
> will need to make time to do some of your own research, unfortunately.  As
> an exercise and an answer to your question, check out the Table of Contents
> of the "R Data Import/Export" manual.  If there is still a question about what
> section is most appropriate, feel free to post back to the list the code you
> have tried, any error messages, and the output of sessionInfo().  And, yes,
> you will benefit from at least skimming the entire manual--you will learn
> quite a bit.

Hi Sean,

I have been skimming the manual. One thing I am not sure about is
whether I should spend a few days learning all the material you
mentioned when I could use some other language that I am more
familiar with and solve the problem quickly. I would like to solve my
problem today if possible. However, I completely understand that I
should read all the manuals that you mentioned in the long run.

I have thought of using Perl to solve my problem. But I think that it
is still better to figure out a way to do so in R as well. The code in
Perl would not be long, so I think the code in R would not be long,
either. It doesn't seem that it would take an experienced R user a
long time to figure out the R commands to map all the probeset_ids to
gene names or Ensembl IDs, does it?

I know that I could use
read.csv("MoGene-1_0-st-v1.na29.mm9.probeset.csv") to read the file,
which gives a data frame. But how do I extract the useful columns from
the data frame? How do I construct a mapping from the entries in one
column to the entries in another column? I should use
read.table("genes.txt") to read "genes.txt", right? How do I replace its
first column with the appropriate gene names or Ensembl IDs using the
mapping?
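
To make the question concrete, here is a minimal sketch of the kind of
lookup being asked about, using base R only. It rests on several
assumptions that are not confirmed anywhere in this thread: that the
annotation file has columns named probeset_id and gene_assignment, that
gene_assignment is laid out as "accession // symbol // description", that
the file begins with '#' comment lines, and that the first column of
"genes.txt" holds the probeset IDs. All IDs and gene names below are
made-up toy data, not values from the real files.

```r
# Toy stand-in for the annotation CSV. The real file might be read with:
#   annot <- read.csv("MoGene-1_0-st-v1.na29.mm9.probeset.csv",
#                     comment.char = "#", stringsAsFactors = FALSE)
# (comment.char = "#" is an assumption about the file's header lines.)
annot <- read.csv(text = paste(
  "probeset_id,gene_assignment",
  "101,ACC1 // GeneA // description of gene A",
  "102,ACC2 // GeneB // description of gene B",
  "103,---",
  sep = "\n"), stringsAsFactors = FALSE)

# Extract the useful part of one column.
# Assumed layout: fields separated by " // ", gene symbol in the 2nd field.
fields <- strsplit(annot$gene_assignment, " // ", fixed = TRUE)
annot$symbol <- sapply(fields, function(f) f[2])  # NA if no 2nd field

# Toy stand-in for read.table("genes.txt", header = TRUE).
expr <- read.table(text = paste(
  "probeset_id value",
  "102 7.2",
  "101 5.1",
  "103 6.0",
  "999 3.3",
  sep = "\n"), header = TRUE)

# Build the mapping: for each row of expr, find the matching row of annot.
idx <- match(expr$probeset_id, annot$probeset_id)
symbol <- annot$symbol[idx]

# Replace the first column with the symbol, keeping the original ID
# wherever no symbol is available.
expr$probeset_id <- ifelse(is.na(symbol),
                           as.character(expr$probeset_id),
                           symbol)
print(expr)
```

match() returns, for each ID in the expression table, the row position of
the same ID in the annotation table (or NA if it is absent), so indexing
an annotation column by that result lines the two tables up. merge() on
the shared ID column would be an alternative, though it may reorder rows.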

It seems that MoGene-1_0-st-v1.na29.mm9.probeset.csv should have
enough annotation information for my problem. Why do I need
"mogene10stprobeset.db"?

Regards,
Peng