[BioC] Why is some data missing when I use R to analyze the NimbleGen array?

Benilton Carvalho beniltoncarvalho at gmail.com
Fri Jul 22 14:05:43 CEST 2011


Maggie,

the annotation package that you built contains the chip information
(X/Y coordinates, probeset IDs).

The preprocessing, through rma(), summarizes the intensities using the
probeset IDs. The probeset IDs aren't necessarily transcript IDs.
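A toy base-R illustration of what summarizing by probeset ID does to the number of rows. The values and IDs are made up: several probes share one SEQ_ID, and summarizing yields one row per SEQ_ID. The median is only a stand-in. rma() itself does background correction, quantile normalization and median polish, so this sketch shows the change in dimensions, not the actual algorithm.

```r
# Toy probe-level data: made-up values, hypothetical IDs.
# Several probes (rows) share the same probeset ID (SEQ_ID).
probes <- data.frame(
  SEQ_ID    = c("PS_A", "PS_A", "PS_A", "PS_B", "PS_B", "PS_C"),
  intensity = c(8.1, 7.9, 8.3, 5.2, 5.6, 10.4),
  stringsAsFactors = FALSE
)

# Collapse probes to one value per probeset. The median is only a
# stand-in here; rma() fits a median polish across arrays instead.
summarized <- tapply(probes$intensity, probes$SEQ_ID, median)

nrow(probes)        # number of probes (rows before summarization)
length(summarized)  # number of probesets (rows after summarization)
```

The number of rows before and after summarization therefore need not match. The output has one row per probeset, not one per probe.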

In the NDF, there is a column called SEQ_ID, which contains the
probeset IDs. Open that file and look for the ID you refer to
(GRMZM2G130813_T01). If you find it in that column, then it must be in
the output object (the object you get from rma()) as well.
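The check can also be done from R. The following is a minimal, self-contained sketch that writes a tiny made-up NDF excerpt to a temporary file. A real NDF is a tab-delimited file with many more columns. Only SEQ_ID, PROBE_ID, X and Y are kept here, and every row except the ID from the question is hypothetical.

```r
# Toy NDF excerpt written to a temporary file so the sketch is
# self-contained. Rows are made up; a real NDF has more columns.
ndf_file <- tempfile(fileext = ".ndf")
writeLines(c(
  "SEQ_ID\tPROBE_ID\tX\tY",
  "GRMZM2G130813_T01\tprobe_0001\t10\t20",
  "GRMZM2G130813_T01\tprobe_0002\t11\t20",
  "OTHER_SEQ_001\tprobe_0003\t12\t20"
), ndf_file)

# Read the tab-delimited file (header row included).
ndf <- read.delim(ndf_file, stringsAsFactors = FALSE)

# Is the ID of interest present in the SEQ_ID column?
id <- "GRMZM2G130813_T01"
in_ndf <- id %in% ndf$SEQ_ID

# With real data, the matching check on the rma() output would be:
#   id %in% featureNames(eset)   # eset: the ExpressionSet from rma()
```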

If the NDF (or any other source) contains further information that you
want to use in downstream analyses (for example, gene associations or
transcript IDs that you can link to probeset IDs), then you need to
load the NDF manually, extract the pieces of information of interest
and merge them with the preprocessed data. This is what I meant by
'genomic annotation' (apologies for not being careful in my initial
explanation).
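A minimal base-R sketch of the merge step, using made-up data. The matrix stands in for the summarized values (rows are probeset IDs, columns are arrays). The annotation table stands in for columns pulled from the NDF. The column GENE is hypothetical and represents whatever extra information you extract.

```r
# Toy summarized data: rows are probeset IDs, columns are arrays.
# Values and IDs are made up.
expr <- matrix(c(8.1, 8.4,
                 5.4, 5.1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("PS_A", "PS_B"), c("array1", "array2")))

# Toy probe-level annotation, as it might be extracted from the NDF.
# 'GENE' is a hypothetical column standing in for gene associations,
# transcript IDs, or other information linked to SEQ_ID.
ndf_annot <- data.frame(
  SEQ_ID = c("PS_A", "PS_A", "PS_B", "PS_B"),
  GENE   = c("gene1", "gene1", "gene2", "gene2"),
  stringsAsFactors = FALSE
)

# The NDF has one row per probe, so keep one row per probeset first.
annot <- unique(ndf_annot)

# Attach the annotation to the expression values by probeset ID.
merged <- merge(annot,
                data.frame(SEQ_ID = rownames(expr), expr,
                           stringsAsFactors = FALSE),
                by = "SEQ_ID")
```

With real data, expr would come from exprs() on the rma() output. If a probeset maps to several genes or transcripts, unique() keeps one row per distinct pairing, and the merge then repeats that probeset's values for each one.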

About the phenoData slot: check the Biobase documentation (Section 4.2)
http://www.bioconductor.org/packages/2.8/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf

best,
b

ps: your email messages are not displayed as they should be, and I
believe this is because your email client is set to send them in HTML
format. It would be nice if you could set it to send plain-text
messages, so they are not garbled.

library(Biobase)  # provides AnnotatedDataFrame; pdata is your sample information table
phenoData <- new("AnnotatedDataFrame", data = data.frame(pdata))

On 22 July 2011 11:28, 陈娟 <gtzxchj at hotmail.com> wrote:
>
> Dear Professor,
>
> It seems that I have found the root of my problem. When I loaded the raw data (.xys files) into R and preprocessed them using rma(), the data from the B73 genome (with names like GRMZM2G130813_T01) seemed to be lost, and I can't extract them via grep(). There are 92492 features in the database (GEO), whereas there are only 69557 features in my ExpressionSet. I guess the data I am interested in are among the missing ones. Maybe the method I used accounts for this, but I don't know the solution. Could you tell me how to solve it?
>
> Looking forward to your reply! Thank you very much!
>
> Best regards,
> Maggie
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



-- 
Successful people ask better questions, and as a result, they get
better answers. (Tony Robbins)


