[BioC] annotating microarray data with mogene10stv1

Jakub Stanislaw Nowak jakub.nowak at ed.ac.uk
Tue Jul 22 23:10:19 CEST 2014


Hi Xiayu and Jim

Now it is working nicely.
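
For anyone who finds this thread in the archive, here is a rough sketch of the sequence that Xiayu's and Jim's suggestions add up to. It is only an illustration under a few assumptions: that getMainProbes() is the one from the affycoretools package, that oligo, affycoretools and mogene10sttranscriptcluster.db are installed, and that the CEL files sit in the working directory. The wrapper name annotate_main_probes is made up for this example and is not part of any package.

```r
# Hypothetical helper (the name is an illustration, not part of any package).
# Takes the ExpressionSet returned by oligo::rma() and returns an annotation
# table for the main (non-control) probesets only.
annotate_main_probes <- function(eset) {
  # Assumption: getMainProbes() is the one exported by affycoretools.
  main <- affycoretools::getMainProbes(eset)

  # select() needs a character vector of keys, not an ExpressionSet,
  # so pull the probeset IDs out with featureNames().
  ids <- Biobase::featureNames(main)

  AnnotationDbi::select(
    mogene10sttranscriptcluster.db::mogene10sttranscriptcluster.db,
    keys = ids,
    columns = c("SYMBOL", "GENENAME", "ENTREZID")
  )
}

# Only run the pipeline when CEL files are actually present.
cel_files <- list.files(pattern = "\\.CEL$", ignore.case = TRUE)
if (length(cel_files) > 0) {
  raw   <- oligo::read.celfiles(cel_files)
  eset  <- oligo::rma(raw)  # summarized at the 'core' level by default
  annot <- annotate_main_probes(eset)
  print(head(annot))
}
```

The result should be a data.frame keyed by probeset ID. A probeset that maps to more than one gene can appear on several rows, so the table may be longer than the number of main probes.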

Many thanks guys,

Jakub

On 22 Jul 2014, at 21:04, Rao,Xiayu <XRao at mdanderson.org> wrote:

> Hi, Jakub
>  
> When you do ID <- getMainProbes(eset), ID is an ExpressionSet rather than a character vector. To extract the probeset IDs as a character vector, use featureNames(ID):
>  
> select(mogene10sttranscriptcluster.db, featureNames(ID), c("SYMBOL","GENENAME","ENTREZID"))
> 
> Best,
> Xiayu
>  
>  
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Jakub Stanislaw Nowak
> Sent: Tuesday, July 22, 2014 2:42 PM
> To: James W. MacDonald
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] annotating microarray data with mogene10stv1
>  
> Hi Jim,
> 
> Thanks for your suggestion. Somehow I overlooked the select() function. Now I think I am getting closer.
> I have a problem applying select() to my probes. I think it may be because ID, which holds my probes, is an ExpressionSet.
> 
> So first, as explained before, I generated ID, containing the main probes from my dataset:
> 
> > > ID <- getMainProbes(eset)
> > > ID
> > ExpressionSet (storageMode: lockedEnvironment)
> > assayData: 28858 features, 6 samples 
> >   element names: exprs 
> > protocolData
> >   rowNames: mock1 mock2 ... siLin28a2 (6 total)
> >   varLabels: exprs dates
> >   varMetadata: labelDescription channel
> > phenoData
> >   rowNames: mock1 mock2 ... siLin28a2 (6 total)
> >   varLabels: index
> >   varMetadata: labelDescription channel
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: pd.mogene.1.0.st.v1 
> 
> Then I tried to annotate using select(), and I got this error:
> 
> > > tmp <- select(mogene10sttranscriptcluster.db, ID, c("SYMBOL","GENENAME","ENTREZID"))
> > Error in .testForValidKeys(x, keys, keytype) : 
> >   'keys' must be a character vector
> 
> 
> However, if I use an ID generated with featureNames(), select() works, but I think this approach does not remove the control probes that you described before.
> 
> Is there a way to convert a value of type ExpressionSet to a character vector? Alternatively, what should I do to make it work?
> 
> Many thanks,
> 
> Jakub
> 
> On 22 Jul 2014, at 17:21, James W. MacDonald <jmacdon at uw.edu> wrote:
> 
> > Hi Jakub,
> > 
> > Please don't take questions off-list (use 'Reply-all' when responding).
> > 
> > On 7/22/2014 12:06 PM, Jakub Stanislaw Nowak wrote:
> >> Hi Jim,
> >> 
> >> I have a couple of follow-up questions, as I got stuck trying to use the getMainProbes function.
> >> As I am still a beginner with R, my questions might sound quite naive.
> >> 
> >> 1. The first question is about loading data using the oligo package. Which approach would you use, or do they both give the same output?
> >> 
> >>>> celFiles<-list.celfiles()
> >>>> mydata <- read.celfiles(celFiles)
> >>> Platform design info loaded.
> >>> Reading in : GSM910962.CEL
> >>> Reading in : GSM910963.CEL
> >>> Reading in : GSM910964.CEL
> >>> Reading in : GSM910965.CEL
> >>> Reading in : GSM910966.CEL
> >>> Reading in : GSM910967.CEL
> >> 
> >> or
> >> 
> >>>> adf<-read.AnnotatedDataFrame("target.txt",row.names=1, header=TRUE, as.is=TRUE)
> >>>> mydata2 <- read.celfiles(filenames=pData(adf)$FileName,phenoData=adf)
> >>> Platform design info loaded.
> >>> Reading in : GSM910962.CEL
> >>> Reading in : GSM910963.CEL
> >>> Reading in : GSM910964.CEL
> >>> Reading in : GSM910965.CEL
> >>> Reading in : GSM910966.CEL
> >>> Reading in : GSM910967.CEL
> >>> Warning message:
> >>> In read.celfiles(filenames = pData(adf)$FileName, phenoData = adf) :
> >>>   'channel' automatically added to varMetadata in phenoData.
> > 
> > There should be no difference between the two, other than the obvious difference in the phenoData slot.
> > 
> >> 
> >> 2. How would I use the function getMainProbes?
> >> 
> >> I tried this and ended up getting an error:
> >> 
> >>>> eset <- rma(mydata)
> >>> Background correcting
> >>> Normalizing
> >>> Calculating Expression
> >> 
> >>>> ID <- getMainProbes(eset)
> >>>> ID
> >>> ExpressionSet (storageMode: lockedEnvironment)
> >>> assayData: 28858 features, 6 samples
> >>>   element names: exprs
> >>> protocolData
> >>>   rowNames: mock1 mock2 ... siLin28a2 (6 total)
> >>>   varLabels: exprs dates
> >>>   varMetadata: labelDescription channel
> >>> phenoData
> >>>   rowNames: mock1 mock2 ... siLin28a2 (6 total)
> >>>   varLabels: index
> >>>   varMetadata: labelDescription channel
> >>> featureData: none
> >>> experimentData: use 'experimentData(object)'
> >>> Annotation: pd.mogene.1.0.st.v1
> > 
> > You didn't get an error. You were returned an ExpressionSet containing only the 28,858 main probes (you started with 35K or so, IIRC).
> > 
> >> 
> >>>> symbol <- getSYMBOL(ID, "pd.mogene.1.0.st.v1")
> >>> Error in unlist(lookUp(x, data, "SYMBOL")) :
> >>>   error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in mget(x, envir = getAnnMap(what, chip = data, load = load), ifnotfound = NA) :
> >>>   error in evaluating the argument 'envir' in selecting a method for function 'mget': Error in (function (classes, fdef, mtable)  :
> >>>   unable to find an inherited method for function ‘columns’ for signature ‘"AffyGenePDInfo"’
> >> 
> >> I think getMainProbes and featureNames return different kinds of output, so maybe my reasoning is wrong when I try to obtain symbols.
> >> Also, which annotation package would you use: pd.mogene.1.0.st.v1 or mogene10sttranscriptcluster.db?
> > 
> > I gave you a suggestion previously that you shouldn't be using getSYMBOL(), or lookUp() or any of the old-style annotation functions. That suggestion still holds! Use select() instead!
> > 
> > Also, pd.mogene.1.0.st.v1 isn't an annotation package. It is similar in spirit to the cdf packages that you use with the affy package, and is used to map probes to probesets, among other things.
> > 
> > The annotation package for this array, when summarized at the 'core' level (which is the default for oligo::rma()), is the mogene10sttranscriptcluster.db package. Refer to my previous email to see how to use this package to annotate your data.
> > 
> > Best,
> > 
> > Jim
> > 
> > 
> >> 
> >> I will be grateful if you can give me some suggestions.
> >> 
> >> Thanks,
> >> 
> >> Jakub
> >> 
> >> 
> >> 
> > 
> > -- 
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> 
