[BioC] annotating microarray data with mogene10stv1

Jakub Stanislaw Nowak jakub.nowak at ed.ac.uk
Tue Jul 22 21:41:43 CEST 2014


Hi Jim,

Thanks for your suggestion. Somehow I overlooked the select() function. Now I think I am getting closer.
I have a problem applying select() to my probes. I think it may be because ID, which I pass as the keys, is an ExpressionSet rather than a character vector.

First, as explained before, I generated ID, which contains the main probes from my dataset:

> > ID <- getMainProbes(eset)
> > ID
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 28858 features, 6 samples 
>   element names: exprs 
> protocolData
>   rowNames: mock1 mock2 ... siLin28a2 (6 total)
>   varLabels: exprs dates
>   varMetadata: labelDescription channel
> phenoData
>   rowNames: mock1 mock2 ... siLin28a2 (6 total)
>   varLabels: index
>   varMetadata: labelDescription channel
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation: pd.mogene.1.0.st.v1 

Then I tried to annotate using select(), and I get this error:

> > tmp <- select(mogene10sttranscriptcluster.db, ID, c("SYMBOL","GENENAME","ENTREZID"))
> Error in .testForValidKeys(x, keys, keytype) : 
>   'keys' must be a character vector


However, if I use an ID generated with featureNames(), select() works, but I think that with this approach I am not removing the control probes you described before.

Is there a way to convert an object of type ExpressionSet to a character vector? Or, alternatively, what should I do to make it work?
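
One way this might work, as a minimal sketch: featureNames() returns the feature IDs of an ExpressionSet as a character vector, so calling it on the output of getMainProbes() should give keys that select() accepts while still leaving out the control probes. This assumes eset is the rma() output shown below, that getMainProbes() comes from the affycoretools package, and that the feature IDs are valid PROBEID keys for this annotation package. The names main and probes are only illustrative.

```r
library(affycoretools)                    # assumed source of getMainProbes()
library(mogene10sttranscriptcluster.db)   # attaches AnnotationDbi, which provides select()

# Assumes 'eset' is the ExpressionSet returned by rma(mydata).
main   <- getMainProbes(eset)   # ExpressionSet restricted to the main probes
probes <- featureNames(main)    # character vector of feature (transcript cluster) IDs

# keytype is given explicitly; PROBEID is assumed to match the feature IDs.
tmp <- select(mogene10sttranscriptcluster.db,
              keys    = probes,
              columns = c("SYMBOL", "GENENAME", "ENTREZID"),
              keytype = "PROBEID")
head(tmp)
```

Note that select() may return more than one row for a probe that maps to several genes, so tmp will not necessarily have one row per feature.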

Many thanks,

Jakub

On 22 Jul 2014, at 17:21, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Jakub,
> 
> Please don't take questions off-list (use 'Reply-all' when responding).
> 
> On 7/22/2014 12:06 PM, Jakub Stanislaw Nowak wrote:
>> Hi Jim,
>> 
>> I have a couple of follow-up questions, as I got stuck trying to use the getMainProbes function.
>> As I am still a beginner with R, my questions might sound quite naive.
>> 
>> 1. The first question is about loading data using the oligo package. Which approach would you use, or do they both give the same output?
>> 
>>>> celFiles<-list.celfiles()
>>>> mydata <- read.celfiles(celFiles)
>>> Platform design info loaded.
>>> Reading in : GSM910962.CEL
>>> Reading in : GSM910963.CEL
>>> Reading in : GSM910964.CEL
>>> Reading in : GSM910965.CEL
>>> Reading in : GSM910966.CEL
>>> Reading in : GSM910967.CEL
>> 
>> or
>> 
>>>> adf<-read.AnnotatedDataFrame("target.txt",row.names=1, header=TRUE, as.is=TRUE)
>>>> mydata2 <- read.celfiles(filenames=pData(adf)$FileName,phenoData=adf)
>>> Platform design info loaded.
>>> Reading in : GSM910962.CEL
>>> Reading in : GSM910963.CEL
>>> Reading in : GSM910964.CEL
>>> Reading in : GSM910965.CEL
>>> Reading in : GSM910966.CEL
>>> Reading in : GSM910967.CEL
>>> Warning message:
>>> In read.celfiles(filenames = pData(adf)$FileName, phenoData = adf) :
>>>   'channel' automatically added to varMetadata in phenoData.
> 
> There should be no difference between the two, other than the obvious difference in the phenoData slot.
> 
>> 
>> 2. How would I use the getMainProbes function?
>> 
>> I tried this and ended up getting an error:
>> 
>>>> eset <- rma(mydata)
>>> Background correcting
>>> Normalizing
>>> Calculating Expression
>> 
>>>> ID <- getMainProbes(eset)
>>>> ID
>>> ExpressionSet (storageMode: lockedEnvironment)
>>> assayData: 28858 features, 6 samples
>>>   element names: exprs
>>> protocolData
>>>   rowNames: mock1 mock2 ... siLin28a2 (6 total)
>>>   varLabels: exprs dates
>>>   varMetadata: labelDescription channel
>>> phenoData
>>>   rowNames: mock1 mock2 ... siLin28a2 (6 total)
>>>   varLabels: index
>>>   varMetadata: labelDescription channel
>>> featureData: none
>>> experimentData: use 'experimentData(object)'
>>> Annotation: pd.mogene.1.0.st.v1
> 
> You didn't get an error. You were returned an ExpressionSet containing only the 28,858 main probes (you started with 35K or so, IIRC).
> 
>> 
>>>> symbol <- getSYMBOL(ID, "pd.mogene.1.0.st.v1")
>>> Error in unlist(lookUp(x, data, "SYMBOL")) :
>>>   error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in mget(x, envir = getAnnMap(what, chip = data, load = load), ifnotfound = NA) :
>>>   error in evaluating the argument 'envir' in selecting a method for function 'mget': Error in (function (classes, fdef, mtable)  :
>>>   unable to find an inherited method for function ‘columns’ for signature ‘"AffyGenePDInfo"’
>> 
>> I think getMainProbes and featureNames return different types of output, so maybe my reasoning is wrong when I try to obtain symbols.
>> Also, which annotation would you use: pd.mogene.1.0.st.v1 or mogene10sttranscriptcluster.db?
> 
> I gave you a suggestion previously that you shouldn't be using getSYMBOL(), or lookUp() or any of the old-style annotation functions. That suggestion still holds! Use select() instead!
> 
> Also, pd.mogene.1.0.st.v1 isn't an annotation package. It is similar in spirit to the cdf packages that you use with the affy package, and is used to map probes to probesets, among other things.
> 
> The annotation package for this array, when summarized at the 'core' level (which is the default for oligo::rma()) is the mogene10sttranscriptcluster.db package. Refer to my previous email to see how to use this package to annotate your data.
> 
> Best,
> 
> Jim
> 
> 
>> 
>> I will be grateful if you can give me some suggestions.
>> 
>> Thanks,
>> 
>> Jakub
>> 
>> 
>> 
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
