[BioC] annotating microarray data with mogene10stv1

Tue Jul 22 18:21:46 CEST 2014

Hi Jakub,

Please don't take questions off-list (use 'Reply-all' when responding).

On 7/22/2014 12:06 PM, Jakub Stanislaw Nowak wrote:
> Hi Jim,
>
> I have a couple of follow-up questions, as I got stuck trying to use the getMainProbes function.
> As I am still a beginner with R, my questions might sound quite naive.
>
> 1. The first question is about loading data with the oligo package. Which approach would you use, or do they both give the same output?
>
>>> celFiles<-list.celfiles()
>>> mydata <- read.celfiles(celFiles)
>> Platform design info loaded.
>> Reading in : GSM910962.CEL
>> Reading in : GSM910963.CEL
>> Reading in : GSM910964.CEL
>> Reading in : GSM910965.CEL
>> Reading in : GSM910966.CEL
>> Reading in : GSM910967.CEL
>
> or
>
>>> adf<-read.AnnotatedDataFrame("target.txt",row.names=1, header=TRUE, as.is=TRUE)
>>> mydata2 <- read.celfiles(filenames=pData(adf)$FileName,phenoData=adf)
>> Platform design info loaded.
>> Reading in : GSM910962.CEL
>> Reading in : GSM910963.CEL
>> Reading in : GSM910964.CEL
>> Reading in : GSM910965.CEL
>> Reading in : GSM910966.CEL
>> Reading in : GSM910967.CEL
>> Warning message:
>> In read.celfiles(filenames = pData(adf)$FileName, phenoData = adf) :
>>    'channel' automatically added to varMetadata in phenoData.

There should be no difference between the two, other than the obvious 
difference in the phenoData slot.
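
If you want to check this yourself, here is a minimal sketch. It assumes the
mydata and mydata2 objects from your code above are still in your session,
and that the CEL files were read in the same order both times (as your
output suggests):

```r
library(oligo)

# The probe-level intensities should be the same for both objects.
# Sample names may differ, so dimnames are ignored in the comparison.
all.equal(exprs(mydata), exprs(mydata2), check.attributes = FALSE)

# The sample annotation is where the two objects differ:
# mydata has only a default index column, mydata2 carries target.txt.
pData(mydata)
pData(mydata2)
```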

>
> 2. How would you use the getMainProbes function?
>
> I tried this and ended up getting an error.
>
>>> eset <- rma(mydata)
>> Background correcting
>> Normalizing
>> Calculating Expression
>
>>> ID <- getMainProbes(eset)
>>> ID
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 28858 features, 6 samples
>>    element names: exprs
>> protocolData
>>    rowNames: mock1 mock2 ... siLin28a2 (6 total)
>>    varLabels: exprs dates
>>    varMetadata: labelDescription channel
>> phenoData
>>    rowNames: mock1 mock2 ... siLin28a2 (6 total)
>>    varLabels: index
>>    varMetadata: labelDescription channel
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: pd.mogene.1.0.st.v1

You didn't get an error. You were returned an ExpressionSet containing 
only the 28,858 main probes (you started with 35K or so, IIRC).
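
A quick way to see the filtering, sketched under the assumption that mydata
is the object you read in above and that getMainProbes() comes from the
affycoretools package:

```r
library(oligo)
library(affycoretools)

eset <- rma(mydata)            # summarized at the 'core' level by default
dim(eset)                      # all probesets, including controls

main <- getMainProbes(eset)    # keep only the main probesets
dim(main)                      # fewer features, same number of samples
```

Comparing the two dim() results shows how many control and background
probesets were dropped.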

>
>>> symbol <- getSYMBOL(ID, "pd.mogene.1.0.st.v1")
>> Error in unlist(lookUp(x, data, "SYMBOL")) :
>>    error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in mget(x, envir = getAnnMap(what, chip = data, load = load), ifnotfound = NA) :
>>    error in evaluating the argument 'envir' in selecting a method for function 'mget': Error in (function (classes, fdef, mtable)  :
>>    unable to find an inherited method for function ‘columns’ for signature ‘"AffyGenePDInfo"’
>
> I think getMainProbes and featureNames return output in different formats, so maybe my reasoning is wrong when I try to obtain symbols.
> Also, which annotation package would you use: pd.mogene.1.0.st.v1 or mogene10sttranscriptcluster.db?

As I suggested previously, you shouldn't be using getSYMBOL(), lookUp(),
or any of the other old-style annotation functions. That suggestion
still holds! Use select() instead!

Also, pd.mogene.1.0.st.v1 isn't an annotation package. It is similar in 
spirit to the cdf packages that you use with the affy package, and is 
used to map probes to probesets, among other things.

The annotation package for this array, when summarized at the 'core' 
level (which is the default for oligo::rma()) is the 
mogene10sttranscriptcluster.db package. Refer to my previous email to 
see how to use this package to annotate your data.
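
For convenience, here is a minimal sketch of that approach. It is not
necessarily identical to what I sent before. It assumes the CEL files are in
the working directory, and the columns requested (SYMBOL, GENENAME) are just
an illustrative choice:

```r
library(oligo)
library(affycoretools)
library(mogene10sttranscriptcluster.db)

# Read and summarize the arrays, then keep only the main probesets
mydata <- read.celfiles(list.celfiles())
eset <- getMainProbes(rma(mydata))

# Map transcript cluster IDs to gene symbols and names
annot <- AnnotationDbi::select(mogene10sttranscriptcluster.db,
                               keys = featureNames(eset),
                               columns = c("SYMBOL", "GENENAME"),
                               keytype = "PROBEID")

# select() can return several rows per ID (one-to-many mappings).
# Keeping the first match is a simplification, not the only option.
annot <- annot[!duplicated(annot$PROBEID), ]
annot <- annot[match(featureNames(eset), annot$PROBEID), ]
rownames(annot) <- annot$PROBEID

# Store the annotation alongside the expression values
fData(eset) <- annot
head(fData(eset))
```

IDs with no annotated gene will show NA in the SYMBOL and GENENAME columns.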

Best,

Jim

>
> I will be grateful if you can give me some suggestions.
>
> Thanks,
>
> Jakub
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099