[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays

Thu Mar 7 16:06:50 CET 2013

Hi Kamila,

On 3/7/2013 9:54 AM, Naxerova, Kamila wrote:
> Dear all,
>
> I am afraid I have to ask for help with the Mouse Gene 2.0 ST annotation package one more time. It looked like I created it successfully, but when I try to use it to read in cel files with the oligo package, I get a cryptic error message. Any suggestions would be much appreciated!

You don't use the annotation package at this step. The pkgname argument
of read.celfiles() expects a platform design (pd.*) package, not a
ChipDb annotation package, which is why you get that error. There are
two packages that are used for the analysis of this chip type. The first
is the pd.mogene.2.0.st.v1 package, which is used by oligo to map probes
to probesets when doing the normalization/summarization step. This
package will be automagically installed if you don't have it, so there
is nothing to be done at the first step but

library(oligo)
abatch <- read.celfiles(list.celfiles())
eset <- rma(abatch)

This will give you the summarized and normalized data at the transcript 
level. You then will normally fit some model(s) using the modeling 
package of your choice, and then might want to output a set of 
significant genes, at which time you will use the 
mogene20sttranscriptcluster.db package to map probeset IDs to gene 
information.
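
As a rough sketch of that last step (you don't need any of this to read
in the CEL files): assuming eset is the object returned by rma() and
that your mogene20sttranscriptcluster.db package is installed, something
along these lines should attach gene information to the probeset IDs.
Note that first_match() is a made-up helper, not part of any package,
and the columns requested are only an example.

```r
# Hypothetical helper (not from any package): keep one annotation row
# per probeset ID, taking the first match when there are several.
first_match <- function(annot, id_col = "PROBEID") {
  annot[!duplicated(annot[[id_col]]), , drop = FALSE]
}

# This part runs only if the needed packages are installed and an
# object called `eset` (assumed to come from rma()) exists.
if (requireNamespace("mogene20sttranscriptcluster.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("Biobase", quietly = TRUE) &&
    exists("eset")) {
  db <- mogene20sttranscriptcluster.db::mogene20sttranscriptcluster.db
  annot <- AnnotationDbi::select(db,
                                 keys = Biobase::featureNames(eset),
                                 columns = c("SYMBOL", "ENTREZID", "GENENAME"),
                                 keytype = "PROBEID")
  # one row per probeset, so the table can be matched to the results
  annot <- first_match(annot)
}
```

select() can return more than one row per probeset ID, which is why the
sketch de-duplicates. Keeping only the first match is a simplification;
you may prefer to collapse or inspect the multiple mappings instead.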

Best,

Jim

>
>> abatch<- read.celfiles(list.celfiles(),pkgname="mogene20sttranscriptcluster.db")
> Platform design info loaded.
> Reading in : xxx.CEL
> Reading in : xxx.CEL
> Reading in : xxx.CEL
> [... more cel files listed]
>
> Error in function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘kind’ for signature ‘"ChipDb"’
>
> Thanks
> Kamila
>
> On Mar 6, 2013, at 6:16 PM, "Naxerova, Kamila"<naxerova at fas.harvard.edu>  wrote:
>
>> Dear Christian and Jim,
>>
>> many thanks to both of you for your explanations.
>>
>> Your hard work paid off, and I have finally understood everything and managed to build my annotation package!!!! I wrote a little script similar to what Jim was suggesting, namely picking the first RefSeq-like thing I came across. Jim called it "naive" -- but I think there is no downside to this approach, right? I have looked at various examples in the Affy file for a long time, and simply picking the first RefSeq ID seems to be kosher.
>>
>> data <- read.csv("MoGene-transcript-noheader.csv", header=TRUE, stringsAsFactors=FALSE, sep=",")
>> sdata <- data[, c(1, 9)]
>>
>> returnRef <- function(x) {
>> 	# split the "///"-separated entries once and take the first one mentioning RefSeq
>> 	parts <- strsplit(x, split="///")[[1]]
>> 	refst <- parts[grep("RefSeq", parts)[1]]
>> 	# the ID is the first "//"-separated field; strip the spaces around it
>> 	refid <- gsub(" ", "", strsplit(refst, split="//")[[1]][1])
>> 	return(refid)
>> }
>>
>> sdata$refseqids <- sapply(sdata[,2], returnRef)
>> fdata <- sdata[,-2]
>> write.table(fdata, "AnnotBuild.txt", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)
>>
>> library(AnnotationForge)
>> library(mouse.db0)
>> library(org.Mm.eg.db)
>> makeDBPackage("MOUSECHIP_DB",
>> affy=FALSE,
>> prefix="mogene20sttranscriptcluster",
>> fileName="AnnotBuild.txt",
>> outputDir = ".",
>> version="2.11.1",
>> baseMapType="refseq",
>> manufacturer = "Affymetrix",
>> chipName = "Mouse Gene 2.0 ST Array",
>> manufacturerUrl = "http://www.affymetrix.com",
>> author = "Kamila Naxerova",
>> maintainer = "Kamila Naxerova<naxerova at fas.harvard.edu>")
>>
>>> install.packages("mogene20sttranscriptcluster.db",repos=NULL, type="source")
>> * installing *source* package ‘mogene20sttranscriptcluster.db’ ...
>> ** R
>> ** inst
>> ** preparing package for lazy loading
>> ** help
>> *** installing help indices
>> ** building package indices
>> ** testing if installed package can be loaded
>> *** arch - i386
>> *** arch - x86_64
>>
>> * DONE (mogene20sttranscriptcluster.db)
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099