[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays

Thu Mar 7 16:36:28 CET 2013

Thanks Jim. Of course the annotation package does not contain probe --> probe set information. What was I thinking?!??

What I had not realized was that I needed to build the pd.mogene.2.0.st package myself first, because it also does not exist on Bioconductor. So I downloaded all the required files from Affymetrix, but I am stuck again with an error message I don't understand. What is the coreMps file that triggers the error?

> library(pdInfoBuilder)
> baseDir <- "/Users/naxerova/Documents/xxx"
> (pgf <- list.files(baseDir, pattern = ".pgf",
+ full.names = TRUE))
[1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st.pgf"
> (clf <- list.files(baseDir, pattern = ".clf",
+ full.names = TRUE))
[1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st.clf"
> (prob <- list.files(baseDir, pattern = ".probeset.csv",
+ full.names = TRUE))
[1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st-v1.na33.mm10.probeset.csv"
> seed <- new("AffyGenePDInfoPkgSeed",
+ pgfFile = pgf, clfFile = clf,
+ probeFile = prob, author = "Kamila Naxerova",
+ email = "naxerova at fas.harvard.edu",
+ biocViews = "AnnotationData",
+ organism = "Mouse", species = "Mus Musculus")
> makePdInfoPackage(seed, destDir = ".")
===============================================================================================================================================
Building annotation package for Affymetrix Gene ST Array
PGF.........: MoGene-2_0-st.pgf
CLF.........: MoGene-2_0-st.clf
Probeset....: MoGene-2_0-st-v1.na33.mm10.probeset.csv
Transcript..: TheTranscriptFile
Core MPS....: coreMps
===============================================================================================================================================
Parsing file: MoGene-2_0-st.pgf... OK
Parsing file: MoGene-2_0-st.clf... OK
Creating initial table for probes... OK
Creating dictionaries... OK
Parsing file: MoGene-2_0-st-v1.na33.mm10.probeset.csv... OK
Parsing file: coreMps... Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'coreMps': No such file or directory
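
One possible reading of the error: the banner above lists `TheTranscriptFile` and `coreMps` as the transcript and core MPS inputs. These look like placeholder defaults rather than real paths, so the build probably stops because no transcript CSV and no `.mps` file were handed to the seed. The following is a minimal sketch, not a confirmed fix. It assumes that the seed class accepts `transcriptFile` and `coreMps` arguments (inferred from the banner labels, not stated in this thread) and that the Affymetrix download contains files ending in `.transcript.csv` and `.mps` (hypothetical patterns). `findOne` is a helper name made up for this illustration.

```r
# Hypothetical helper: return the single file in `dir` whose name matches
# the regular expression `pattern`; stop with a readable message otherwise.
findOne <- function(dir, pattern) {
  hits <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(hits) != 1L) {
    stop(sprintf("expected exactly one file matching '%s' in %s, found %d",
                 pattern, dir, length(hits)))
  }
  hits
}

# Assumed usage, commented out because it needs pdInfoBuilder and the
# downloaded library files. The transcriptFile and coreMps argument names
# are assumptions based on the banner printed by makePdInfoPackage().
# library(pdInfoBuilder)
# baseDir <- "/Users/naxerova/Documents/xxx"
# seed <- new("AffyGenePDInfoPkgSeed",
#             pgfFile        = findOne(baseDir, "\\.pgf$"),
#             clfFile        = findOne(baseDir, "\\.clf$"),
#             probeFile      = findOne(baseDir, "\\.probeset\\.csv$"),
#             transcriptFile = findOne(baseDir, "\\.transcript\\.csv$"),
#             coreMps        = findOne(baseDir, "\\.mps$"))
# makePdInfoPackage(seed, destDir = ".")
```

Failing early in `findOne` gives a clearer message than the `cannot open the connection` error, because it names the pattern that matched nothing before any parsing starts.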

On Mar 7, 2013, at 10:06 AM, "James W. MacDonald" <jmacdon at uw.edu> wrote:

> Hi Kamila,
> 
> On 3/7/2013 9:54 AM, Naxerova, Kamila wrote:
>> Dear all,
>> 
>> I am afraid I have to ask for help with the Mouse Gene 2.0 ST annotation package one more time. It looked like I created it successfully, but when I try to use it to read in CEL files with the oligo package, I get a cryptic error message. Any suggestions would be much appreciated!
> 
> You don't use the annotation package at this step. There are two 
> packages that are used for the analysis of this chip type. The first is 
> the pd.mogene.2.0.st.v1 package, which is used by oligo to map probes to 
> probesets when doing the normalization/summarization step. This package 
> will be automagically installed if you don't have it, so there is 
> nothing to be done at the first step but
> 
> abatch <- read.celfiles(list.celfiles())
> eset <- rma(abatch)
> 
> This will give you the summarized and normalized data at the transcript 
> level. You then will normally fit some model(s) using the modeling 
> package of your choice, and then might want to output a set of 
> significant genes, at which time you will use the 
> mogene20sttranscriptcluster.db package to map probeset IDs to gene 
> information.
> 
> Best,
> 
> Jim
> 
> 
>> 
>>> abatch<- read.celfiles(list.celfiles(),pkgname="mogene20sttranscriptcluster.db")
>> Platform design info loaded.
>> Reading in : xxx.CEL
>> Reading in : xxx.CEL
>> Reading in : xxx.CEL
>> [... more cel files listed]
>> 
>> Error in function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function ‘kind’ for signature ‘"ChipDb"’
>> 
>> Thanks
>> Kamila
>> 
>> On Mar 6, 2013, at 6:16 PM, "Naxerova, Kamila"<naxerova at fas.harvard.edu>  wrote:
>> 
>>> Dear Christian and Jim,
>>> 
>>> many thanks to both of you for your explanations.
>>> 
>>> Your hard work paid off, and I have finally understood everything and managed to build my annotation package!!!! I wrote a little script similar to what Jim was suggesting, namely picking the first RefSeq-like thing I came across. Jim called it "naive" -- but I think there is no downside to this approach, right? I have looked at various examples in the Affy file for a long time, and simply picking the first RefSeq ID seems to be kosher.
>>> 
>>> data<-read.csv("MoGene-transcript-noheader.csv",header=T,stringsAsFactors=F,sep=",")
>>> sdata<- data[,c(1,9)]
>>> 
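>>> returnRef <- function(x) {
>>> 	# Split the multi-entry annotation field into individual assignments
>>> 	entries <- strsplit(x, split = "///", fixed = TRUE)[[1]]
>>> 	# Keep the first entry that mentions RefSeq (NA if there is none)
>>> 	refst <- entries[grep("RefSeq", entries)[1]]
>>> 	# The accession is the first "//"-delimited field; strip spaces
>>> 	refid <- gsub(" ", "", strsplit(refst, split = "//", fixed = TRUE)[[1]][1])
>>> 	return(refid)
>>> }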
>>> 
>>> sdata$refseqids<- sapply(sdata[,2],returnRef)
>>> fdata<- sdata[,-2]
>>> write.table(fdata,"AnnotBuild.txt", sep="\t",quote=F,row.names=F,col.names=F)
>>> 
>>> library(AnnotationForge)
>>> library(mouse.db0)
>>> library(org.Mm.eg.db)
>>> makeDBPackage("MOUSECHIP_DB",
>>> affy=F,
>>> prefix="mogene20sttranscriptcluster",
>>> fileName="AnnotBuild.txt",
>>> outputDir = ".",
>>> version="2.11.1",
>>> baseMapType="refseq",
>>> manufacturer = "Affymetrix",
>>> chipName = "Mouse Gene 2.0 ST Array",
>>> manufacturerUrl = "http://www.affymetrix.com",
>>> author = "Kamila Naxerova",
>>> maintainer = "Kamila Naxerova<naxerova at fas.harvard.edu>")
>>> 
>>>> install.packages("mogene20sttranscriptcluster.db",repos=NULL, type="source")
>>> * installing *source* package ‘mogene20sttranscriptcluster.db’ ...
>>> ** R
>>> ** inst
>>> ** preparing package for lazy loading
>>> ** help
>>> *** installing help indices
>>> ** building package indices
>>> ** testing if installed package can be loaded
>>> *** arch - i386
>>> *** arch - x86_64
>>> 
>>> * DONE (mogene20sttranscriptcluster.db)
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>