[BioC] vsn preprocess with oligo package

Fri Jan 11 18:46:24 CET 2013

Hi Jose,

On 1/11/2013 12:05 PM, José LÓPEZ wrote:
> Hi,
>
> I have one more question. The GeneFeatureSet has 1102500 features 
> while the vsn2 is done on 899636 features no matter whether target in 
> pm is "core" (default method) or "probeset". Why vsn2 does not use all 
> the features? Is that correct/normal? After sumarization, as expected, 
> I have 35556 features. It is possible that the normalization on only 
> part of the features can introduce some kind of bias in the 
> summarization process (899636 in stead of 1102500)?

I think you misunderstand a basic tenet of Affymetrix arrays. Not all 
features on an Affy array are used to measure expression of transcript. 
There are for instance thousands of features that go all the way around 
the outside of the array (the oligo-dT features) that are there only to 
help the scanner align itself to the chip. There are also two big blocks 
of features in the middle that are not measuring transcript either.

You can see this if you do something like

tmp <- log2(as.numeric(exprs(raw([,1]))))
geom <- geometry(getPD(raw))
## convert back to a matrix
tmp <- matrix(tmp, ncol = geom[1], nrow = geom[2])
## reorder because image() is weird
tmp <- as.matrix(rev(as.data.frame(tmp)))
image(tmp[1:100,1:100])

This shows the (still transposed) top left corner of the chip. The 
checkerboard in the corner, and all the features along the top and side 
are oligo-dT probes used to align. The chip name is made up of oligo-dT 
features (and blanks) as well and there primarily I suppose to look cool.

If you just do image(tmp), you will see the big blocks in the middle of 
the array.

Does that help?

Best,

Jim

>
> I think the alternative to this option is i.e. make the CDF file and 
> environment (to avoid the unofficial CDF) and make vsnrma in affy but, 
> since oligo was designed ad hoc (to analyze Gene and Exon affymetrix 
> arrays), I though it makes sense to try to find your help to combine 
> vsn and oligo.
>
> Thank you for your answer,
>
>  Jose
>
> El ene 11, 2013, a las 5:42 p.m., José LÓPEZ escribió:
>
>> Yes, the exprs() did the job and it is allowing me to combine vsn 
>> with oligo and Gene ST arrays.
>>
>> Thank you again for your kind help,
>>
>> Best,
>>
>>  Jose
>>
>>
>> > library(limma)
>> > library(oligo)
>> > Data=read.celfiles(list.celfiles())
>> Loading required package: pd.mogene.1.0.st.v1
>> Loading required package: RSQLite
>> Loading required package: DBI
>> Platform design info loaded.
>> Reading in : ABRNA1.CEL
>> Reading in : ABRNA2.CEL
>> Reading in : ABRNA3.CEL
>> Reading in : ABRNA4.CEL
>> Reading in : ABRNA5.CEL
>> Reading in : ABRNA6.CEL
>> > Data
>> GeneFeatureSet (storageMode: lockedEnvironment)
>> assayData: 1102500 features, 6 samples
>>   element names: exprs
>> protocolData
>>   rowNames: ABRNA1.CEL ABRNA2.CEL ... ABRNA6.CEL (6 total)
>>   varLabels: exprs dates
>>   varMetadata: labelDescription channel
>> phenoData
>>   rowNames: ABRNA1.CEL ABRNA2.CEL ... ABRNA6.CEL (6 total)
>>   varLabels: index
>>   varMetadata: labelDescription channel
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: pd.mogene.1.0.st.v1
>> > class(Data)
>> [1] "GeneFeatureSet"
>> attr(,"package")
>> [1] "oligoClasses"
>> > raw=backgroundCorrect(Data,"rma")
>> Background correcting... OK
>> > pms=pm(raw)
>> > head(pms)
>>      ABRNA1.CEL ABRNA2.CEL ABRNA3.CEL ABRNA4.CEL ABRNA5.CEL ABRNA6.CEL
>> 2106   7.853059   7.635413   5.221570   5.160448   5.978796   5.767533
>> 2107   5.990083   4.764031   4.659925   5.160448   7.067603   5.903628
>> 2108   4.867173   4.587342   5.023095   4.762256   5.978796   6.045105
>> 2109  10.657556   8.827158   7.679404   6.124607  11.523816   6.045105
>> 2110  12.538396  13.906774   6.147958   5.860049  38.764281   6.843026
>> 2111  10.241785   6.356910   4.836138   5.160448   7.067603   7.405633
>> > class(pms)
>> [1] "matrix"
>> > pmsVSN=vsn::vsnMatrix(pms)
>> vsn2: 899636 x 6 matrix (1 stratum). Please use 'meanSdPlot' to 
>> verify the fit.
>> > class(pmsVSN)
>> [1] "vsn"
>> attr(,"package")
>> [1] "vsn"
>> > pmsVSN
>> vsn object for n=899636 features and d=6 samples.
>> sigsq=0.1
>> hx: 899636 x 6 matrix.
>> > pm(raw) <- exprs(pmsVSN)
>> > rm(pms, pmsVSN)
>> > ls()
>> [1] "Data" "raw"
>> > raw
>> GeneFeatureSet (storageMode: lockedEnvironment)
>> assayData: 1102500 features, 6 samples
>>   element names: exprs
>> protocolData
>>   rowNames: ABRNA1.CEL ABRNA2.CEL ... ABRNA6.CEL (6 total)
>>   varLabels: exprs dates
>>   varMetadata: labelDescription channel
>> phenoData
>>   rowNames: ABRNA1.CEL ABRNA2.CEL ... ABRNA6.CEL (6 total)
>>   varLabels: index
>>   varMetadata: labelDescription channel
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: pd.mogene.1.0.st.v1
>> > eset=rma(raw, background=FALSE,normalize=FALSE)
>> Calculating Expression
>> > eset
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 35556 features, 6 samples
>>   element names: exprs
>> protocolData
>>   rowNames: ABRNA1.CEL ABRNA2.CEL ... ABRNA6.CEL (6 total)
>>   varLabels: exprs dates
>>   varMetadata: labelDescription channel
>> phenoData
>>   rowNames: ABRNA1.CEL ABRNA2.CEL ... ABRNA6.CEL (6 total)
>>   varLabels: index
>>   varMetadata: labelDescription channel
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: pd.mogene.1.0.st.v1
>>
>> El ene 11, 2013, a las 5:07 p.m., James W. MacDonald escribió:
>>
>>> Hi Jose,
>>>
>>> On 1/11/2013 10:31 AM, José LÓPEZ wrote:
>>>> Dear Jim,
>>>>
>>>> Thank you for the advise on the background correction step.
>>>> I already tryed before the whole Benilton's code but it doesn't 
>>>> work at the following step.
>>>>
>>>>> pm(raw)<- pmVSN
>>>> Error: object 'pmVSN' not found
>>>>
>>>> May I ask you what this step is doing? Does it replace the pm 
>>>> matrix in the raw ExpressionFeatureSet by the normalized one?
>>>
>>> Exactly. But the 'pm <-'  function expects to be fed a matrix, and 
>>> the pmsVSN isn't a matrix. Instead, it is a 'vsn' object, which is 
>>> related to an ExpressionSet object. So you can extract the matrix of 
>>> normalized data as usual, with exprs():
>>>
>>> pm(raw) <- exprs(pmsVSN)
>>>
>>> and there was another error in my code, as summarize() won't work on 
>>> a GeneFeatureSet object. Instead, you want to use rma():
>>>
>>> eset <- rma(raw, normalize = FALSE, background = FALSE)
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Thank you in advance for your time and your kind help,
>>>>
>>>> Jose
>>>>
>>>>> library(limma)
>>>>> library(oligo)
>>>>> Data=read.celfiles(list.celfiles())
>>>> Loading required package: pd.mogene.1.0.st.v1
>>>> Loading required package: RSQLite
>>>> Loading required package: DBI
>>>> Platform design info loaded.
>>>> Reading in : ABRNA1.CEL
>>>> Reading in : ABRNA2.CEL
>>>> Reading in : ABRNA3.CEL
>>>> Reading in : ABRNA4.CEL
>>>> Reading in : ABRNA5.CEL
>>>> Reading in : ABRNA6.CEL
>>>>> pms=pm(Data)
>>>>> raw=backgroundCorrect(Data,"rma")
>>>> Background correcting... OK
>>>>> pms=pm(raw)
>>>>> pmsVSN=vsn::vsnMatrix(pms)
>>>> vsn2: 899636 x 6 matrix (1 stratum). Please use 'meanSdPlot' to 
>>>> verify the fit.
>>>>> pm(raw)<- pmVSN
>>>> Error: object 'pmVSN' not found
>>>>> pm(raw)<- pmsVSN
>>>> Error in function (classes, fdef, mtable)  :
>>>>   unable to find an inherited method for function ‘pm<-’ for 
>>>> signature ‘"GeneFeatureSet", "missing", "missing", "vsn"’
>>>>> pmsVSN
>>>> vsn object for n=899636 features and d=6 samples.
>>>> sigsq=0.1
>>>> hx: 899636 x 6 matrix.
>>>>> head(pms)
>>>>      ABRNA1.CEL ABRNA2.CEL ABRNA3.CEL ABRNA4.CEL ABRNA5.CEL ABRNA6.CEL
>>>> 2106   7.853059   7.635413   5.221570   5.160448   5.978796   5.767533
>>>> 2107   5.990083   4.764031   4.659925   5.160448   7.067603   5.903628
>>>> 2108   4.867173   4.587342   5.023095   4.762256   5.978796   6.045105
>>>> 2109  10.657556   8.827158   7.679404   6.124607  11.523816   6.045105
>>>> 2110  12.538396  13.906774   6.147958   5.860049  38.764281   6.843026
>>>> 2111  10.241785   6.356910   4.836138   5.160448   7.067603   7.405633
>>>>> class(pms)
>>>> [1] "matrix"
>>>>> ls()
>>>> [1] "Data"   "pms"    "pmsVSN" "raw"
>>>>
>>>>> sessionInfo()
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] pd.mogene.1.0.st.v1_3.8.0 RSQLite_0.11.2            DBI_0.2-5 
>>>>                 oligo_1.22.0
>>>> [5] Biobase_2.18.0            oligoClasses_1.20.0 
>>>>       BiocGenerics_0.4.0        limma_3.14.3
>>>>
>>>> loaded via a namespace (and not attached):
>>>>  [1] affxparser_1.30.0     affy_1.36.0           affyio_1.26.0 
>>>>         BiocInstaller_1.8.3   Biostrings_2.26.2
>>>>  [6] bit_1.1-9             codetools_0.2-8       ff_2.2-10 
>>>>             foreach_1.4.0         GenomicRanges_1.10.5
>>>> [11] grid_2.15.2           IRanges_1.16.4        iterators_1.0.6 
>>>>       lattice_0.20-10       parallel_2.15.2
>>>> [16] preprocessCore_1.20.0 splines_2.15.2        stats4_2.15.2 
>>>>         vsn_3.26.0            zlibbioc_1.4.0
>>>>
>>>>
>>>> *****************************************************
>>>> José P. LÓPEZ-ATAYALA
>>>> Instituto de Neurociencias
>>>> CSIC - UMH
>>>> Avda. D. Santiago Ramón y Cajal, S/N
>>>> E-03550, Sant Joan d'Alacant
>>>> Alicante, Spain
>>>> jose.lopez at umh.es <mailto:jose.lopez at umh.es>
>>>> http://in.umh.es/grupos-detalle.aspx?grupo=30
>>>> (34) 965 919 531
>>>>
>>>> El ene 11, 2013, a las 3:59 p.m., James W. MacDonald escribió:
>>>>
>>>>> Hi Jose,
>>>>>
>>>>> Let's say you followed Benilton's code from 
>>>>> https://stat.ethz.ch/pipermail/bioconductor/2010-June/033936.html
>>>>>
>>>>> library(oligo)
>>>>> cels = list.celfiles()
>>>>> raw = read.celfiles(cels)
>>>>> raw = backgroundCorrect(raw, "rma") ## I added this - you might 
>>>>> want BG correction
>>>>> pms = pm(raw)
>>>>> pmsVSN = vsn::vsnMatrix(pms)
>>>>> pm(raw)<- pmVSN
>>>>> rm(pms, pmsVSN)
>>>>>
>>>>> you now have an ExpressionFeatureSet with normalized data that you 
>>>>> want to summarize. You can then
>>>>>
>>>>> eset<- summarize(raw, method = "medianpolish")
>>>>>
>>>>> See
>>>>>
>>>>> ?summarizationMethods
>>>>>
>>>>> for more information.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>> On 1/11/2013 6:17 AM, José LÓPEZ wrote:
>>>>>> Dear Benilton,
>>>>>>
>>>>>> I am using oligo for Mouse Gene 1.0ST arrays. In addition to RMA, 
>>>>>> I would also like to pre-process with vsn. I have seen previous 
>>>>>> threads related to this question in the past, 
>>>>>> (https://stat.ethz.ch/pipermail/bioconductor/2010-January/031100.html, 
>>>>>> https://stat.ethz.ch/pipermail/bioconductor/2010-June/033936.html), 
>>>>>> but unfortunately, I am not bioinformatician and, although I read 
>>>>>> oligo and vsn manuals, it is not easy to me to follow up to 
>>>>>> summarize the vsn object.
>>>>>> May you (or someone else) please, give me some additional clue to 
>>>>>> sumarize the vsn object using the oligo package.
>>>>>>
>>>>>> Thank you very much in advance for your time and your kind help,
>>>>>>
>>>>>>  Jose LOPEZ
>>>>>>
>>>>>> **************************
>>>>>>
>>>>>>> list.files()
>>>>>> [1] "ABRNA1.CEL"                          "ABRNA2.CEL" 
>>>>>>                          "ABRNA3.CEL"
>>>>>> [4] "ABRNA4.CEL"                          "ABRNA5.CEL" 
>>>>>>                          "ABRNA6.CEL"
>>>>>> [7] "Limma_FilterBefore_H2BGFP_jla_vsn.R"
>>>>>>> library(limma)
>>>>>>> library(oligo)
>>>>>> Loading required package: BiocGenerics
>>>>>>
>>>>>> Attaching package: ‘BiocGenerics’
>>>>>>
>>>>>> The following object(s) are masked from ‘package:stats’:
>>>>>>
>>>>>>     xtabs
>>>>>>
>>>>>> The following object(s) are masked from ‘package:base’:
>>>>>>
>>>>>>     anyDuplicated, cbind, colnames, duplicated, eval, Filter, 
>>>>>> Find, get, intersect, lapply, Map, mapply,
>>>>>>     mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, 
>>>>>> rbind, Reduce, rep.int, rownames, sapply,
>>>>>>     setdiff, table, tapply, union, unique
>>>>>>
>>>>>> Loading required package: oligoClasses
>>>>>> Loading package bit 1.1-9
>>>>>> package:bit (c) 2008-2012 Jens Oehlschlaegel (GPL-2)
>>>>>> creators: bit bitwhich
>>>>>> coercion: as.logical as.integer as.bit as.bitwhich which
>>>>>> operator: !&   | xor != ==
>>>>>> querying: print length any all min max range sum summary
>>>>>> bit access: length<- [ [<- [[ [[<-
>>>>>> for more help type ?bit
>>>>>> Loading package ff2.2-10
>>>>>> - 
>>>>>> getOption("fftempdir")=="/var/folders/U+/U+SFMmqcEbKkSysJQ3OYbk+++TQ/-Tmp-//RtmpKgQzWD"
>>>>>>
>>>>>> - getOption("ffextension")=="ff"
>>>>>>
>>>>>> - getOption("ffdrop")==TRUE
>>>>>>
>>>>>> - getOption("fffinonexit")==TRUE
>>>>>>
>>>>>> - getOption("ffpagesize")==65536
>>>>>>
>>>>>> - getOption("ffcaching")=="mmnoflush"  -- consider "ffeachflush" 
>>>>>> if your system stalls on large writes
>>>>>>
>>>>>> - getOption("ffbatchbytes")==16777216 -- consider a different 
>>>>>> value for tuning your system
>>>>>>
>>>>>> - getOption("ffmaxbytes")==536870912 -- consider a different 
>>>>>> value for tuning your system
>>>>>>
>>>>>> Welcome to oligoClasses version 1.20.0
>>>>>> Loading required package: Biobase
>>>>>> Welcome to Bioconductor
>>>>>>
>>>>>>     Vignettes contain introductory material; view with 
>>>>>> 'browseVignettes()'. To cite Bioconductor, see
>>>>>>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>>>>>>
>>>>>> ========================================================================================================================
>>>>>> Welcome to oligo version 1.22.0
>>>>>> ========================================================================================================================
>>>>>>
>>>>>> Attaching package: ‘oligo’
>>>>>>
>>>>>> The following object(s) are masked from ‘package:limma’:
>>>>>>
>>>>>>     backgroundCorrect
>>>>>>
>>>>>>> Data=read.celfiles(list.celfiles())
>>>>>> Loading required package: pd.mogene.1.0.st.v1
>>>>>> Loading required package: RSQLite
>>>>>> Loading required package: DBI
>>>>>> Platform design info loaded.
>>>>>> Reading in : ABRNA1.CEL
>>>>>> Reading in : ABRNA2.CEL
>>>>>> Reading in : ABRNA3.CEL
>>>>>> Reading in : ABRNA4.CEL
>>>>>> Reading in : ABRNA5.CEL
>>>>>> Reading in : ABRNA6.CEL
>>>>>>> pms=pm(Data)
>>>>>>> class(pms)
>>>>>> [1] "matrix"
>>>>>>> head(pms)
>>>>>>      ABRNA1.CEL ABRNA2.CEL ABRNA3.CEL ABRNA4.CEL ABRNA5.CEL 
>>>>>> ABRNA6.CEL
>>>>>> 2106         46         43         36         36         37 
>>>>>>         33
>>>>>> 2107         38         32         33         36         43 
>>>>>>         34
>>>>>> 2108         31         31         35         34         37 
>>>>>>         35
>>>>>> 2109         54         46         45         40         58 
>>>>>>         35
>>>>>> 2110         58         55         40         39         94 
>>>>>>         40
>>>>>> 2111         53         39         34         36         43 
>>>>>>         43
>>>>>>> pmsVSN=vsn::vsnMatrix(pms)
>>>>>> vsn2: 899636 x 6 matrix (1 stratum). Please use 'meanSdPlot' to 
>>>>>> verify the fit.
>>>>>>> class(pmsVSN)
>>>>>> [1] "vsn"
>>>>>> attr(,"package")
>>>>>> [1] "vsn"
>>>>>>> pmsVSN
>>>>>> vsn object for n=899636 features and d=6 samples.
>>>>>> sigsq=0.026
>>>>>> hx: 899636 x 6 matrix.
>>>>>>> eset=rma(pmsVSN, background=FALSE,normalize=FALSE)
>>>>>> Error in function (classes, fdef, mtable)  :
>>>>>>   unable to find an inherited method for function ‘rma’ for 
>>>>>> signature ‘"vsn"’
>>>>>>
>>>>>>> library(vsn)
>>>>>>> meanSdPlot(pmsVSN)
>>>>>> KernSmooth 2.23 loaded
>>>>>> Copyright M. P. Wand 1997-2009
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.15.2 (2012-10-26)
>>>>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] vsn_3.26.0                pd.mogene.1.0.st.v1_3.8.0 
>>>>>> RSQLite_0.11.2            DBI_0.2-5
>>>>>> [5] oligo_1.22.0              Biobase_2.18.0 
>>>>>>            oligoClasses_1.20.0       BiocGenerics_0.4.0
>>>>>> [9] limma_3.14.3
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>>  [1] affxparser_1.30.0     affy_1.36.0           affyio_1.26.0 
>>>>>>         BiocInstaller_1.8.3   Biostrings_2.26.2
>>>>>>  [6] bit_1.1-9             codetools_0.2-8       ff_2.2-10 
>>>>>>             foreach_1.4.0         GenomicRanges_1.10.5
>>>>>> [11] grid_2.15.2           IRanges_1.16.4        iterators_1.0.6 
>>>>>>       KernSmooth_2.23-8     lattice_0.20-10
>>>>>> [16] parallel_2.15.2       preprocessCore_1.20.0 splines_2.15.2 
>>>>>>        stats4_2.15.2         tools_2.15.2
>>>>>> [21] zlibbioc_1.4.0
>>>>>>
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives: 
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> -- 
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>>
>>>
>>> -- 
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099