[BioC] quantile robust and RMA in xps

cstrato cstrato at aon.at
Mon May 25 21:17:53 CEST 2009


Dear Mayte

I must admit that it may be confusing, so I need to update the help files.

Please note that "bgcorrect()" is a general function. There are specific 
functions for the different background methods, such as "bgcorrect.rma()" 
and "bgcorrect.mas5()" (see ?bgcorrect). Each of these methods has a 
parameter "select" that selects the probes to be used for background 
computation.

For expression arrays you can select c("pmonly", "mmonly", "both").
For whole genome arrays you can select only "antigenomic".
For exon arrays you can select c("antigenomic", "genomic").

Thus for "bgcorrect.mas5()" the parameter "select" tells the function 
which probes to use for background computation.
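
As a quick reference, the values listed above can be written down as a small lookup. This is only a sketch: "allowed_select()" is a hypothetical helper and not part of xps, and the array type labels "expression", "genome" and "exon" are names chosen for this example.

```r
# Hypothetical helper, not part of xps: returns the "select" values
# listed above for a given array type.
allowed_select <- function(array_type = c("expression", "genome", "exon")) {
  array_type <- match.arg(array_type)
  switch(array_type,
         expression = c("pmonly", "mmonly", "both"),
         genome     = "antigenomic",
         exon       = c("antigenomic", "genomic"))
}

# Note: the rma examples in this message pass select="none" for
# expression arrays, so this table does not cover every valid call.
allowed_select("genome")
allowed_select("exon")
```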

The rma background is special, since rma normally uses PM probes 
("pmonly") for background computation. Thus in this case I am using 
select="antigenomic" only to indicate that a whole genome or exon array 
is used. However, in the case of MM probes ("mmonly") parameter "select" 
tells the function to use "antigenomic" probes as MM probes.

Here are three examples of how to use the function "bgcorrect()" to 
compute the background:

1. Expression array, PM probes are used for background computation:
 > bg.rma <- bgcorrect (data, "tmp_bg2", method="rma", exonlevel="", 
select="none", option="pmonly:epanechnikov", params=c(16384))

2. Whole genome array, PM probes are used for background computation:
 > bg.rma <- bgcorrect (data, "tmp_bg2", method="rma", 
exonlevel="core+affx", select="antigenomic", 
option="pmonly:epanechnikov", params=c(16384))

Please note that in this case "core+affx" probes will be used for 
background computation, and "antigenomic" only indicates that a whole 
genome or exon array is used.

3. Whole genome array, antigenomic MM probes are used for background 
computation:
 > bg.rma <- bgcorrect (data, "tmp_bg2", method="rma", 
exonlevel="core+affx", select="antigenomic", 
option="mmonly:epanechnikov", params=c(16384))

In this case, "antigenomic" probes will be used for background 
computation, since "option" tells the function to use "mmonly" probes.
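
The three argument combinations above can be collected in one place, which may make it easier to see what changes between them. This is a minimal sketch: "rma_bg_args()" is a hypothetical helper and not part of xps, and it covers only the three cases shown in this message.

```r
# Hypothetical helper, not part of xps: returns the bgcorrect() arguments
# used in the three examples above.
rma_bg_args <- function(array_type = c("expression", "genome"),
                        probes = c("pmonly", "mmonly")) {
  array_type <- match.arg(array_type)
  probes     <- match.arg(probes)
  if (array_type == "expression" && probes == "mmonly") {
    stop("this combination is not covered by the examples above")
  }
  list(method    = "rma",
       exonlevel = if (array_type == "expression") "" else "core+affx",
       select    = if (array_type == "expression") "none" else "antigenomic",
       option    = paste0(probes, ":epanechnikov"),
       params    = c(16384))
}

bg_args <- rma_bg_args("genome", "pmonly")
str(bg_args)

# With xps loaded and 'data' imported, the call would then look like:
# bg.rma <- do.call(bgcorrect, c(list(data, "tmp_bg2"), bg_args))
```

Only "option" differs between examples 2 and 3; "select" stays "antigenomic" in both.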

I hope this explains how to use the function "bgcorrect()".

Best regards
Christian


Mayte Suarez-Farinas wrote:
>>>
>
> Hi Christian.
> Thanks for your answer. For my first question, I am sorry but I am still 
> confused; I don't know what the correct answer is. I am working with 
> HuGene 1_0 ST, measuring expression, and I thought I had to use the common 
> (default) RMA with PMs only. But it does not work. The option that 
> works with "antigenomic" is using MMs. Is this option then right for 
> my case?
> best,
> Mayte
>
>
>
>
>
> On May 23, 2009, at 11:42 AM, cstrato wrote:
>
>> Dear Mayte,
>>
>> Although not recommended, this is in principle possible. However, your 
>> xps version is too old; you need version "xps_1.4.x", where I have 
>> modified the method "intensity()<-" for this purpose (see the help file 
>> "?intensity").
>>
>> See my further comments below.
>>
>>
>> Mayte Suarez-Farinas wrote:
>>> Hi everybody.
>>>
>>> I am working with xps and I have to admit I still don't get all the  
>>> nuances, but I am trying my best.
>>> To summarize the data, I want to use rma but with an alteration to  
>>> the normalization step,
>>> so I need to do the 3 steps: bgcorrect, normalize and summarize. I  
>>> ran into two problems trying to do so:
>>>
>>> 1. In background correction:
>>>
>>> the default RMA background is:
>>> data.bg.rma <- bgcorrect 
>>> (G1ST_data2,"tmp_bg",method="rma",exonlevel="core+affx",  
>>> select="none", option="pmonly:epanechnikov",params=c(16384))
>>> but I got the following error:
>>>
>>> g.rma <- 
>>> bgcorrect(G1ST_data2,"tmp_bg",method="rma",exonlevel="all",  
>>> select="none",     option="pmonly:epanechnikov",params=c(16384))
>>> Error in .local(object, ...) : error in function ‘BgCorrect’
>>> Opening file </Users/Mayte/Rlibrary/AffyDB/ROOTSchemes/ 
>>> Scheme_HuGene10stv1r4_na28.root> in <READ> mode...
>>> Creating new temporary file </Volumes/..../tmp_bg.root>...
>>> Preprocessing data using method <adjustbgrd>...
>>>     Background correcting raw data...
>>>        calculating background for <1_HuGene 1_0 ST_050409.cel>...
>>> Error: Number of PMs or MMs is zero.
>>> An error has occured: Need to abort current process.
>>>
>>
>> Please note that the default settings are always for expression 
>> arrays, so the error tells you that there are no MMs.
>>
>>> So I tried:
>>>
>>> data.bg.rma <- bgcorrect 
>>> (G1ST_data2,"tmp_bg2",method="rma",exonlevel="core+affx",  
>>> select="antigenomic", option="pmonly:epanechnikov",params=c(16384))
>>>
>>> which works OK but I don't know if it is correct.
>>>
>>
>>
>> This is the correct setting for whole genome and exon arrays. 
>> select="antigenomic" tells the program to use the antigenomic 
>> background probes as MMs, e.g. if you use option "mmonly" instead of 
>> "pmonly".
>>
>>
>>> After that I want to use the normalize.quantiles.robust function from  
>>> affy (it is not available in xps),
>>> so I did:
>>>
>>> data.bg.rma<-attachInten(data.bg.rma)
>>> data.int<-intensity(data.bg.rma)
>>> detach(package:xps)
>>> library(affy)
>>> data.int.norm<-normalize.quantiles.robust(as.matrix(data.int[,-c 
>>> (1,2)]),n.remove=5,remove.extreme='both')
>>>
>>
>> In R-2.9.0, which I am using, this function has moved to package 
>> "preprocessCore", but it does not seem to work:
>>
>> library(preprocessCore)
>> data.int.norm <- 
>> normalize.quantiles.robust(as.matrix(data.int[,-c(1,2)]), n.remove=1, 
>> remove.extreme='both')
>>
>> I get the following error message:
>> Error in normalize.quantiles.robust(as.matrix(data.int[, -c(1, 2)]), 
>> n.remove = 1, :
>> VECTOR_ELT() can only be applied to a 'list', not a 'character'
>>
>> Thus to simulate your setting I use function "normalize.quantiles" 
>> and delete one sample by hand:
>>
>> data.int.norm <- normalize.quantiles(as.matrix(data.int[,-c(1,2)]))
>> data.int.norm <- data.int.norm[,-4]
>> colnames(data.int.norm) <- 
>> c("Breast01","Breast02","Breast03","Prostate02","Prostate03")
>>
>> Note that (at least for me) the output is a matrix without column names, 
>> so you need to set the correct column names manually.
>> (In my example I am using the breast/prostate triplicates from the 
>> Affy dataset.)
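
To illustrate the column-name point without any Bioconductor package, here is a small base R sketch. It implements plain quantile normalization only (not the "robust" variant, and not the preprocessCore code); "quantile_normalize()" and the sample names "S1" to "S4" are made up for this example.

```r
# Plain quantile normalization in base R (illustration only).
quantile_normalize <- function(m) {
  m <- as.matrix(m)
  ranks  <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))   # mean of the k-th smallest values
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- NULL                   # mimic a result without names
  out
}

set.seed(1)
raw <- matrix(rexp(20), nrow = 5,
              dimnames = list(NULL, c("S1", "S2", "S3", "S4")))

int.norm <- quantile_normalize(raw)
int.norm <- int.norm[, -4]                    # drop one sample by hand
colnames(int.norm) <- colnames(raw)[-4]       # restore names in matching order
int.norm
```

After normalization every column has the same set of values, only in a different order, and the names have to be put back in the order of the remaining samples.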
>>
>>
>>> which shows that the data is normalized. Then I have to update the  
>>> intensities in the xps object data.bg.rma,
>>> which I did as follows:
>>>
>>> library(xps)
>>> str(data.int)
>>> data.int[,-c(1,2)]<-data.int.norm
>>> intensity(data.bg.rma)<-data.int
>>> boxplot(data.bg.rma)              #boxplot is OK
>>>
>>
>> The new replacement method "intensity()<-" has an option to create a 
>> new ROOT file (see ?intensity), so you need to do:
>>
>> library(xps)
>> str(data.int)
>>
>> data.int.norm <- as.data.frame(cbind(data.int[,c(1,2)],data.int.norm))
>>
>> Here you see that I added the (x,y) coordinates, but it is up to you 
>> to make sure that the order is correct.
>> I am using cbind() to prevent recycling of the samples, which is what I 
>> get when assigning to "data.int[,-c(1,2)]".
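
The cbind() step can be tried on toy data first. In this sketch the data frame, the column names "X" and "Y", and the sample names "A" to "C" are all assumptions made for illustration; the real object comes from intensity().

```r
# Toy stand-in for the intensity data frame: two coordinate columns
# followed by one column per sample.
data.int <- data.frame(X = c(0, 1, 0, 1), Y = c(0, 0, 1, 1),
                       A = c(10, 20, 30, 40), B = c(12, 18, 33, 41),
                       C = c(11, 25, 29, 39))

# Pretend the normalization step kept only two of the three samples.
int.norm <- as.matrix(data.int[, c("A", "B")]) / 2

# Rows must still line up with the (x,y) coordinates before combining.
stopifnot(nrow(int.norm) == nrow(data.int))

data.int.norm <- as.data.frame(cbind(data.int[, c(1, 2)], int.norm))
str(data.int.norm)
```

Building a new data frame this way keeps exactly the samples that are present in the normalized matrix, instead of assigning a smaller matrix into the old sample columns.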
>>
>> Now I can use the replacement method:
>>
>> intensity(data.bg.rma, "tmp_int2", verbose=TRUE) <- data.int.norm
>> str(data.bg.rma)
>> boxplot(data.bg.rma) #boxplot is OK
>>
>> Please note that this will take some time since the 
>> background-corrected intensities will first be saved as CEL-files 
>> which are then imported into the new ROOT file "tmp_int2_cel.root".
>>
>>
>>> The problem comes when I summarize the resulting data using median  
>>> polish;
>>> the resulting data is not normalized:
>>>
>>> data.mp.rma <- 
>>> summarize.rma(data.bg.rma,"tmp_sum_rma",exonlevel="core+affx")
>>> boxplot(data.mp.rma)    #boxplot is not OK.
>>>
>>
>> Now you can summarize the data using xps, but you need to replace the 
>> setname first:
>>
>> setName(data.bg.rma) <- "DataSet"
>> data.mp.rma <- summarize.rma(data.bg.rma, "tmp_sum_rma", 
>> exonlevel="core+affx")
>> boxplot(data.mp.rma) #boxplot is now OK.
>>
>> I hope this helps.
>> Best regards
>> Christian
>>
>>
>>> I don't know if I made a mistake, especially in updating the 
>>> intensities after the normalization step. I would really appreciate 
>>> any insight on this. Below is my session info...
>>>
>>>
>>>  > sessionInfo()
>>> R version 2.8.1 (2008-12-22)
>>> i386-apple-darwin8.11.1
>>>
>>> locale:
>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>>   [1] grid      splines   tools     stats     graphics  grDevices  
>>> utils     datasets  methods   base
>>>
>>> other attached packages:
>>>   [1] xps_1.2.10                affy_1.20.2                
>>> arrayQualityMetrics_1.8.1 marray_1.20.0              
>>> latticeExtra_0.5-4        vsn_3.8.0
>>>   [7] beadarray_1.10.0          sma_0.5.15                 
>>> hwriter_1.0               affycoretools_1.14.1       
>>> annaffy_1.14.0            KEGG.db_2.2.5
>>> [13] biomaRt_1.16.0            GOstats_2.8.0              
>>> Category_2.8.4            RBGL_1.18.0                
>>> GO.db_2.2.5               RSQLite_0.7-1
>>> [19] DBI_0.2-4                 graph_1.20.0               
>>> limma_2.16.5              affyQCReport_1.20.0        
>>> geneplotter_1.20.0        annotate_1.20.1
>>> [25] AnnotationDbi_1.5.18      lattice_0.17-17            
>>> RColorBrewer_1.0-2        affyPLM_1.18.1             
>>> preprocessCore_1.4.0      xtable_1.5-4
>>> [31] simpleaffy_2.18.0         gcrma_2.14.1               
>>> matchprobes_1.14.1        genefilter_1.22.0          
>>> survival_2.34-1           Biobase_2.2.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] GSEABase_1.4.0     KernSmooth_2.22-22 RCurl_0.94-1        
>>> XML_2.1-0          affyio_1.10.1      cluster_1.11.11
>>>