[BioC] Issue with limma and normalization of Agilent data generated with a 20-bit scan

Mon Mar 15 23:54:00 CET 2010

Dear Peter,

The plots on the permalink website do not correspond to the R code in your 
email.  Apparently the plots are from marray package, produced by R code 
which you do not give.  I can't comment on the behaviour of someone else's 
package.

As I've already told you, normalizeWithinArrays() uses all the points, and 
having an A-value over 16 is not an issue.  However, the loess curve is 
designed to be quite stiff, and also to ignore outliers, and therefore the 
curve will not follow a highly localized J-curve at the right end of the 
range, which is what your MA-plot seems to show.  The loess curve is 
designed to follow the main trend, not local trends involving small 
proportions of the points.  If you believe that your platform has a 
different MvsA relationship for A>16 than for A<16, and this should be 
removed by the normalization curve, then you can do one of two things. 
One possibility is simply to make the curve more local by reducing the 
span paramater:

  normObj <- normalizeWithinArrays(bgObj, method="loess", span=0.1)

Choosing span small enough will certainly remove the J-curve.  However I 
don't recommend this approach, as it is not specific to A>16.  I recommend 
instead that you use the modifyWeights() function to give the spots with 
A>16 increased weights, so the loess curve will follow it more closely. 
You might find a combination of slightly reduced span and increased 
weights will work best.

BTW, I would recommend that you use spottypes to display control spots, 
and weights to downweight them in the loess normalization, rather than 
hacking NA values into your data.  It gives you more information and 
flexibility.

Best wishes
Gordon

On Mon, 15 Mar 2010, White, Peter wrote:

> Dear Gordon,
>
> The plots are visible in the blog view on gmane.org:
>
> http://permalink.gmane.org/gmane.science.biology.informatics.conductor/27731
>
> I thought you may be on to something with the weights but I tried it 
> with and without a flag function (also double checked the Agilent file 
> and the high intensity spots are not flagged). It really does look like 
> the loess is just not fitted beyond for elements with an A value > 16??? 
> These 20-bit scans from Agilent are quite new and I suspect most folks 
> with just use the Agilent normalized data rather than starting with the 
> raw data, so maybe this just hasn't been observed before now?
>
> Thanks,
>
> Peter
>
> Below is the code I used:
>
>   library(limma)
>    agilentFiles <- list.files(pattern="U")
>    rawObj <- read.maimages(agilentFiles,
>          columns = list(G = "gMedianSignal", Gb = "gBGMedianSignal",
>          R = "rMedianSignal", Rb = "rBGMedianSignal"),
>          annotation= c("ProbeName", "SystematicName","ControlType"))
> #Remove spike controls and remove background signals
>    bgObj <- rawObj
>    posControls <- grep(T,rawObj$genes$ControlType == 1)
>    bgObj$G[posControls,] <- NA
>    bgObj$R[posControls,] <- NA
>    bgObj$Gb <- bgObj$Rb <- NULL
> #Loess normalize
>    normObj <- normalizeWithinArrays(bgObj, method="loess", weights=NULL)
> #Plot MvA
>    for (i in 1:ncol(normObj)) {
>       figureName <- paste(i, " MvA Plots")
>       mat <- matrix(c(3,1,2),nrow=3,ncol=1)
>       layout(mat,heights=c(1,10,10))
>       plotMA(rawObj, array=i, main = "Pre-Normalization MvA",
>            ylim=c(-3.5,3.5), zero.weights=TRUE)
>            abline(0,0)
>       plotMA(normObj, array=i, main = "Normalized MvA",
>            ylim=c(-3.5,3.5), zero.weights=TRUE)
>            abline(0,0)
>            layout(1)
>       mtext(figureName, cex=1.25, line=3)
>       savePlot(filename=figureName, type=c("png"), device=dev.cur())
>    }
>
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils
> [8] methods   base
>
> other attached packages:
> [1] limma_3.2.2     svSocket_0.9-48 TinnR_1.0.3     R2HTML_1.59-1
> [5] Hmisc_3.7-0     survival_2.35-9
>
> loaded via a namespace (and not attached):
> [1] cluster_1.12.1 grid_2.10.1    lattice_0.18-3 svMisc_0.9-56  tools_2.10.1
>
>> -----Original Message-----
>> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
>> Sent: Saturday, March 13, 2010 6:39 PM
>> To: White, Peter
>> Cc: Bioconductor mailing list
>> Subject: [BioC] Issue with limma and normalization of Agilent data
>> generated with a 20-bit scan
>>
>> Dear Peter,
>>
>> You can't send attachments to the Bioconductor mailing list, so I have
>> not
>> seen your plots.  However I am not aware of any issue such as you
>> describe.  The limma function normalizeWithinArrays includes all spots
>> in
>> the normalization, regardless of how large the A-value is.  You haven't
>> shown us any code, or any problem we can reproduce, so we can't tell
>> whether or not you're doing something wrong.  We don't know whether
>> you're
>> using probe weights, whether you've filtered control spots, etc etc.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Fri, 12 Mar 2010 10:21:41 -0500
>>> From: "White, Peter" <Peter.White at nationwidechildrens.org>
>>> To: "'bioconductor at stat.math.ethz.ch'"
>>>     <bioconductor at stat.math.ethz.ch>
>>> Subject: [BioC] Issue with limma and normalization of Agilent data
>>>     generated with a 20-bit scan
>>> Content-Type: text/plain
>>>
>>> I have noticed an issue with the limma normalizeWithinArrays function 
>>> (and also with marray and maNorm). When normalizing two color data 
>>> generated with an Agilent 20-bt scanner it fails to normalize the
>> high
>>> intensity data (i.e. any points with an A value > 16). In our dataset
>> we
>>> have in excess of 400 elements with red and green intensities ranging
>>> from 65500 to 475100. When we loess normalize the data any points
>> beyond
>>> A=16 appear to be untouched by the normalization. If the attached
>>> figures come through this should be clear - when using maNorm and
>> maPlot
>>> it will plot the loess line and you can see it stop at 16.
>>>
>>> Is it possible for loess normalization to be extended to this higher
>>> intensity data? Or am I just doing something wrong?
>>>
>>> Thanks,
>>>
>>> Peter
>>>
>>>
>>> Peter White, Ph.D.
>>> Director, Biomedical Genomics Core<http://genomics.nchresearch.org/>
>>> Research Assistant Professor of Pediatrics
>>> The Research Institute at
>>> Nationwide Children's Hospital and
>>> The Ohio State University
>>>
>>> Mailing Address:
>>>
>>> The Research Institute at
>>> Nationwide Children's Hospital
>>> 700 Children's Drive, W510
>>> Columbus, OH 43205
>>>
>>> Assistant (Jennifer Neelans): (614) 722-2915
>>> Office: (614) 355-2671
>>> Lab: (614) 355-5252
>>> Fax: (614) 722-2818
>>> Web: http://genomics.nchresearch.org/

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}