[BioC] Normalizing single-channel data [was: is my normalization right?]

Gordon Smyth smyth at wehi.edu.au
Wed Jun 2 01:44:25 CEST 2004

Dear Xiaopeng,

You are raising the issue of normalizing single-channel (non-Affy) 
microarray data. This is not yet documented, but it is not difficult using 
the between-array normalization methods provided in limma or vsn.

Firstly, let me point out that your text file doesn't contain the "raw 
data" from Genepix since it doesn't contain background intensities. Have 
you already subtracted the background or have you just ignored it? What did 
you do with the Genepix flags?

1. Given a text file like the one you describe, you can read it into R using 
the basic function read.table()

Data <- read.table("myfile.txt", sep="\t", header=TRUE)  # I assume your file is tab-delimited with a header row
y <- as.matrix(Data[,-1])
rownames(y) <- as.character(Data[,1])

Now you have two major normalization choices, quantile or vsn normalization.

y2 <- normalizeBetweenArrays(log2(y), method="quantile")

or

y2 <- normalizeBetweenArrays(y, method="vsn")
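
To make clear what quantile normalization does, here is a by-hand sketch in 
base R on a made-up matrix. It only illustrates the idea and is not limma's 
implementation, which takes care of details such as ties and missing values 
that this sketch ignores. The matrix values and gene names are invented.

```r
# Toy intensity matrix: 5 probes (rows) x 3 arrays (columns). Values are made up.
y <- matrix(c(120, 300,  80,  950, 410,
              200, 510, 150, 1800, 640,
               90, 260,  70,  700, 330),
            nrow = 5,
            dimnames = list(paste0("gene", 1:5),
                            c("Control", "Treat1", "Treat2")))
ly <- log2(y)

# Quantile normalization by hand (this toy data has no ties):
# 1. sort each column,
# 2. average across columns at each rank,
# 3. replace each value by the average for its rank.
sorted   <- apply(ly, 2, function(col) unname(sort(col)))
rankmean <- rowMeans(sorted)
y2 <- apply(ly, 2, function(col) rankmean[rank(col)])
dimnames(y2) <- dimnames(ly)

# Every array now contains the same set of values, in its own gene order.
apply(y2, 2, sort)
```

Within each array the ranking of the genes is unchanged; only the 
distribution of values is forced to be the same across arrays.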

Now you are ready to go straight into differential expression analysis 
using limma, for example

fit <- lmFit(y2, design)
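
The design matrix is not defined above, so here is one way it might be set 
up. This is only a sketch under assumptions: the layout of two arrays per 
condition, the column names and the simulated y2 are all hypothetical 
stand-ins, and your own experiment will determine the real design.

```r
# Hypothetical layout: two arrays per condition (the real experiment may differ).
group <- factor(rep(c("Control", "Treat1", "Treat2"), each = 2),
                levels = c("Control", "Treat1", "Treat2"))

# First column = Control level; the other two columns are the
# Treat1 and Treat2 differences from Control.
design <- model.matrix(~ group)
colnames(design) <- c("Control", "Treat1vsControl", "Treat2vsControl")

# Simulated log2 intensities standing in for the normalized matrix y2.
set.seed(1)
y2 <- matrix(rnorm(20 * 6, mean = 10), nrow = 20,
             dimnames = list(paste0("gene", 1:20), paste0("array", 1:6)))

# Fit only if limma is installed, so the sketch still runs without it.
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(y2, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "Treat1vsControl"))
}
```

Note that some replication is needed for the linear model to estimate 
variability; with a single array per condition there are no residual 
degrees of freedom.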

If you use quantile normalization, you must make sure that all your 
intensities are positive before normalizing, for example by

y <- pmax(y, 1)  # matrix first, so the result keeps the dimensions of y
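
As a small illustration of why this matters, with made-up numbers: zero or 
negative values, which can appear after background subtraction, do not have 
a finite log, so they are raised to 1 before taking logs.

```r
# Made-up intensities, including a zero and a negative value.
y <- matrix(c(250, -3, 0, 1200), nrow = 2,
            dimnames = list(c("gene1", "gene2"), c("array1", "array2")))

# Raise everything below 1 up to 1. The matrix is given first because
# pmax() copies attributes such as dim from its first argument.
y <- pmax(y, 1)

log2(y)  # finite everywhere; the clamped values become 0 on the log scale
```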

2. You never did need to extract the intensity data from the Genepix gpr 
files in the first place. You could have proceeded in limma as

targets <- readTargets()  # Always good practice to make a targets file
RG <- read.maimages(targets$FileName, source="genepix",
        columns=list(Rf="F532 Mean", Gf="F532 Mean",
                     Rb="B532 Median", Gb="B532 Median"))
y2 <- normalizeBetweenArrays(log2(RG$G), method="quantile")  # log2 as in step 1

Or you might choose to apply backgroundCorrect() before normalizing.


>xpzhang xpzhang at genetics.ac.cn
>Sat May 29 09:21:55 CEST 2004
>Thank you for your answer!
>My raw data are from GenePix. Because I used only Cy3 in my whole
>microarray experiment, I only extracted the data with the software and am
>trying to normalize them with Bioconductor.
>I made a .txt file for the raw data; it looks like this:
>Gene Name  Contrl(intensity)   Treat1(intensity)   Treat2(intensity) 
>I want to use intensity-dependent normalization across multiple slides; is
>that appropriate? And could you tell me how to do it? I am trying to find
>a way by reading Bioconductor's documentation and help files, but I feel really
>Thank you very much!
