[BioC] how to normalize by columns

Thu May 26 09:33:39 CEST 2005

On 26 May 2005, at 07:12, diego huck wrote:

>
> Hello
>
>  I am a beginner at Bioconductor and R. I am confused about how 
> to do a normalization which consists of obtaining the mean of a column 
> and then subtracting that mean from each value in the column.
>  x1(1)- mean(col x1)	x2(1)- mean(col x2)
>  x1(2)- mean(col x1)	x2(2)- mean(col x2)
>  x1(3)- mean(col x1)	x2(3)- mean(col x2)
> ....................	...................
>
>
>  I have the genes in columns and the conditions in rows.

That is fine, although unusual.  Be aware that many of the BioC (and 
similar) microarray packages use a rows=genes, columns=samples 
convention.  Although this perhaps wouldn't be the way a statistician 
would arrange subjects and measurements in a table in R, I think it is 
partly a historical carry-over from microarray data analysis in 
spreadsheets and the like.  Excel has a 256 column x 65536 row 
size limit, so you are pretty much stuck with one layout!

If you ever need to rotate your data then this is easy:  use the t() 
function.

newArray <- t(oldArray)
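
As a minimal sketch of what t() does to the layout (the matrix contents and the gene/condition names below are made up purely for illustration):

```r
# Hypothetical matrix: 3 conditions in rows, 4 genes in columns
oldArray <- matrix(1:12, nrow = 3,
                   dimnames = list(paste0("cond", 1:3), paste0("gene", 1:4)))

# Transpose so that genes are in rows and conditions are in columns
newArray <- t(oldArray)

dim(oldArray)   # 3 4
dim(newArray)   # 4 3
```

Row and column names travel with the data, so newArray["gene2", "cond3"] holds the same value as oldArray["cond3", "gene2"].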

>  I don't want to stabilize the variance.

If you did, the vsn package would do this.

>  As you can see, it is a very simple calculation.
>  I am wondering if I could use packages like vsn or affy to do that, or 
> whether it is easier to write a script.

You can do this yourself very easily, as this code snippet shows:

#	Make a spoof array of 100 genes (columns) and 20 samples (rows) to demonstrate
x <- matrix(runif(2000), ncol=100)

#	Calculate the mean of each column.  Note:  you could use median here
#	to make it slightly more robust
colMeanValues <- apply(x, 2, mean)

#	Subtract the column means from each value in that column
xCentred <- sweep(x, 2, colMeanValues, "-")

#	You can do a similar version to subtract the row means;  simply
#	change the second argument of both apply() and sweep() to 1.
#	Alternatively, if you wanted to do division as opposed to subtraction,
#	use
xScaled <- sweep(x, 2, colMeanValues, "/")
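
For comparison, here is a rough sketch of two variations on the same idea: base R's scale() with centring only, and the median-based version mentioned in the comment above.  The seed and the variable names are arbitrary choices for the example, not anything prescribed.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- matrix(runif(2000), ncol = 100)

# Column-centre with scale(): subtract column means, do not rescale
xCentred <- scale(x, center = TRUE, scale = FALSE)

# The sweep() approach gives the same numbers
xSweep <- sweep(x, 2, colMeans(x), "-")
all.equal(xSweep, xCentred, check.attributes = FALSE)

# More robust variant: subtract the column medians instead of the means
colMedianValues <- apply(x, 2, median)
xMedianCentred <- sweep(x, 2, colMedianValues, "-")
```

scale() attaches the centring values as an attribute of its result, which is why the comparison ignores attributes.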

>  Furthermore, I am unsure whether such a simple normalization is 
> conceptually correct with the objective of eliminating the effect 
> between arrays.
>  I would like to know if I have to iterate the process of calculating 
> the mean of each column and subtracting it any number of times.
>

Subtracting the mean from each column will make the new mean of each 
column zero, so one cycle is enough.
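
If you want to convince yourself, a quick check along these lines (again on made-up random data, with an arbitrary seed) shows that a second pass changes nothing beyond floating-point rounding:

```r
set.seed(42)  # arbitrary seed for reproducibility
x <- matrix(runif(2000), ncol = 100)

# First pass: subtract the column means
once <- sweep(x, 2, colMeans(x), "-")

# Second pass: the column means are already (numerically) zero
twice <- sweep(once, 2, colMeans(once), "-")

max(abs(colMeans(once)))   # effectively zero, rounding error only
all.equal(once, twice)     # compares the two results within tolerance
```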

Hope this helps.

David

Prof David Kipling
Department of Pathology
School of Medicine
Cardiff University
Heath Park
Cardiff CF14 4XN

Tel:  029 2074 4847
Email:  KiplingD at cardiff.ac.uk