[BioC] how to normalize by columns

diego huck diegolugro at yahoo.com.ar
Fri May 27 19:06:41 CEST 2005


  Thank you David, these commands were very useful.
  Thank you Gordon for your comments, I'll go back and review the 
statistical theory.

  best regards

  diego





David Kipling wrote:

> 
> 
> On 26 May 2005, at 07:12, diego huck wrote:
> 
>>
>> Hello
>>
>>  I am a beginner at Bioconductor and R. I am confused about how 
>> to do a normalization which consists of obtaining the mean of a column, 
>> and then subtracting the mean of the column from each value in the column.
>>  x1(1)- mean(col x1)    x2(1)- mean(col x2)
>>  x1(2)- mean(col x1)    x2(2)- mean(col x2)
>>  x1(3)- mean(col x1)    x2(3)- mean(col x2)
>> ....................    ...................
>>
>>
>>  I have the genes in columns and the conditions in rows.
> 
> 
> That is fine, although unusual.  Be aware that many of the BioC (and 
> similar) microarray packages use a rows=genes, columns=samples 
> convention.  Although this perhaps wouldn't be the way a statistician 
> would arrange subjects and measurements in a table in R, I think it is 
> partly a historical carry-over from microarray data analysis in 
> spreadsheets and the like.  Excel has a 256 column x 65000(ish) row size 
> limit, so you are pretty much stuck with one layout!
> 
> If you ever need to rotate your data then this is easy:  use the t() 
> function.
> 
> newArray <- t(oldArray)
> 
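A small sketch of the t() rotation mentioned above, using a made-up 3 x 4 matrix. The names cond1..cond3 and gene1..gene4 are hypothetical labels chosen only for illustration; oldArray and newArray follow the naming in the line above.

```r
# Hypothetical layout: 3 conditions in rows, 4 genes in columns
oldArray <- matrix(1:12, nrow = 3,
                   dimnames = list(paste0("cond", 1:3),
                                   paste0("gene", 1:4)))

# Rotate so that genes are in rows and conditions in columns
newArray <- t(oldArray)

dim(oldArray)       # rows = conditions, columns = genes
dim(newArray)       # rows = genes, columns = conditions
rownames(newArray)  # the gene labels move to the row names
```

Transposing twice returns the original layout, so nothing is lost by switching to the rows=genes convention for a particular package and back again.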
> 
>>  I don't want to stabilize the variance.
> 
> 
> If you did, the vsn package will do this.
> 
>>  As you can see, it is a very simple calculation.
>>  I am wondering if I could use packages like vsn or affy to do that or 
>> if it is easier to write a script.
> 
> 
> You can do this yourself very easily, as this code snippet shows:
> 
> 
> #    Make a spoof array of 100 genes (columns) and 20 samples (rows) to demonstrate
> x <- matrix(runif(2000), ncol=100)
> 
> #    Calculate the mean of each column.  Note:  you could use median here
> #    to make it slightly more robust
> colMeans <- apply(x, 2, mean)
> 
> #    Subtract the column means from each value in that column
> x <- sweep(x, 2, colMeans, "-")
> 
> #    You can do a similar version to subtract the row means;  simply
> #    change the second argument of both apply() and sweep() to 1.
> #    Alternatively, if you wanted to do division as opposed to
> #    subtraction, use this in place of the subtraction above
> x <- sweep(x, 2, colMeans, "/")
> 
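For comparison, here is a sketch of the same column centring using only base R helpers. It assumes the same 20 x 100 spoof matrix as the snippet above; the variable names col_means, centred_sweep and centred_scale are illustrative, and col_means is chosen so that the built-in colMeans() function is not masked. This is one possible alternative, not a replacement for the sweep() approach.

```r
# Illustrative data: 20 conditions (rows) x 100 genes (columns)
set.seed(1)
x <- matrix(runif(2000), ncol = 100)

# Column means via the built-in helper
col_means <- colMeans(x)

# Centre each column with sweep(), as in the snippet above
centred_sweep <- sweep(x, 2, col_means, "-")

# Centre each column with scale(); scale = FALSE leaves the
# variances untouched, so only the means are removed
centred_scale <- scale(x, center = TRUE, scale = FALSE)

# Compare the two, ignoring the extra attribute scale() attaches
all.equal(centred_sweep, centred_scale, check.attributes = FALSE)
```

scale() stores the means it removed in the "scaled:center" attribute of its result, which can be handy if the original values need to be restored later.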
> 
>>  Furthermore, I doubt whether such a simple normalization is 
>> conceptually correct with the objective of eliminating the effect 
>> between arrays.
>>  I would like to know if I have to iterate the process of calculating 
>> the mean of each column and subtracting the mean any number of times.
>>
>>
> 
> Subtracting the mean from each column will make the new mean of each 
> column zero, so one cycle is enough.
> 
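The point that one cycle is enough can be checked numerically. The sketch below assumes the same kind of spoof matrix as earlier; the names once and twice are illustrative only.

```r
# Illustrative data: 20 conditions (rows) x 100 genes (columns)
set.seed(1)
x <- matrix(runif(2000), ncol = 100)

# First pass: subtract each column's mean
once <- sweep(x, 2, colMeans(x), "-")

# Second pass: repeat the same step on the centred data
twice <- sweep(once, 2, colMeans(once), "-")

# Column means after one pass are zero up to floating-point error
max(abs(colMeans(once)))

# The second pass leaves the data unchanged within numerical tolerance
all.equal(once, twice)
```

After the first pass every column mean is already zero, so any further pass subtracts zero (up to rounding) and iterating gains nothing.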
> Hope this helps.
> 
> David
> 
> Prof David Kipling
> Department of Pathology
> School of Medicine
> Cardiff University
> Heath Park
> Cardiff CF14 4XN
> 
> Tel:  029 2074 4847
> Email:  KiplingD at cardiff.ac.uk
> 
>


