[BioC] median normalization

James W. MacDonald jmacdon at med.umich.edu
Wed Aug 31 16:45:28 CEST 2011


Hi Viritha,

On 8/31/2011 9:36 AM, viritha kaza wrote:
> Hi James,
> Thanks for your quick reply.
>
> So my last step should be
> exprs<-eset-median+median1.
> Is it?

If your data are not log transformed, yes. This of course assumes that 
this is what the original authors meant by 'median normalization'.
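
For what it is worth, here is a minimal sketch of additive median centering, assuming that is what was meant. The toy matrix and the names col_med, target and centered are made up for illustration; in your case eset would come from exprs(eset.mas5). One thing to watch: a plain `eset - median` recycles the vector of medians down the rows rather than across the columns, so the sketch uses sweep() to apply one median per array.

```r
# Toy matrix standing in for exprs(eset.mas5):
# rows are probe sets, columns are arrays (hypothetical data).
set.seed(1)
eset <- matrix(rexp(40, rate = 1 / 250), nrow = 10,
               dimnames = list(paste0("probe", 1:10), paste0("array", 1:4)))

# Per-array medians; the name 'col_med' avoids masking the median() function.
col_med <- apply(eset, 2, median)

# Common target: the median of the per-array medians.
target <- median(col_med)

# Subtract each array's own median from its column, then add the target back.
centered <- sweep(eset, 2, col_med, "-") + target

# After centering, every array should have the same median (the target).
apply(centered, 2, median)
```

If the data were on the log scale instead, the same call would apply unchanged; for a multiplicative version on the raw scale, "-" and "+" would become "/" and "*".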

>
> In the paper they say
> "RNA fluorescent labeling reaction and hybridization were performed
> using the Affymetrix Gene Chips HG-U133A and HG-U133B according to the
> manufacturer’s instructions (http://www.affymetrix.com/). The arrays
> consist of 22,283 (HG-U133A) and 22,645 (HG-U133B) probe sets, which
> together amount to 23,583 unique genes based on Unigene build 173.
> Microarray analysis was performed using Affymetrix Microarray Suite 5.0
> and in-house Visual Basic software MATRIX 1.26. Array data were median
> normalized, and replicate genes were combined by averaging. Samples (or
> averages of samples) were then compared against each other by
> calculating log ratios for each gene, and statistical significance was
> presented as a p value calculated by Student’s t test. The microarray
> data have been uploaded to GEO (Gene Expression Omnibus), and the
> accession number is GSE-4824."
>
> In GSE4824 in geo:
> "Data processing:Data were analyzed with Microarray Suite version 5.0
> (MAS 5.0) using Affymetrix default analysis settings and global scaling
> as normalization method. The trimmed mean target intensity of each array
> was arbitrarily set to 250".
> Could you please tell me whether my steps are correct?
> There are 3 platforms: GPL570, which contains 6 arrays that are
> reference profiles, and the other 2, i.e. HG-U133A and HG-U133B, which
> contain 79 samples.
> Steps as suggested by the paper:
> 1) Combine both platforms, using the MAS5 intensities of both platforms
> for the 79 samples from the series matrix file (unlogged).
> * What about the probes common to HG-U133A and HG-U133B (168)?
> 2) Add annotation with Unigene and then combine with the reference
> profiles, which are HG-U133 Plus 2.
> * Do I ignore the probes that are unique to HG-U133 Plus 2 (9921) and
> the probes of HG-U133A and HG-U133B (6)?
> 3) Then perform median normalization or median centering.
> 4) Then average the replicate genes.
> 5) Log ratios for each gene (fold change).
> 6) Then perform Student's t-test.
> * During which step do I convert the expression to log2?

I would assume somewhere before step 5. But this is your project, so you 
have to make that decision for yourself.
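
As a rough sketch only, steps 4 to 6 could look like the following, with the log2 conversion placed just before step 5. The toy data, the probe-to-gene mapping and all object names are hypothetical, and whether the original authors averaged before or after taking logs is not stated in the paper.

```r
set.seed(2)
# Hypothetical toy data: 6 probe sets mapping to 3 genes, 3 arrays per group.
expr <- matrix(rlnorm(36, meanlog = 5), nrow = 6,
               dimnames = list(paste0("probe", 1:6),
                               c(paste0("A", 1:3), paste0("B", 1:3))))
gene  <- c("g1", "g1", "g2", "g2", "g3", "g3")
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Step 4: average replicate probe sets for each gene, array by array.
gene_expr <- apply(expr, 2, function(x) tapply(x, gene, mean))

# Convert to log2 before computing ratios (assumes strictly positive values).
log_expr <- log2(gene_expr)

# Step 5: log ratio = difference of the group means on the log2 scale.
log_ratio <- rowMeans(log_expr[, group == "B"]) -
  rowMeans(log_expr[, group == "A"])

# Step 6: per-gene Student's t-test; var.equal = TRUE requests the classic
# equal-variance test rather than R's default Welch test.
p_value <- apply(log_expr, 1, function(x)
  t.test(x[group == "B"], x[group == "A"], var.equal = TRUE)$p.value)

data.frame(log_ratio, p_value)
```

With real data, gene would come from your Unigene annotation and group from the sample descriptions in GEO.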

Best,

Jim


> Waiting for your suggestions.
> Thanks,
> Viritha
> On Mon, Aug 29, 2011 at 1:27 PM, James W. MacDonald
> <jmacdon at med.umich.edu> wrote:
>
>     Hi Viritha,
>
>
>     On 8/29/2011 12:10 PM, viritha kaza wrote:
>
>         Hi group,
>         I am trying to replicate a dataset GSE4824 from a paper.
>         There are actually 3 platforms in it, but right now I am
>         concentrating only on one platform, GPL570. This contains 6
>         arrays. I have written the code to perform Microarray Suite
>         version 5.0 (MAS 5.0) using Affymetrix default analysis
>         settings and global scaling as normalization method. The
>         trimmed mean target intensity of each array was arbitrarily
>         set to 250. After that comes median normalization.
>
>
>             source("http://bioconductor.org/biocLite.R")
>
>
>             biocLite("affy")
>
>
>             library(affy)
>
>
>             mydata<- ReadAffy()
>
>
>             eset.mas5 = mas5(mydata,sc=250,normalize=TRUE)
>
>
>             write.exprs(eset.mas5,"GSE4824_GPL570.txt",sep='\t')
>
>
>             eset=exprs(eset.mas5)
>
>
>             median = apply(eset, 2, median)
>
>
>             median1=median(median)
>             exprs<-eset/median*median1
>
>
>     The output from mas5() isn't log transformed, so you should be
>     subtracting and adding, not dividing and multiplying.
>
>     This assumes that by 'median normalization' the original authors
>     simply meant median centering.
>
>     Best,
>
>     Jim
>
>
>             write.table(exprs,"GSE4824_GPL570_Median.txt",sep='\t')
>
>
>         Please let me know if my code performs the above task
>         correctly, especially whether the last few steps perform
>         median normalization correctly. Also let me know if this is
>         the right way to do median normalization.
>         Thanks,
>         Viritha
>
>     --
>     James W. MacDonald, M.S.
>     Biostatistician
>     Douglas Lab
>     University of Michigan
>     Department of Human Genetics
>     5912 Buhl
>     1241 E. Catherine St.
>     Ann Arbor MI 48109-5618
>     734-615-7826
>     **********************************************************
>     Electronic Mail is not secure, may not be read every day, and should
>     not be used for urgent or sensitive issues
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


