[BioC] distribution of agilent array data.

James W. MacDonald jmacdon at uw.edu
Mon Nov 11 01:45:43 CET 2013


Hi Chunxuan,

On Sunday, November 10, 2013 3:19:44 PM, shao chunxuan wrote:
>
> Hi everyone,
>
> I am confused by the histogram of normalized Agilent microarray data.
> The data set is from a human single-color array, with around 700 microarrays and 43K probes.
>
> After normalization, I plotted the expression values of all probes in a single microarray; one example is attached.
>
> I expected to see a more or less symmetric distribution; however, the
> values seem truncated. At first I thought it might be related to the
> offset value, but I have tried different values (16, 1, 0) and still got
> a similar distribution.

Why would you expect a symmetric distribution? Also, plotting a 
histogram with such large bin sizes isn't very helpful - I wouldn't be 
willing to say much about the distribution based on that plot anyway.
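
To make the bin-size point concrete, here is a minimal sketch in base R.
The data are simulated (the means, rate and seed are made-up assumptions,
not values from your arrays), so it only illustrates how the bin width
changes what a histogram appears to show.

```r
set.seed(1)
n_probes <- 43000   # roughly the probe count mentioned in the thread

# Hypothetical intensities: normal background plus exponential signal
raw <- rnorm(n_probes, mean = 50, sd = 10) + rexp(n_probes, rate = 1 / 200)
raw <- pmax(raw, 1)          # guard against non-positive values before log2
log_vals <- log2(raw)

# Same data, two bin widths ('breaks' is a suggestion that hist() rounds to pretty values)
coarse <- hist(log_vals, breaks = 10, plot = FALSE)
fine   <- hist(log_vals, breaks = 100, plot = FALSE)

# Side-by-side comparison, plus a kernel density estimate
op <- par(mfrow = c(1, 3))
plot(coarse, main = "Coarse bins", xlab = "log2 intensity")
plot(fine, main = "Fine bins", xlab = "log2 intensity")
plot(density(log_vals), main = "Kernel density", xlab = "log2 intensity")
par(op)
```

With real data you would pass one column of the normalized matrix in place
of log_vals.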

A more reasonable expectation is something like a convolution of a
normal and an exponential distribution. In other words, there are
likely a large number of genes that aren't expressed, and the
distribution of those probes will be symmetrical around some small
number, while the distribution of expressed genes is likely to be
something like an exponential, with a long right tail. Since you
used the normexp background correction, you made the same assumption
yourself.
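
As a rough sketch of that idea, the snippet below simulates a normal
background plus an exponential signal in base R and checks the skewness
of the result. All parameter values are hypothetical, chosen only for
illustration, and the skewness helper is defined here rather than taken
from a package; this is not limma's normexp fitting code.

```r
set.seed(42)
n_probes <- 43000

# Hypothetical parameters, chosen only for illustration
background <- rnorm(n_probes, mean = 50, sd = 10)   # symmetric noise component
signal     <- rexp(n_probes, rate = 1 / 200)        # long right tail
observed   <- background + signal   # sum of independent draws, so its density is the convolution

# Small helper (not from any package): moment-based sample skewness
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_background <- skewness(background)   # close to 0 for a symmetric distribution
skew_observed   <- skewness(observed)     # positive because of the right tail

cat("skewness, background only:", round(skew_background, 2), "\n")
cat("skewness, background + signal:", round(skew_observed, 2), "\n")

plot(density(observed), main = "Simulated normal + exponential", xlab = "intensity")
```

The point is only that the sum is right-skewed even though the background
part is symmetric, so a symmetric histogram is not what this model predicts.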

Best,

Jim


>
> Any explanation or suggestions?
>
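> Here is the code used for normalization:
> library(limma)
> # Read the sample sheet, then the single-channel (green-only) Agilent files
> targets <- readTargets("targets.txt")
> x <- read.maimages(targets, source="agilent", green.only=TRUE)
> # normexp background correction (offset left at its default here)
> y.bg <- backgroundCorrect(x, method="normexp")
> # Quantile-normalize across arrays; this step also returns log2 values
> y.bgn <- normalizeBetweenArrays(y.bg, method="quantile")
> # Average replicate probes that share a ProbeName
> g.ex <- avereps(y.bgn, ID=y.bgn$genes$ProbeName)
> da.norm <- g.ex$E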
>
> Here is the R session info:
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] graphics  grDevices utils     datasets  stats     methods   base
>
> other attached packages:
> [1] ggplot2_0.9.3.1 reshape2_1.2.2  plyr_1.8
>
> loaded via a namespace (and not attached):
>   [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.3       grid_3.0.2         gtable_0.1.2       labeling_0.2
>   [7] MASS_7.3-29        munsell_0.4.2      proto_0.3-10       RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2
> Best,
>
> chunxuan
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


