[BioC] Opinions on array design, normalization, and linear modeling with LIMMA

Jianping Jin jjin at email.unc.edu
Thu Nov 1 14:47:46 CET 2007


Yong,

What reference sample(s) did you use for this test run? It looks like the
experiment and reference samples are quite different.

JJ-

--On Wednesday, October 31, 2007 4:40 PM -0500 Yong Yin 
<yyin at watson.wustl.edu> wrote:

> Dear list,
>
> I am new to BioConductor, so please forgive me if my questions are
> naive to you.
>
> We designed an Agilent 4x44k array, with the same 44K probes printed
> 4 times in the 4 blocks. These 44K probes are designed based on a low-
> coverage genome sequencing project for a parasitic nematode. Our
> purpose is to investigate gene expression during early embryogenesis
> of the nematode.
>
> We have received results from a test run to evaluate the array
> quality. Samples applied on the chip were from two time points during
> the nematode embryogenesis. As an experiment, I have been following
> the LIMMA manual step-by-step, treating the results as a simple two-
> sample comparison with both technical and biological replication. I
> have uploaded 3 images in the following location and would love to
> hear what you folks think:
>
> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/
>
> The general quality of the array is very good; I can't find any
> indication of a quality problem. The file "MA_RGLW1.pdf" is an MA plot
> of the raw RG data for one of the 4 blocks. After background correction
> with "normexp" and within-array normalization with global loess, the
> MA plot is as shown in "MA_MALWC1.pdf".
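
For readers unfamiliar with MA plots, here is a minimal sketch of the two quantities being plotted, written in plain Python rather than with limma's R functions. The function name `ma_values` and the intensities are made up for illustration; they are not taken from the arrays discussed in this thread.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R/G) is the log-ratio (vertical axis of an MA plot).
    A = (log2 R + log2 G) / 2 is the mean log-intensity (horizontal axis).
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

# Hypothetical background-corrected intensities for three spots.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
m, a = ma_values(red, green)
```

A spot with M = 0 has equal signal in both channels; a cloud of points lying well away from M = 0 across the whole A range is what "lots of DE" would look like on such a plot.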
>
> Given that we are studying early embryogenesis, we should expect that
> a lot of genes are differentially expressed at these two time points.
> In the MA plots, I think we do indeed see a lot of DE. However,
> according to what I have read, the underlying assumption of such
> normalization is that the majority of the genes under investigation
> are not differentially expressed. I have also read in other
> people's posts that I should keep the normalization as simple as
> possible and that "good" data will always be good.
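
To make the assumption concrete, here is a small sketch that uses global median centering as a simplified stand-in for loess (it is not the loess method used above, and not limma code). The log-ratios, the constant `DYE_BIAS`, and the function name `median_center` are all hypothetical.

```python
from statistics import median

def median_center(m_values):
    """Subtract the array-wide median log-ratio from every spot."""
    shift = median(m_values)
    return [m - shift for m in m_values]

DYE_BIAS = 0.5  # hypothetical constant dye bias added to every log-ratio

# Case 1: most genes unchanged (true M = 0), so the median estimates the bias.
true_mostly_flat = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, -2.0]
observed_flat = [m + DYE_BIAS for m in true_mostly_flat]
normalized_flat = median_center(observed_flat)

# Case 2: most genes truly up-regulated, so the median absorbs real signal.
true_mostly_up = [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, -1.0]
observed_up = [m + DYE_BIAS for m in true_mostly_up]
normalized_up = median_center(observed_up)
```

In the first case the normalized values equal the true log-ratios. In the second, the up-regulated genes are pulled to 0 and the unchanged genes appear down-regulated, which is the kind of distortion the question is worried about.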
>
> From my MA plots, do you think my normalization is reasonable for
> this data? If not, what would you suggest: a different
> normalization method, or even changing the array design to include a
> set of spike-in control probes for normalization?
>
> The two time points in this test run are actually the first and
> last of the developmental stages that we plan to
> investigate. We are considering using a pooled sample as a common
> reference. We hope a pooled reference like this will decrease the
> degree of differential expression between any two samples in our
> study. Does this sound like a good idea?
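
The arithmetic behind a common-reference design can be sketched as follows. This is an illustration only: the intensities are invented, and the pool is assumed to be an equal mix of two samples so that its signal is their arithmetic mean.

```python
import math

def log_ratio(sample, reference):
    """log2 ratio of a sample intensity against a reference intensity."""
    return math.log2(sample / reference)

# Hypothetical expression of one gene at the first and last time points.
early, late = 100.0, 1600.0

# Assumption: the pool is an equal mix, so its signal is the arithmetic mean.
pool = (early + late) / 2

direct = log_ratio(late, early)          # late vs. early on one array
early_vs_pool = log_ratio(early, pool)   # early vs. pooled reference
late_vs_pool = log_ratio(late, pool)     # late vs. pooled reference

# The time-point contrast is recovered as a difference of reference log-ratios.
indirect = late_vs_pool - early_vs_pool
```

Under these assumptions each sample-versus-pool log-ratio is smaller in magnitude than the direct comparison, while the difference between the two still recovers the direct log-ratio.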
>
> After normalization with loess, I went on to linear
> modeling with eBayes and got the following QQ plot:
> "QQPlot_fitLWC2eBayes.pdf".
>
> Does the modeling look reasonable, in your experience?
>
> Any opinions and advice are greatly appreciated.
>
> Best,
>
> Yong Yin, Ph.D.
>
> Senior Scientist
> Genome Sequencing Center
> Washington University School of Medicine, Campus box 8501
> 4444 Forest Park
> Saint Louis, MO 63108
>
> Tel: (314) 286-1415



##################################
Jianping Jin Ph.D.
Bioinformatics scientist
Center for Bioinformatics
Room 3133 Bioinformatics building
CB# 7104
University of Chapel Hill
Chapel Hill, NC 27599
Phone: (919)843-6105
FAX:   (919)843-3103
E-Mail: jjin at email.unc.edu


