[BioC] Opinions on array design, normalization, and linear modeling with LIMMA

Kasper Daniel Hansen khansen at stat.Berkeley.EDU
Thu Nov 1 18:36:18 CET 2007


I agree here, the scale on the y-axis is quite dramatic. Note that we  
are not necessarily saying that too many genes are DE, but that some  
of them have dramatic fold changes.

Most normalization techniques are derived under the assumption that
not too many genes are DE. Facing your problem of many DE genes, some
people would say "clearly the assumptions are not correct". I would
say that you should use the method that gives you the best inference.
People have sometimes observed that applying the "standard"
normalization techniques actually improves their calls, even on
datasets with many DE genes.

You will probably need some control spots on the array to really  
quantify this.
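
To make the control-spot idea concrete, here is a minimal sketch in plain R on simulated data (not the poster's data, and not limma's own implementation). Everything in it is an assumption for illustration: the 60% DE fraction, the shape of the dye bias, the number of control spots, and the loess span. It fits a loess curve of M on A once using all spots and once using only the non-DE control spots, then compares how far each normalized log-ratio is from the simulated truth.

```r
# Simulated single two-colour array; all values are made up for illustration.
set.seed(1)
n_genes    <- 5000
n_controls <- 500
n          <- n_genes + n_controls
is_control <- c(rep(FALSE, n_genes), rep(TRUE, n_controls))

A <- runif(n, min = 6, max = 14)   # average log2 intensity per spot

# Hypothetical intensity-dependent dye bias that normalization should remove
dye_bias <- function(a) 0.8 - 0.1 * (a - 6)

# True log-ratios: controls are never DE; about 60% of genes are shifted upwards
true_M     <- numeric(n)
de         <- !is_control & (runif(n) < 0.6)
true_M[de] <- rnorm(sum(de), mean = 2, sd = 1)

# Observed log-ratios = truth + dye bias + measurement noise
M <- true_M + dye_bias(A) + rnorm(n, sd = 0.2)

# (a) Global loess: curve fitted to all spots, as if most genes were not DE
fit_all  <- loess(M ~ A, span = 0.3, degree = 1)
M_global <- M - fitted(fit_all)

# (b) Control-based loess: curve fitted to control spots only, applied to all
#     spots; surface = "direct" lets predict() extrapolate past the controls
ctrl      <- data.frame(A = A[is_control], M = M[is_control])
fit_ctrl  <- loess(M ~ A, data = ctrl, span = 0.3, degree = 1,
                   control = loess.control(surface = "direct"))
M_control <- M - predict(fit_ctrl, newdata = data.frame(A = A))

# Mean absolute error against the simulated truth for each approach
err_global  <- mean(abs(M_global  - true_M))
err_control <- mean(abs(M_control - true_M))
cat("global loess error: ", round(err_global, 2), "\n")
cat("control loess error:", round(err_control, 2), "\n")
```

On data simulated this way the control-based curve should recover the log-ratios more closely, because the global curve is pulled towards the DE genes. With real arrays the truth is unknown, so this kind of check is only possible if suitable control spots are printed on the array.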

I think most of us need more time with the data in order to really  
give you any recommendations. You should seek out a local expert.

Kasper


On Nov 1, 2007, at 9:46 AM, Jianping Jin wrote:

> Hi Yong,
>
> I have never seen an MA plot with such widely spread spots. It may be
> caused by real biology or by technical artifacts. My suggestion is to
> do more data quality assessment, such as "plotDensities". Dye-swap
> labeling or using a common reference RNA may help to confirm the
> difference or the problems.
>
> JJ-
>
> --On Thursday, November 01, 2007 11:13 AM -0500 Yong Yin
> <yyin at watson.wustl.edu> wrote:
>
>> Dear list,
>>
>>
>> I think I need to simplify my question.
>>
>>
>> I have two samples, each from a different time point during
>> embryogenesis. They are applied to a two-color Agilent array and
>> compared with each other.
>>
>>
>> The raw data has a MA-plot like this:
>>
>>
>>
>> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/MA_RGLW1.pdf
>>
>>
>> After "normexp" and global loess, the MA-plot does change its shape, as
>> seen here:
>>
>>
>>
>> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/MA_MALWC1.pdf
>>
>>
>> My 1st question:
>>
>>
>> Does my data have too much differential expression, according to your
>> experience?
>>
>>
>> Apparently, Jianping thinks so.
>>
>>
>> Then my 2nd question:
>>
>>
>> Is it still ok to use global loess for normalization?
>>
>>
>> Thanks so much, I need your opinions.
>>
>>
>> I am running the latest R and all packages. Commands I used are:
>>
>>
>>
>>> RGLWC <- backgroundCorrect(RGLW, method="normexp", offset=50)
>>
>>> MALWC <- normalizeWithinArrays(RGLWC, method="loess")
>>
>>
>>
>>
>> Best,
>>
>>
>> Yong
>>
>>
>>
>>
>> On Nov 1, 2007, at 8:47 AM, Jianping Jin wrote:
>>
>>
>> Yong,
>>
>>
>> What is your reference sample(s) for this test run? Looks like the
>> experiment and reference samples are quite different.
>>
>>
>> JJ-
>>
>>
>> --On Wednesday, October 31, 2007 4:40 PM -0500 Yong Yin
>> <yyin at watson.wustl.edu> wrote:
>>
>>
>>
>>
>> Dear list,
>>
>>
>> I am new to BioConductor, so please forgive me if my questions are
>> naive to you.
>>
>>
>> We designed an Agilent 4x44k array, with the same 44K probes printed
>> 4 times in the 4 blocks. These 44K probes are designed based on a
>> low-coverage genome sequencing project for a parasitic nematode. Our
>> purpose is to investigate gene expression during early embryogenesis
>> of the nematode.
>>
>>
>> We have received results from a test run to evaluate the array
>> quality. Samples applied to the chip were from two time points during
>> nematode embryogenesis. As an experiment, I have been following
>> the LIMMA manual step by step, treating the results as a simple
>> two-sample comparison with both technical and biological replication. I
>> have uploaded 3 images to the following location and would love to
>> hear what you folks think:
>>
>>
>> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/
>>
>>
>> The general quality of the array is very good; I can't find any
>> indication of a quality problem. The file "MA_RGLW1.pdf" is an MA plot
>> of the raw RG data for one of the 4 blocks. After background correction
>> with "normexp" and within-array normalization with global loess, the
>> MA plot is as shown in "MA_MALWC1.pdf".
>>
>>
>> Given that we are studying early embryogenesis, we should expect that
>> a lot of genes are differentially expressed at these two time points.
>> In the MA plots, I think we indeed see lots of DE.  However,
>> according to what I read, the underlying assumption for such
>> normalization is that the majority of the genes under investigation
>> should not be differentially expressed. I also read from other
>> people's posts that I should keep the normalization as simple as
>> possible and the "good" data will always be good.
>>
>>
>> From my MA plots, do you think my normalization is reasonable for
>> this data? If not, do you have suggestions for what to do: a different
>> normalization method, or even changing the design of the array to add a
>> set of spike-in control probes to use for normalization?
>>
>>
>> The two time points in this test run are actually the beginning and
>> the ending points of the developmental stages that we are planning to
>> investigate. We are considering using a pooled sample as a common
>> reference. We hope a pooled reference like this will decrease the
>> degrees of differential expression between any two samples of our
>> study. Does this sound like a good idea?
>>
>>
>> After normalization with loess, I went ahead to the step of linear
>> modeling with eBayes and got the following QQ plot:
>> "QQPlot_fitLWC2eBayes.pdf".
>>
>>
>> Does the modeling look reasonable, according to your experience?
>>
>>
>> Any opinions and advice are greatly appreciated.
>>
>>
>> Best,
>>
>>
>> Yong Yin, Ph.D.
>>
>>
>> Senior Scientist
>> Genome Sequencing Center
>> Washington University School of Medicine, Campus box 8501
>> 4444 Forest Park
>> Saint Louis, MO 63108
>>
>>
>> Tel: (314) 286-1415
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>>
>>
>>
>>
>>
>> ##################################
>> Jianping Jin Ph.D.
>> Bioinformatics scientist
>> Center for Bioinformatics
>> Room 3133 Bioinformatics building
>> CB# 7104
>> University of Chapel Hill
>> Chapel Hill, NC 27599
>> Phone: (919)843-6105
>> FAX:   (919)843-3103
>> E-Mail: jjin at email.unc.edu
>


