[BioC] Normalization between arrays for common reference, time course and direct two color designs

Weiyin Zhou weiyin.zhou at exonhit-usa.com
Thu Dec 7 20:33:26 CET 2006

Hi Jenny,

I have a related problem with Agilent two-color arrays.  All of the spots
are duplicated (each pair shares the same "ProbeName"), except the positive
and negative controls, which are replicated many times.  The column
"ControlType" identifies their type.  I used the limma package to read in the
data (ProcessedSignal, which is already background corrected and loess
normalized), and then did a between-array quantile normalization.

Before I run lmFit and the differential expression analysis, I think I should
remove the control spots and average the duplicated spots, so that I
have one p-value for each unique ProbeName.  I just tried your code, but got
an error message.

> MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
Error: object "MA.norm" not found

Could you give me some advice?
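
A note on the error and the two steps described above. "object not found" only means that no object called MA.norm exists in the session, so the line has to use the name of your own normalized MAList. Dropping controls and averaging duplicates can be sketched in base R as below. This is only an illustration under stated assumptions: the toy `genes` and `M` objects stand in for the `$genes` and `$M` components of an MAList, all values are made up, and it assumes Agilent's convention that ControlType is 0 for ordinary probes. Current limma releases also provide avereps(), which averages replicate probes directly.

```r
# Toy stand-ins for MA$genes and MA$M (hypothetical values).
genes <- data.frame(
  ProbeName   = c("A_1", "A_2", "A_1", "CTRL_POS", "A_2", "CTRL_NEG"),
  ControlType = c(0, 0, 0, 1, 0, -1),
  stringsAsFactors = FALSE
)
M <- matrix(c(1, 2, 3, 9, 4, -9,
              0, 1, 2, 9, 3, -9),
            ncol = 2, dimnames = list(NULL, c("array1", "array2")))

# Step 1: keep only non-control spots (assumes ControlType == 0 marks them).
is_probe <- genes$ControlType == 0
M_probe  <- M[is_probe, , drop = FALSE]
ids      <- genes$ProbeName[is_probe]

# Step 2: average the replicate spots of each ProbeName.
# rowsum() adds up rows sharing an ID; dividing by the number of
# replicates per ID turns the sums into means.
sums   <- rowsum(M_probe, ids)
counts <- table(ids)[rownames(sums)]
M_avg  <- sums / as.vector(counts)
```

M_avg then has one row per unique ProbeName and one column per array. On a real MAList the same filter would be applied as MA[MA$genes$ControlType == 0, ] before averaging.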

Thanks in advance,

Weiyin Zhou
Statistics and Data Analyst
ExonHit Therapeutics, Inc.
217 Perry Parkway, Building # 5
Gaithersburg, MD 20877

email: Weiyin.zhou at exonhit-usa.com
phone: 240.404.0184
fax: 240.683.7060

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny
Sent: Thursday, December 07, 2006 12:17 PM
To: Vinoy Kumar Ramachandran
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Normalization between arrays for common reference,
time course and direct two color designs

Hi Vinoy,

It's better to keep the discussions on the list for other users that may
have the same question. If they are not evenly spaced, after the
normalizations you can rearrange the MA object so that they are evenly
spaced, at least the 90% that are spotted twice. The ones that are spotted
26 times are likely some sort of control spots, and you can probably
ignore them. Why are some spotted three times? If you want to keep these
genes in, a quick-and-dirty solution would be to just pick two of the
spots. The following code *should* work to rearrange the order of the
genes, then pick out the first two spots for each unique ID.

# sort so that replicate spots of the same ID are adjacent
MA.norm <- MA.norm[order(MA.norm$genes$ID),]

x <- unique(MA.norm$genes$ID)

# start with an all-NA replicate counter, one entry per spot
MA.norm$genes$spotrep <- NA

# I'm sure there's a better, faster way to do the following, but this is
# the only way I know how:

for (i in 1:length(x)) {
     y <- which( MA.norm$genes$ID == x[i] )
     MA.norm$genes$spotrep[y] <- 1:length(y)
}

MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
# now your spacing=1 and ndups=2

At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
>Hi Jenny,
>Thanks a lot for the valuable information. I will try to do loess first
>and then do a scale normalization if necessary. Regarding the correlation in
>lmFit, the spots on my arrays are not evenly spaced and not evenly
>replicated: 90% of the spots are spotted twice, 8% three times and 2% are
>spotted 26 times. I found this code in a posting in the limma user forum
>and tried to adapt it to my data. Is there a more elegant way to
>deal with this kind of replication?
>Once again, thanks for the information.
>With regards,
>On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
>Hi Vinoy,
>Using the 'Gquantile' between-array normalization is not appropriate in
>your case because your reference is not always in the Green channel.
>values you are using for Exp3 and Exp6 in the linear model are actually
>from the reference, so it's no wonder your gene lists don't make sense.
>clarify, the discussion we were having recently on the mailing list
>using Gquantile is when your experimental samples are expected to be
>different from the reference, such that the assumption of a
>normalization may not be met. In your case (and in most reference
>you probably meet the assumptions of most genes not changing, and so
>first do a within-array loess-type normalization to help remove dye
>Then check to see if the resulting distributions of M values are
>between arrays. If they are very different, and you would expect them
>to be very different, do a between-array normalization on the M values
>the scale method of 'normalizeBetweenArrays' is my favorite. The design
>matrix you have below will correctly adjust for dye swaps, assuming
>the 'dye swaps' are all biological replicates and not technical
>I'm a little confused about the way you're calling the 'lmFit'
>Your arrays appear to have duplicate spots, but you have the correlation as
>zero. Something is very wrong with your arrays if there is zero
>between the duplicate spots! I suggest you read the limma vignette
>closely, especially the sections on common reference designs and
>within-array replicate spots.
>Good luck,
>At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> >Dear limma users,
> >
> >I am working on custom spotted 70mer oligo arrays, and use Bluefuse to
> >analyse the images. With the help of the excellent user guide and the
> >Bioconductor user forum (GMANE), I have analysed my direct comparison
> >experiments. I also have common reference, time course and direct two color
> >design type experiments to analyse. I have read the recent posting in the
> >list about using Rquantile or Gquantile for normalizing between arrays in
> >common reference experiments. I tried to do a common reference analysis
> >using the discussed code, but the resulting gene list is different from the
> >expected list. I am also wondering how to account for dye swaps. I have
> >pasted the code which I used for the common reference analysis.
> >
> >It would also be very useful if anyone could tell me how to use
> >normalization between arrays for direct two color designs.
> >
> >My experiment design is
> >        Cy3    Cy5
> >____________________
> >Exp1    Ref    CpdA
> >Exp2    Ref    CpdA
> >Exp3    CpdA   Ref
> >
> >Exp4    Ref    CpdB
> >Exp5    Ref    CpdB
> >Exp6    CpdB   Ref
> >
> >Code which I used for analysing the common reference design:
> ------------------------------------------------
> >library(limma)
> >targets <- readTargets("commonref.txt", row.names= "Name")
> >RG <- read.maimages(targets$FileName, source="bluefuse")
> >RG$genes <- readGAL()
> >RG$printer <- getLayout(RG$genes)
> >spottypes <- readSpotTypes()
> >RG$genes$Status <- controlStatus(spottypes, RG)
> >isGene <- RG$genes$Status == "oligos"
> >MA.Gquantile <- normalizeBetweenArrays(RG[isGene,],
> >RG.Gquantile <- RG.MA(MA.Gquantile)
> >MA.dummy <- MA.Gquantile
> >MA.dummy$M <- log2(RG.Gquantile$R)
> >o <- order(MA.dummy$genes$ID)
> >MA.sorted <- MA.dummy[o,]
> >design <- modelMatrix(targets, ref="Ref")
> >fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
> >fit.eb <- eBayes(fit)
> >write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
> --------------------------------------------------------
> >
> >Thanks in advance
> >
> >with regards,
> >Vinoy......
> >
> >         [[alternative HTML version deleted]]
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> /news.gmane.org/gmane.science.biology.informatics.conductor
>Jenny Drnevich, Ph.D.
>Functional Genomics Bioinformatics Specialist
>W.M. Keck Center for Comparative and Functional Genomics
>Roy J. Carver Biotechnology Center
>University of Illinois, Urbana-Champaign
>330 ERML
>1201 W. Gregory Dr.
>Urbana, IL 61801
>ph: 217-244-7355
>fax: 217-265-5066
>e-mail: drnevich at uiuc.edu
