[BioC] Normalization between arrays for common reference, time course and direct two color designs

Jenny Drnevich drnevich at uiuc.edu
Fri Dec 8 19:27:51 CET 2006

```
Hi Weiyin,

Actually, "MA$genes$spotrep <- NULL" isn't necessary - I thought I had
to first create a new column in the MA.norm$genes matrix where I wanted to
put the information about repeated spots, but when I actually tried using
the code, I found out that it is not needed. Which is good, because you
forgot to replace 'MA.norm' with 'MA' in this line. But this isn't why the
code isn't working for you...

I'm not sure what you mean by 'negative duplicated probes' - are you saying
that after you sort your MAList so that spots with the same ID are next to
each other, the first 314 rows all have the same ID? If so, is it the same
as 'x[1]' ? I'm not sure why the code isn't working for you. Try
investigating it:

MA <- MA[order(MA$genes$ProbeName),]
x <- unique(MA$genes$ProbeName)
class(MA$genes)  #needs to be a data.frame. If not, do:

MA$genes <- data.frame(MA$genes)
MA$genes[1:5,]  #see what the first 5 rows look like before adding replicate numbers

nrow(MA$genes)  #should be 44202
length(x)       #should be less than half of 44202

y <- which( MA$genes$ProbeName == x[ 1 ] )  #do just the first ProbeName

length(y)       #should be 314, if the first 314 rows all have the same name

MA$genes$spotrep[y] <- 1:length(y)      #hopefully no errors...

MA$genes[1:5,]
#check to see that a new column called 'spotrep' has been added, and has values
#1,2,3,4,5, etc., if they have the same ProbeName

If all of the above works, then the 'for (i in 1:length(x)) {' loop will do
the same thing for all of the unique probe names.

cheers,
Jenny

At 03:56 PM 12/7/2006, Weiyin Zhou wrote:
>Hi Jenny,
>
>Thanks a lot for your help.
>
>I used following code:
>
> > MA <- MA[order(MA$genes$ProbeName),]
> > x <- unique(MA$genes$ProbeName)
> > MA.norm$genes$spotrep <- NULL
>
> > for (i in 1:length(x)) {
>      y <- which( MA$genes$ProbeName == x[i] )
>      MA$genes$spotrep[y] <- 1:length(y)
>      }
>
>Error in `$<-.data.frame`(`*tmp*`, "spotrep", value = c(1, 2, 3, 4, 5, :
>         replacement has 314 rows, data has 44202
>
>"44202" is my total rows.  The "314" is total number of negative
>duplicated probes (all have same names).  They are at the first 314 rows
>after probes being ordered according to their ProbeName
>
>I checked order of MA and contents of x, they are correct.
>
>Could you explain the function of "MA$genes$spotrep <- NULL" code here?
>
>
>Thanks a lot,
>
>Weiyin
>
>
>
>-----Original Message-----
>From: Jenny Drnevich [mailto:drnevich at uiuc.edu]
>Sent: Thursday, December 07, 2006 3:52 PM
>To: Weiyin Zhou; Vinoy Kumar Ramachandran
>Cc: bioconductor at stat.math.ethz.ch
>Subject: RE: [BioC] Normalization between arrays for common reference,
>time course and direct two color designs
>
>Hi Weiyin,
>
>Sorry - the object name in the code is arbitrary, so 'MA.norm' is a MAList
>object with your data in it. Besides changing $ID to $ProbeName as you did
>below, you need to change 'MA.norm' to the name of your MAList. I probably
>should have specifically said something like: "if your normalized data is
>in a MAList object named 'MA.norm', and your spot ID names are found in
>MA.norm$genes$ID, then this code should work."
>
>Note that this code does not average duplicate spots. Instead, it arranges
>them with spacing=1 so you can use the 'duplicateCorrelation' function
>before lmFit, which is better than averaging the spots. See the
>Within-Array replicate spot section of the limma vignette for an example of
>how to do this.
>
>Cheers,
>Jenny
>
>
>
>
>At 01:33 PM 12/7/2006, Weiyin Zhou wrote:
> >Hi Jenny,
> >
> >I have a related problem with an Agilent two-color array.  All of the spots
> >are duplicated twice (have the same "ProbeName"), except the positive and
> >negative controls, which are duplicated multiple times.  The column
> >"ControlType" identifies their type.  I used the limma package to input
> >the data (ProcessedSignal, which is already background corrected and loess
> >normalized), then I did between-array quantile normalization.
> >
> >Before I do lmFit and differential expression analysis, I think I should
> >remove those control spots and also average the duplicated spots, so I can
> >have a p value for each unique ProbeName.  I just tried your code, but got
> >an error message.
> >
> > > MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
> >
> >
> >Could you give me some advice?
> >
> >
> >Weiyin Zhou
> >Statistics and Data Analyst
> >ExonHit Therapeutics, Inc.
> >217 Perry Parkway, Building # 5
> >Gaithersburg, MD 20877
> >
> >email: Weiyin.zhou at exonhit-usa.com
> >phone: 240.404.0184
> >fax: 240.683.7060
> >
> >
> >
> >-----Original Message-----
> >From: bioconductor-bounces at stat.math.ethz.ch
> >[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny
> >Drnevich
> >Sent: Thursday, December 07, 2006 12:17 PM
> >To: Vinoy Kumar Ramachandran
> >Cc: bioconductor at stat.math.ethz.ch
> >Subject: Re: [BioC] Normalization between arrays for common reference,
> >time course and direct two color designs
> >
> >Hi Vinoy,
> >
> >It's better to keep the discussions on the list for other users that may
> >have the same question. If they are not evenly spaced, after the
> >normalizations you can rearrange the MA object so that they are evenly
> >spaced, at least the 90% that are spotted twice. The ones that are spotted
> >26 times are likely some sort of control spots, and you can probably safely
> >ignore them. Why are some spotted three times? If you want to keep these
> >genes in, a quick-and-dirty solution would be to just pick two of the three
> >spots. The following code *should* work to rearrange the order of the
> >genes, then pick out the first two spots for each unique ID.
> >
> >MA.norm <- MA.norm[order(MA.norm$genes$ID),]
> >
> >x <- unique(MA.norm$genes$ID)
> >
> >MA.norm$genes$spotrep <- NULL
> >
> ># I'm sure there's a better, faster way to do the following, but this is
> ># the only way I know how:
> >
> >for (i in 1:length(x)) {
> >      y <- which( MA.norm$genes$ID == x[i] )
> >      MA.norm$genes$spotrep[y] <- 1:length(y)
> >      }
> >
> >MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
> ># now your spacing=1 and ndups=2
> >
> >HTH,
> >Jenny
> >
> >
> >
> >
> >At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> > >Hi Jenny,
> > >
> > >Thanks a lot for the valuable information. I will try to do loess first
> > >and then do a scale if necessary. Regarding the correlation in
> > >lmFit, the spots in my array are not evenly spaced and not evenly
> > >replicated: 90% of spots are spotted twice, 8% thrice, and 2% are
> > >spotted 26 times. I found this code in a posting in the limma user forum
> > >and tried to adapt the code to my data. Is there any other elegant way to
> > >deal with this kind of replication?
> > >
> > >Once again, thanks for the information.
> > >
> > >with regards,
> > >vinoy
> > >On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
> > >Hi Vinoy,
> > >
> > >Using the 'Gquantile' between-array normalization is not appropriate in
> > >your case because your reference is not always in the Green channel. The
> > >values you are using for Exp3 and Exp6 in the linear model are actually
> > >from the reference, so it's no wonder your gene lists don't make sense. To
> > >clarify, the discussion we were having recently on the mailing list about
> > >using Gquantile applies when your experimental samples are expected to be
> > >VERY different from the reference, such that the assumption of a
> > >within-array normalization may not be met. In your case (and in most
> > >reference designs) you probably meet the assumptions of most genes not
> > >changing, and so should first do a within-array loess-type normalization
> > >to help remove dye bias. Then check to see if the resulting distributions
> > >of M values are similar between arrays. If they are very different, and
> > >you would expect them not to be very different, do a between-array
> > >normalization on the M values - the scale method of
> > >'normalizeBetweenArrays' is my favorite. The design matrix you have below
> > >will correctly adjust for dye swaps, assuming that the 'dye swaps' are all
> > >biological replicates and not technical replicates.
> > >
> > >I'm a little confused about the way you're calling the 'lmFit' function.
> > >Your arrays appear to have duplicate spots, but you have the correlation as
> > >zero. Something is very wrong with your arrays if there is zero correlation
> > >between the duplicate spots! I suggest you read the limma vignette very
> > >closely, especially the sections on common reference designs and
> > >within-array replicate spots.
> > >
> > >Good luck,
> > >Jenny
> > >
> > >At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> > > >  Dear Limma users,
> > > >
> > > >I am working on custom spotted 70mer oligo arrays, and use Bluefuse to
> > > >analyse the images. With the help of the excellent user guide and
> > > >Bioconductor user forum (GMANE), I have analysed my direct comparison
> > > >experiments. I also have common reference, time course and direct two
> > > >color design type experiments to analyse. I have read the recent posting
> > > >in the list about using Rquantile or Gquantile for normalizing between
> > > >arrays in common reference experiments. I tried to do a common reference
> > > >analysis using the discussed code, but the resulting gene list is
> > > >different from the expected list. I am also wondering how to account for
> > > >dye swaps. I have pasted the code which I used for common reference.
> > > >
> > > >It will also be very useful if anyone could tell me how to use
> > > >normalization between arrays for direct two color designs.
> > > >
> > > >My experiment design is
> > > >        Cy3    Cy5
> > > >____________________
> > > >Exp1    Ref    CpdA
> > > >Exp2    Ref    CpdA
> > > >Exp3    CpdA   Ref
> > > >
> > > >Exp4    Ref    CpdB
> > > >Exp5    Ref    CpdB
> > > >Exp6    CpdB   Ref
> > > >Code which I used for analysing common reference:
> > > >------------------------------------------------------------------------
> > > >library(limma)
> > > >targets <- readTargets("commonref.txt", row.names= "Name")
> > > >RG <- read.maimages(targets$FileName, source="bluefuse")
> > > >RG$genes <- readGAL()
> > > >RG$printer <- getLayout(RG$genes)
> > > >spottypes <- readSpotTypes()
> > > >RG$genes$Status <- controlStatus(spottypes, RG)
> > > >isGene <- RG$genes$Status == "oligos"
> > > >MA.Gquantile <- normalizeBetweenArrays(RG[isGene,], method="Gquantile")
> > > >RG.Gquantile <- RG.MA(MA.Gquantile)
> > > >MA.dummy <- MA.Gquantile
> > > >MA.dummy$M <- log2(RG.Gquantile$R)
> > > >o <- order(MA.dummy$genes$ID)
> > > >MA.sorted <- MA.dummy[o,]
> > > >design <- modelMatrix(targets, ref="Ref")
> > > >fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
> > > >fit.eb <- eBayes(fit)
> > > >write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
> > > >------------------------------------------------------------------------
> > > >
> > > >thanks in advance
> > > >
> > > >with regards,
> > > >Vinoy......
> > > >
> > > >_______________________________________________
> > > >Bioconductor mailing list
> > > >Bioconductor at stat.math.ethz.ch
> > > >https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > >Search the archives:
> > > >http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> > >Jenny Drnevich, Ph.D.
> > >
> > >Functional Genomics Bioinformatics Specialist
> > >W.M. Keck Center for Comparative and Functional Genomics
> > >Roy J. Carver Biotechnology Center
> > >University of Illinois, Urbana-Champaign
> > >
> > >330 ERML
> > >1201 W. Gregory Dr.
> > >Urbana, IL 61801
> > >USA
> > >
> > >ph: 217-244-7355
> > >fax: 217-265-5066
> > >e-mail: drnevich at uiuc.edu
> > >
> > >
> > >
> > >
> > >--
> > >Vinoy......
> >

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu

```
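
A note on the error quoted in the thread. One plausible reading, not confirmed by any of the posters, is that in current versions of R `MA$genes$spotrep[y] <- 1:length(y)` fails on a data frame when the `spotrep` column does not exist yet, because the value being assigned then has only `length(y)` elements while the data frame has more rows. The sketch below uses base R only and a made-up toy data frame (`genes`, with invented probe names) in place of `MA$genes`. It shows two ways to build the replicate index that avoid that situation: a vectorized `ave()` call, and the original loop with the column created at full length first. It is an illustration under those assumptions, not the code the posters ran.

```r
# Toy stand-in for MA$genes; the probe names and counts are invented.
genes <- data.frame(
  ProbeName = c("ctrl_neg", "A_1", "ctrl_neg", "A_2", "A_1", "ctrl_neg", "A_2"),
  stringsAsFactors = FALSE
)

# Sort so that spots sharing a ProbeName sit next to each other.
genes <- genes[order(genes$ProbeName), , drop = FALSE]

# Vectorized variant: ave() applies seq_along() within each ProbeName group
# and returns one value per row, so the new column always has full length.
genes$spotrep <- ave(seq_len(nrow(genes)), genes$ProbeName, FUN = seq_along)

# Loop variant: create the full-length column first, then fill it group by group.
genes$spotrep_loop <- NA_integer_
for (id in unique(genes$ProbeName)) {
  y <- which(genes$ProbeName == id)
  genes$spotrep_loop[y] <- seq_along(y)
}

# Keep at most two spots per probe, as suggested in the thread.
genes.2spot <- genes[genes$spotrep <= 2, ]
print(genes.2spot)
```

The same `ave()` call should carry over to `MA$genes$ProbeName` on a sorted MAList, assuming `MA$genes` is a data frame as discussed in the thread.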