[BioC] Normalization between arrays for common reference, time course and direct two color designs

Thu Dec 7 22:56:28 CET 2006

Hi Jenny,

Thanks a lot for your help.

I used the following code:

> MA <- MA[order(MA$genes$ProbeName),]
> x <- unique(MA$genes$ProbeName)
> MA$genes$spotrep <- NULL

> for (i in 1:length(x)) {
     y <- which( MA$genes$ProbeName == x[i] )
     MA$genes$spotrep[y] <- 1:length(y)
     }

Error in `$<-.data.frame`(`*tmp*`, "spotrep", value = c(1, 2, 3, 4, 5, :
        replacement has 314 rows, data has 44202

"44202" is my total number of rows.  "314" is the total number of duplicated
negative control probes (all have the same name).  They occupy the first 314
rows after the probes are ordered by ProbeName.

I checked the order of MA and the contents of x; both are correct.

Could you explain the function of "MA$genes$spotrep <- NULL" code here?

Thanks a lot,

Weiyin
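
For readers hitting the same error, a likely cause (assuming MA$genes is an ordinary data frame) is that assigning NULL leaves no 'spotrep' column in place. The first pass of the loop then tries to create the column from only the 314 indexed values, which is shorter than the 44202 rows of the data frame. The sketch below reproduces the situation on a small made-up data frame and shows two possible ways around it: creating the column at full length before the loop, or computing the within-probe counter in one step with ave(). The object 'genes' and its probe names are hypothetical stand-ins for MA$genes, not data from this thread.

```r
# Hypothetical stand-in for MA$genes, already ordered by ProbeName.
genes <- data.frame(
  ProbeName = c("NegCtrl", "NegCtrl", "NegCtrl", "P1", "P1", "P2", "P2"),
  stringsAsFactors = FALSE
)

# Option 1: create the column at full length first, then fill it per probe.
genes$spotrep <- NA_integer_
for (id in unique(genes$ProbeName)) {
  y <- which(genes$ProbeName == id)
  genes$spotrep[y] <- seq_along(y)
}

# Option 2: the same within-probe counter without an explicit loop.
spotrep_ave <- ave(seq_along(genes$ProbeName), genes$ProbeName,
                   FUN = seq_along)

# Keep only the first two spots of every probe, as in the original code.
genes_2spot <- genes[genes$spotrep <= 2, , drop = FALSE]
```

With a real MAList the last step would be the row subset from the original code, for example MA[MA$genes$spotrep <= 2, ], after which every remaining probe has at most two adjacent spots.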

-----Original Message-----
From: Jenny Drnevich [mailto:drnevich at uiuc.edu] 
Sent: Thursday, December 07, 2006 3:52 PM
To: Weiyin Zhou; Vinoy Kumar Ramachandran
Cc: bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] Normalization between arrays for common reference,
time course and direct two color designs

Hi Weiyin,

Sorry - the object name in the code is arbitrary, so 'MA.norm' is a MAList
object with your data in it. Besides changing $ID to $ProbeName as you did
below, you need to change 'MA.norm' to the name of your MAList. I probably
should have specifically said something like: "if your normalized data is
in a MAList object named 'MA.norm', and your spot ID names are found in
MA.norm$genes$ID, then this code should work."

Note that this code does not average duplicate spots. Instead, it arranges
them with spacing=1 so you can use the 'duplicateCorrelation' function
before lmFit, which is better than averaging the spots. See the
Within-Array replicate spot section of the limma vignette for an example of
how to do this.

Cheers,
Jenny
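
As a rough illustration of the duplicateCorrelation-then-lmFit approach described above, here is a minimal sketch on simulated data. It assumes the limma package is installed; the matrix 'M', the two-group design, and all sizes are made up for the example and are not taken from the arrays discussed in this thread. Each probe sits on two adjacent rows, which is the ndups=2, spacing=1 layout the rearranging code aims for.

```r
library(limma)

set.seed(1)
n_probes <- 100   # hypothetical number of unique probes
n_arrays <- 6     # hypothetical number of arrays

# Simulated log-ratios: the two copies of a probe occupy adjacent rows and
# share a per-array effect, so duplicate spots are positively correlated.
shared <- matrix(rnorm(n_probes * n_arrays), n_probes, n_arrays)
M <- shared[rep(seq_len(n_probes), each = 2), ] +
  matrix(rnorm(2 * n_probes * n_arrays, sd = 0.5), 2 * n_probes, n_arrays)

# Hypothetical two-group design, three arrays per group.
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Estimate the consensus correlation between duplicate spots ...
corfit <- duplicateCorrelation(M, design, ndups = 2, spacing = 1)

# ... and pass it to lmFit instead of averaging the duplicates.
fit <- lmFit(M, design, ndups = 2, spacing = 1,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
```

The fitted object has one row per unique probe rather than one per spot, so each probe still ends up with a single p-value without averaging the spots beforehand.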

At 01:33 PM 12/7/2006, Weiyin Zhou wrote:
>Hi Jenny,
>
>I have a related problem with an Agilent two-color array.  All of the spots
>are duplicated twice (have the same "ProbeName"), except the positive and
>negative controls, which are duplicated multiple times.  The "ControlType"
>column identifies their type.  I use the limma package to read in the
>data (ProcessedSignal, which is already background corrected and loess
>normalized), then I did a between-array quantile normalization.
>
>Before I do lmFit and the differential expression analysis, I think I should
>remove those control spots and also average the duplicated spots, so that I
>have a p-value for each unique ProbeName.  I just tried your code, but got
>an error message.
>
> > MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
>Error: object "MA.norm" not found
>
>
>Could you give me some advice?
>
>Thanks in advance,
>
>Weiyin Zhou
>Statistics and Data Analyst
>ExonHit Therapeutics, Inc.
>217 Perry Parkway, Building # 5
>Gaithersburg, MD 20877
>
>email: Weiyin.zhou at exonhit-usa.com
>phone: 240.404.0184
>fax: 240.683.7060
>
>
>
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny
>Drnevich
>Sent: Thursday, December 07, 2006 12:17 PM
>To: Vinoy Kumar Ramachandran
>Cc: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] Normalization between arrays for common reference,
>time course and direct two color designs
>
>Hi Vinoy,
>
>It's better to keep the discussions on the list for other users that may
>have the same question. If they are not evenly spaced, after the
>normalizations you can rearrange the MA object so that they are evenly
>spaced, at least the 90% that are spotted twice. The ones that are spotted
>26 times are likely some sort of control spots, and you can probably safely
>ignore them. Why are some spotted three times? If you want to keep these
>genes in, a quick-and-dirty solution would be to just pick two of the three
>spots. The following code *should* work to rearrange the order of the
>genes, then pick out the first two spots for each unique ID.
>
>MA.norm <- MA.norm[order(MA.norm$genes$ID),]
>
>x <- unique(MA.norm$genes$ID)
>
>MA.norm$genes$spotrep <- NULL
>
># I'm sure there's a better, faster way to do the following, but this is
># the only way I know how:
>
>for (i in 1:length(x)) {
>      y <- which( MA.norm$genes$ID == x[i] )
>      MA.norm$genes$spotrep[y] <- 1:length(y)
>      }
>
>MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
># now your spacing=1 and ndups=2
>
>HTH,
>Jenny
>
>
>
>
>At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> >Hi Jenny,
> >
> >Thanks a lot for the valuable information. I will try to do loess first
> >and then do a scale normalization if necessary. Regarding the correlation
> >in lmFit, the spots in my array are not evenly spaced and not evenly
> >replicated: 90% of the spots are spotted twice, 8% three times and 2%
> >26 times. I found this code in a posting in the limma user forum
> >and tried to adapt it to my data. Is there a more elegant way to
> >deal with this kind of replication?
> >
> >once again thanks for the information
> >
> >with regards,
> >vinoy
> >On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
> >Hi Vinoy,
> >
> >Using the 'Gquantile' between-array normalization is not appropriate in
> >your case because your reference is not always in the Green channel. The
> >values you are using for Exp3 and Exp6 in the linear model are actually
> >from the reference, so it's no wonder your gene lists don't make sense. To
> >clarify, the discussion we were having recently on the mailing list about
> >using Gquantile is when your experimental samples are expected to be VERY
> >different from the reference, such that the assumption of a within-array
> >normalization may not be met. In your case (and in most reference designs)
> >you probably meet the assumptions of most genes not changing, and so should
> >first do a within-array loess-type normalization to help remove dye bias.
> >Then check to see if the resulting distributions of M values are similar
> >between arrays. If they are very different, and you would expect them not
> >to be very different, do a between-array normalization on the M values -
> >the scale method of 'normalizeBetweenArrays' is my favorite. The design
> >matrix you have below will correctly adjust for dye swaps, assuming that
> >the 'dye swaps' are all biological replicates and not technical replicates.
> >
> >I'm a little confused about the way you're calling the 'lmFit' function.
> >Your arrays appear to have duplicate spots, but you have the correlation as
> >zero. Something is very wrong with your arrays if there is zero correlation
> >between the duplicate spots! I suggest you read the limma vignette very
> >closely, especially the sections on common reference designs and
> >within-array replicate spots.
> >
> >Good luck,
> >Jenny
> >
> >At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> > >Dear Limma users,
> > >
> > >I am working on custom spotted 70mer oligo arrays, and use Bluefuse to
> > >analyse the images. With the help of the excellent user guide and the
> > >Bioconductor user forum (GMANE), I have analysed my direct comparison
> > >experiments. I also have common reference, time course and direct
> > >two-color design experiments to analyse. I have read the recent posting
> > >on the list about using Rquantile or Gquantile for normalizing between
> > >arrays in common reference experiments. I tried to do a common reference
> > >analysis using the discussed code, but the resulting gene list is
> > >different from the expected list. I am also wondering how to account for
> > >dye swaps. I have pasted the code which I used for the common reference
> > >analysis.
> > >
> > >It would also be very useful if anyone could tell me how to use
> > >normalization between arrays for direct two-color designs.
> > >
> > >My experiment design is
> > >
> > >        Cy3    Cy5
> > >____________________
> > >Exp1    Ref    CpdA
> > >Exp2    Ref    CpdA
> > >Exp3    CpdA   Ref
> > >
> > >Exp4    Ref    CpdB
> > >Exp5    Ref    CpdB
> > >Exp6    CpdB   Ref
> > >
> > >Code which I used for analysing the common reference design:
> > >
> > >------------------------------------------------------------------------
> > >library(limma)
> > >targets <- readTargets("commonref.txt", row.names= "Name")
> > >RG <- read.maimages(targets$FileName, source="bluefuse")
> > >RG$genes <- readGAL()
> > >RG$printer <- getLayout(RG$genes)
> > >spottypes <- readSpotTypes()
> > >RG$genes$Status <- controlStatus(spottypes, RG)
> > >isGene <- RG$genes$Status == "oligos"
> > >MA.Gquantile <- normalizeBetweenArrays(RG[isGene,], method="Gquantile")
> > >RG.Gquantile <- RG.MA(MA.Gquantile)
> > >MA.dummy <- MA.Gquantile
> > >MA.dummy$M <- log2(RG.Gquantile$R)
> > >o <- order(MA.dummy$genes$ID)
> > >MA.sorted <- MA.dummy[o,]
> > >design <- modelMatrix(targets, ref="Ref")
> > >fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
> > >fit.eb <- eBayes(fit)
> > >write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
> > >------------------------------------------------------------------------
> > >
> > >Thanks in advance
> > >
> > >with regards,
> > >Vinoy......
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > >_______________________________________________
> > >Bioconductor mailing list
> > >Bioconductor at stat.math.ethz.ch
> > >https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >Search the archives:
> > >http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >Jenny Drnevich, Ph.D.
> >
> >Functional Genomics Bioinformatics Specialist
> >W.M. Keck Center for Comparative and Functional Genomics
> >Roy J. Carver Biotechnology Center
> >University of Illinois, Urbana-Champaign
> >
> >330 ERML
> >1201 W. Gregory Dr.
> >Urbana, IL 61801
> >USA
> >
> >ph: 217-244-7355
> >fax: 217-265-5066
> >e-mail: drnevich at uiuc.edu
> >
> >
> >
> >
> >--
> >Vinoy......
>
