[BioC] FW: duplicates, technical and biological replicates + dividing a microarray into two parts

Wed Feb 3 13:06:56 CET 2010

Dear Naomi Altman, 

Thank you very much for your answer. 

Your answer has raised more questions, and I hope you can help me answer them. 
I apologize in advance if the questions are basic, but I am new to the field of statistics and have never seen a microarray in my life. 

1. Now that you mention it, I can see that the within-array variability should be smaller than the technical variability, 
but I do not understand why treating them as the same should be less statistically valid than averaging the duplicate spots.
How can one judge which approach is statistically more valid? Could you tell me where I can read more about this, 
so that I won't make the same mistakes again?
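One way to see why within-array duplicates should not be treated like independent arrays is a small simulation. The sketch below uses only base R, and its parameters (n_arrays, sd_array, sd_spot, n_sim) are made-up assumptions, not estimates from these arrays. Each simulated array adds one shared deviation to both duplicate spots of a gene, so the two spots are positively correlated (here 1/(1 + 0.9^2), about 0.55). The standard error of the gene's mean log-ratio is then computed either from per-array averages or by pretending that the 2 x n_arrays spots came from independent arrays.

```r
# Minimal simulation with hypothetical parameters (not data from this thread).
set.seed(1)
n_arrays <- 10     # assumed number of arrays
sd_array <- 1.0    # assumed SD of the array-level (technical) deviation
sd_spot  <- 0.9    # assumed SD of spot-to-spot noise within one array
n_sim    <- 2000   # number of simulated genes

est <- se_avg <- se_naive <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  a <- rnorm(n_arrays, sd = sd_array)          # deviation shared by both copies on an array
  spot1 <- a + rnorm(n_arrays, sd = sd_spot)   # left-hand copy
  spot2 <- a + rnorm(n_arrays, sd = sd_spot)   # right-hand copy
  avg <- (spot1 + spot2) / 2                   # one averaged value per array
  est[i] <- mean(avg)                          # estimated log-ratio (true value is 0)
  se_avg[i] <- sd(avg) / sqrt(n_arrays)        # SE from per-array averages
  both <- c(spot1, spot2)                      # pretend the spots are independent arrays
  se_naive[i] <- sd(both) / sqrt(length(both))
}

true_se <- sd(est)  # empirical SE of the estimate across simulated genes
print(round(c(empirical = true_se, averaged = mean(se_avg), naive = mean(se_naive)), 3))
```

Because both copies share the array-level deviation, the naive standard error comes out clearly smaller than the empirical spread of the estimate, while the one based on per-array averages tracks it closely; a test built on the naive version would look more significant than the data justify.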

2. I should probably have mentioned this before: the correlation between my duplicate spots (calculated with the duplicateCorrelation function in limma)
is in the range (0.5, 0.6), and the correlation between my technical replicates is in the range (-0.3, -0.2). 
So I think the duplicate spots are not well correlated, and by averaging them we would lose valuable information.
If I do average the spots, should I do it before or after normalization?
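Averaging duplicates amounts to reshaping each array's column of M-values so that the two copies of a probe line up, then taking their mean. In the code quoted further down, this is done with limma's avedups(MA_l.ne10b, ndups=2, spacing=192), applied after normalizeWithinArrays. The base-R helper below, average_duplicates, is a hypothetical name and not a limma function; it is a minimal sketch of the same idea, assuming the spots are ordered block by block and that copy 1 and copy 2 of each probe sit `spacing` rows apart within each consecutive run of ndups x spacing rows. With 96 spots per print-tip group (12 x 8) and block 1 duplicated by block 3, the copies are 2 x 96 = 192 rows apart, which matches spacing=192 in the thread.

```r
# Hypothetical helper (not part of limma): average within-array duplicate spots.
# Assumes rows come in chunks of ndups * spacing, and within each chunk copy d of
# a probe occupies rows (d - 1) * spacing + 1 ... d * spacing.
average_duplicates <- function(M, ndups = 2, spacing = 192) {
  M <- as.matrix(M)                  # rows = spots, columns = arrays
  nspots  <- nrow(M)
  narrays <- ncol(M)
  chunk <- ndups * spacing
  stopifnot(nspots %% chunk == 0)
  ngroups <- nspots / chunk
  # Reshape to spacing x ndups x ngroups x narrays; dimension 2 indexes the copy.
  dim(M) <- c(spacing, ndups, ngroups, narrays)
  # Average over the copies, ignoring missing values (all-missing gives NaN).
  avg <- apply(M, c(1, 3, 4), mean, na.rm = TRUE)
  # Back to one row per unique probe and one column per array.
  dim(avg) <- c(spacing * ngroups, narrays)
  avg
}

# Toy layout: 8 spots, spacing 2, so rows 1-2 pair with rows 3-4, and 5-6 with 7-8.
print(average_duplicates(cbind(1:8, 11:18), ndups = 2, spacing = 2))
```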

3. I already have qRT-PCR results for several genes, and I compared them with the results I got from the different methods 
(averaging the duplicate spots, treating biological as technical replicates, or treating technical as biological replicates).  
Every time, treating technical as biological replicates gave the worst agreement, and treating biological as technical replicates gave the best
(for biological as technical, I used the duplicateCorrelation function on the duplicate spots, but I did not use blocks for the biological replicates in the lmFit function). 
I thought that the method whose results came closest to the qRT-PCR results would be the most statistically valid one. Shouldn't this be true?
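The agreement score described in the original message (the sum of absolute differences between qRT-PCR logFC and array logFC) can be written as a short function. This is a base-R sketch: logfc_discrepancy and the example values are hypothetical, and it assumes the array logFC (for example the logFC column of a topTable result) and the qRT-PCR values are named by a common gene identifier and are on the same log scale (e.g. both log2). Reporting the number of matched genes alongside the sum makes scores computed on different gene sets easier to compare.

```r
# Hypothetical helper: sum over shared genes of |logFC(qRT-PCR) - logFC(array)|.
# Inputs are assumed to be named numeric vectors on the same log scale.
logfc_discrepancy <- function(array_logfc, qpcr_logfc) {
  genes <- intersect(names(array_logfc), names(qpcr_logfc))  # match genes by name
  if (length(genes) == 0) stop("no genes in common")
  c(n_genes = length(genes),
    sum_abs_diff = sum(abs(qpcr_logfc[genes] - array_logfc[genes])))
}

# Made-up illustrative values, not results from this experiment.
array_logfc <- c(geneA = 1.2, geneB = -0.5, geneC = 2.0)
qpcr_logfc  <- c(geneA = 1.0, geneB = -0.8, geneD = 0.3)
print(logfc_discrepancy(array_logfc, qpcr_logfc))
```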

4. Now back to my original question: is it possible to split a microarray into two pieces and treat each piece as a separate microarray for the purpose of analysis?
If so, how could I do it?
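The bookkeeping behind the proposed split can be sketched in base R. The helper array_half below is hypothetical; it assumes GenePix-style block numbering, row by row across the 4 block columns (so blocks 1 to 4 form the top row), with the two left-hand block columns duplicated by the two right-hand ones. In limma, the same labels could then be attached to spots through the Block column that read.maimages stores in the genes component for GenePix files (assumed here), e.g. `left <- array_half(MA$genes$Block) == "Left"`, and the rows selected with `MA[left, ]`. Note that both halves still come from the same hybridization, so splitting by itself does not address the within-array correlation discussed in this thread.

```r
# Hypothetical helper (not part of limma): label print-tip blocks as left or right half.
# Assumes blocks are numbered row by row across ngrid_c block columns.
array_half <- function(block, ngrid_c = 4) {
  block_col <- (block - 1) %% ngrid_c + 1          # column of the block, 1..ngrid_c
  ifelse(block_col <= ngrid_c / 2, "Left", "Right")
}

blocks <- 1:32                                     # 8 x 4 print-tip groups
half <- array_half(blocks)
left_blocks  <- blocks[half == "Left"]
right_blocks <- blocks[half == "Right"]
print(left_blocks)
print(right_blocks)
# Each left block should be duplicated by the block two columns to its right
# (1 <-> 3, 2 <-> 4, 5 <-> 7, ...), as described in the original message.
stopifnot(all(left_blocks + 2 == right_blocks))
```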

Thank you very much in advance, 

Best regards, 
Ana

________________________________________
From: Naomi Altman [naomi at stat.psu.edu]
Sent: Monday, February 01, 2010 10:11 PM
To: Staninska, Ana, Dr.; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] FW: duplicates, technical and biological replicates  + dividing a microarray into two parts

It is probably better to average the 2 spots on
the same array.  The within array variability is
less than the technical replication variability
but will be treated as if it were the same if
you follow your proposal.  The analysis of spot
averages is more statistically valid.

Regards
Naomi Altman
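The variance argument above can be made concrete for one gene on one array. Let y_1 and y_2 be the log-ratios of the two duplicate spots, each with variance σ², and let ρ be their correlation (playing the role of the correlation that duplicateCorrelation estimates as a consensus across genes). Then the variance of their average and the effective number of independent spots they represent are:

```latex
\operatorname{Var}\!\left(\frac{y_1 + y_2}{2}\right)
  = \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\rho\sigma^2\right)
  = \frac{\sigma^2 (1 + \rho)}{2},
\qquad
n_{\mathrm{eff}} = \frac{\sigma^2}{\operatorname{Var}\!\left(\frac{y_1 + y_2}{2}\right)}
  = \frac{2}{1 + \rho}.
```

With ρ between 0.5 and 0.6, the range reported in question 2 above, n_eff = 2/1.5 ≈ 1.33 or 2/1.6 = 1.25: the two spots carry roughly the information of 1.25 to 1.33 independent spots, not 2. Treating them as independent would assume a variance of σ²/2 for their average, whereas the actual variance is σ²(1 + ρ)/2, about 1.5 to 1.6 times larger.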

At 10:50 AM 2/1/2010, Staninska, Ana, Dr. wrote:
>Dear BioConductor team,
>
>
>I am working on a statistical analysis of two-color
>GenePix data. For this analysis, I am using the limma package.
>I am trying to find differentially expressed genes in the
>treated samples vs the non-treated samples,
>where each sample is a leaf from a particular tree.
>I have 5 biological replicates (5 treated and 5
>untreated), and for each biological replicate
>2 technical replicates, which are dye-swapped
>arrays (10 microarrays total).
>The spots on each microarray are duplicated: once
>on the left side and once on the right side.
>
>So the experimental design involves 3 kinds of
>replicates: within-array replicates (duplicate
>spots), technical replicates (dye swaps), and biological replicates.
>I know that at the moment it is not possible
>to analyze this kind of experiment with limma, but
>I have an idea of how to avoid the duplicate spots
>(within-array replicates), if that is possible.
>Here I need your help.
>
>There are 8x4 (8 rows, 4 columns) print-tip
>groups on the microarrays, and each print-tip
>group is of size 12x8 (but I think that is not relevant for now).
>The experiment is designed so that the left-hand
>side of the microarray and the right-hand side
>are identical. Basically, the duplicate spots
>are spotted on the left-hand and the right-hand
>side of the array (if the blocks are numbered
>1 through 32, then 1 and 3 are the same, as are 2
>and 4, 5 and 7, 6 and 8, etc.).
>So if I can somehow divide my microarray into
>two pieces and treat the pieces as two separate
>microarrays, then I will be able to avoid the
>duplicate spots and deal only with technical and biological replicates.
>Since my original microarray consists of 32
>blocks (print-tip groups), I would like the two
>new microarrays, called Left_microarray and
>Right_microarray, to contain 16 blocks each, so that blocks
>1,2,5,6,9,10,13,14,17,18,21,22,25,26,29,30 are
>in Left_microarray and the remaining blocks
>3,4,7,8,11,12,15,16,19,20,23,24,27,28,31,32 are in Right_microarray.
>Is this possible?
>If it is, could you please help me and tell me how to do this?
>
>Just in case, I am also sending my R code for the experiment.
>
>Thank you very much in advance
>Ana Staninska
>
>Institute of Biomathematics and Biometry
>Helmholtz-Zentrum München
>München, Deutschland
>
>
>
>The R code for the experiment:
>I tried all the possible ways of dealing with the
>experiment: averaging the within-array
>replicates, treating biological as technical
>replicates, or treating technical as biological replicates.
>After running the R code, I compared the results
>with the qRT-PCR results previously obtained for the
>experiments. For the comparison, I took the sum of
>the absolute differences between the logFC from
>qRT-PCR and the logFC from my analysis.
>It turned out that treating technical as
>biological replicates was the worst option,
>while treating biological as technical replicates was the best.
>
> > targets <- readTargets("Lysi_270706.txt")
> >
> > myfun<-function(x) {
>+  nored<-abs(x[,"F635 Median"] + x[,"F635 Mean"]) !=0
>+  nogreen<-abs(x[, "F532 Median"]+x[,"F532 Mean"]) !=0
>+  as.numeric(nogreen & nored)
>+  }
> >
> > RGa <- read.maimages(targets, source="genepix", wt.fun=myfun,
>+   other.columns=c("F635 SD","B635 SD","F532 SD","B532 SD",
>+                   "B532 Mean","B635 Mean","F Pixels","B Pixels"))
>Read Met270706_1_60308.gpr
>Read Met270706_dw1_110308.gpr
>Read Met270706_2_060308.gpr
>Read Met270706_dw2_110308.gpr
>Read Met270706_3_060308.gpr
>Read Met270706_dw3_120308.gpr
>Read Met270706_4_060308.gpr
>Read Met270706_dw4_120308.gpr
>Read Met270706_5_060308.gpr
>Read Met270706_dw5_120308.gpr
>Read Met270706_6_060308.gpr
>Read Met270706_dw6_120308.gpr
>Read Met270706_7_110308.gpr
>Read Met270706_dw7_120308.gpr
>Read Met270706_8_220408.gpr
>Read Met270706_dw8_120308.gpr
>Read Met270706_9_110308.gpr
>Read Met270706_dw9_120308.gpr
>Read Met270706_10_110308.gpr
>Read Met270706_dw10_120308.gpr
> >
> > RG.ne10b <- backgroundCorrect(RGa, method="normexp",
>+   normexp.method="mle", offset=10)
>Green channel
>Corrected array 1
>Corrected array 2
>Corrected array 3
>Corrected array 4
>Corrected array 5
>Corrected array 6
>Corrected array 7
>Corrected array 8
>Corrected array 9
>Corrected array 10
>Corrected array 11
>Corrected array 12
>Corrected array 13
>Corrected array 14
>Corrected array 15
>Corrected array 16
>Corrected array 17
>Corrected array 18
>Corrected array 19
>Corrected array 20
>Red channel
>Corrected array 1
>Corrected array 2
>Corrected array 3
>Corrected array 4
>Corrected array 5
>Corrected array 6
>Corrected array 7
>Corrected array 8
>Corrected array 9
>Corrected array 10
>Corrected array 11
>Corrected array 12
>Corrected array 13
>Corrected array 14
>Corrected array 15
>Corrected array 16
>Corrected array 17
>Corrected array 18
>Corrected array 19
>Corrected array 20
> >
> > MA_l.ne10b <- normalizeWithinArrays(RG.ne10b, method="loess")
> >
> > #################################################################
> > ###               Average of the Duplicate Spots             ###
> > #################################################################
> >
> > MAa_l.ne10b <- avedups(MA_l.ne10b, ndups=2, spacing=192)
> > design <- modelMatrix(targets, ref="wt")
>Found unique target names:
>  mu wt
> > biolrep<-c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10)
> >
> > corfita_l.ne10b<-duplicateCorrelation(MAa_l.ne10b, design, block=biolrep)
> >
> > fita_l.ne10b<-lmFit(MAa_l.ne10b, design,
>+   block=biolrep, cor=corfita_l.ne10b$consensus)
> >
> > fita_l.ne10b<-eBayes(fita_l.ne10b)
> >
> > TTa_l.ne10b<-topTable(fita_l.ne10b,coef=1,  number=1600, adjust="BH")
> > write.csv(TTa_l.ne10b, file="BC_Lysi_270706a_TTa_l_ne10b.csv")
> >
> > ################################################################
> > ###        BIOLOGICAL AS TECHNICAL                        ######
> > ################################################################
> >
> > corfit_l.ne10b<-duplicateCorrelation(MA_l.ne10b, design, ndups=2, spacing=192)
> >
> > fitbt_l.ne10b<-lmFit(MA_l.ne10b, design,
>+   ndups=2, spacing=192, cor=corfit_l.ne10b$consensus)
> >
> > fitbt_l.ne10b<-eBayes(fitbt_l.ne10b)
> >
> > TTbt_l.ne10b<-topTable(fitbt_l.ne10b,coef=1,  number=1600, adjust="BH")
> > write.csv(TTbt_l.ne10b, file="BC_Lysi_270706a_TTbt_l_ne10b.csv")
> >
> > ###############################################################
> > ####     TECHNICAL AS BIOLOGICAL                          ####
> > ###############################################################
> >
> >
> >
> > # Same matrix as before: one column per array, +1 for nt1..nt10, -1 for tr1..tr10
> > design1 <- diag(rep(c(1, -1), 10))
> > colnames(design1) <- paste0(rep(c("nt", "tr"), 10), rep(1:10, each=2))
> >
> > fittb_l.ne10b<-lmFit(MA_l.ne10b, design1,
>+   ndups=2, spacing=192, cor=corfit_l.ne10b$consensus)
>Warning message:
>Partial NA coefficients for 160 probe(s)
> >
> > fittb_l.ne10b<-eBayes(fittb_l.ne10b)
> >
> > TTtb_l.ne10b<-topTable(fittb_l.ne10b,coef=1,  number=1600, adjust="BH")
> > write.csv(TTtb_l.ne10b, file="BC_Lysi_270706a_TTtb_l_ne10b.csv")
> >
>
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111