[BioC] Using Limma for peptide array analysis - technical replicate issues and false positives

Fri May 28 06:52:39 CEST 2010

Dear Kate,

> Date: Sun, 23 May 2010 14:13:20 +0100
> From: K.Z.Nambiar at bsms.ac.uk
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] Using Limma for peptide array analysis - technical
> replicate issues and false positives
>
> Hi everyone,
>
> I am using Limma for analysis of peptide microarrays but have run into a
> few issues and I was hoping to get some advice from the forum...
>
> Basically the experimental design is a comparison of antibody binding
> patterns between a disease state and healthy subjects. The peptide
> arrays are incubated with serum from either of the two groups and then a
> secondary labelling stage is carried out using fluorescently labelled
> anti-human-IgG-Cy5 and anti-human-IgA-Cy3. I'm using a Genepix scanner
> and after scanning I end up with a 2 colour GPR file. The arrays are
> printed with 3 identical sub-arrays per slide.
>
> This is rather different to the schema of 2 colour DNA arrays where one
> is interested in comparing the Red channel to the Green channel. Here
> I'm interested in a comparison of the red channels between arrays and
> the green channels between arrays. So essentially the design is that of
> two single-colour arrays performed in parallel.

That seems a waste of the two-colour array format.  I would have thought
the same principles would apply here as for DNA arrays.  Would it be
possible to design the experiments so that you can make comparisons
between channels?

> I conduct the analysis as follows:
>
> 1. Read a RGlist object from the gpr files and flag "bad" spots - identified during scanning.
> 2. Define a design matrix - this example is for 3 disease arrays compared to 3 normals
> 3. Filter out the control spots / empty spots etc.
> 4. Perform a quantile normalisation between arrays of the red and green channel data in
> parallel (The value that is being normalised is the log2 foreground / background ratio -
> this is in keeping with other studies looking at peptide array analysis)

As you probably already know, this differs from the way background
correction is usually done for DNA microarrays.  It would be good to plot
the background-corrected data to check for any possible problems.  Do the
arrays include negative control probes, for example, to help with
background correction?
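
As a rough illustration of that check, the sketch below compares the log2 foreground/background ratio with a subtraction-based correction. It uses simulated intensities only: `fg`, `bg`, `signal` and the `offset` value are made up for illustration and do not come from the arrays discussed here. Note that the log2 of the ratio equals log2(fg) - log2(bg), which is not the same quantity as log2(fg) / log2(bg). With real data one could instead try limma's `backgroundCorrect` on the RGList and plot the result in the same way.

```r
# Simulated intensities only; none of these numbers come from real arrays.
set.seed(1)
n_spots  <- 200
n_arrays <- 6
bg     <- matrix(runif(n_spots * n_arrays, min = 80, max = 120), n_spots, n_arrays)
signal <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), n_spots, n_arrays)
fg     <- bg + signal

# log2 of the foreground/background ratio, i.e. log2(fg) - log2(bg)
log_ratio <- log2(fg / bg)

# Subtraction-based alternative with a small positive offset (hypothetical value)
offset <- 50
log_subtracted <- log2(fg - bg + offset)

# Numerical summaries of the two measures across all spots and arrays
print(summary(as.vector(log_ratio)))
print(summary(as.vector(log_subtracted)))

# Plots are only drawn in an interactive session
if (interactive()) {
  boxplot(log_ratio, main = "log2(fg / bg) per array")
  plot(log_subtracted[, 1], log_ratio[, 1],
       xlab = "log2(fg - bg + offset)", ylab = "log2(fg / bg)",
       main = "Array 1: two background treatments")
}
```

Spots that behave very differently under the two treatments, especially at low intensity, are the ones worth inspecting more closely.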

> 5. Fit the linear model taking into account the 3 duplicate spots per array.
> 6. Apply empirical Bayes statistics and use topTable to generate a list of differentially
> identified peptides for the red channel (IgG) and green channel (IgA) respectively.
>
>
> This is an example of the script I'm using...
>
> library(limma)
> setwd("F:/1510/GPR")
>
> f <- function(x) as.numeric(x$Flags > -99)
> targets <- readTargets("targets.txt")
>
> RG <- read.maimages(targets$FileName, source="genepix",
>     columns=list(R="F635 Median", G="F532 Median", Rb="B635 Median", Gb="B532 Median"),
>     wt.fun=f)
>
> pData <- data.frame(population = c('disease', 'disease', 'disease', 'norm', 'norm', 'norm'))
> rownames(pData) <- RG$targets$FileName
> design <- model.matrix (~factor(pData$population))
> peptides<-grep("BAC1|BAC2|BAC3", RG$genes$Name)
> RG.final<-RG[peptides, ]
>
> RNorm<-normalizeBetweenArrays(log(RG.final$R,2)/log(RG.final$Rb,2), method="quantile")
> GNorm<-normalizeBetweenArrays(log(RG.final$G,2)/log(RG.final$Gb,2), method="quantile")
>
> rownames(RNorm) <- RG.final$genes$Name
> rownames(GNorm) <- RG.final$genes$Name
> RNormSort <- RNorm[order(rownames(RNorm)), ]
> GNormSort <- GNorm[order(rownames(GNorm)), ]
>
> corfitR <- duplicateCorrelation(RNormSort, design, ndups=3)
> fitR <- lmFit(RNormSort, design, ndups=3, correlation=corfitR$consensus)
>
> corfitG <- duplicateCorrelation(GNormSort, design, ndups=3)
> fitG <- lmFit(GNormSort, design, ndups=3, correlation=corfitG$consensus)
>
> ebayesR <- eBayes(fitR)
> ebayesG <- eBayes(fitG)
>
> topTable(ebayesR, coef = 2, adjust = "fdr", n = 50)
> topTable(ebayesG, coef = 2, adjust = "fdr", n = 50)
>
>
> The questions I have are:
>
> 1. I incubated a technical replicate for each of the arrays in this
> series. They're not included in this analysis - all the above targets
> are biological replicates. However there are already intra-array
> replicates accounted for. I seem to remember reading somewhere that
> limma can't handle both types of replicates at the same time. Is that
> still the case? Does anyone know a way round this?

There's usually not a lot of information in the technical replicates that
can't be had by averaging them.  So I recommend you average them.
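
A minimal sketch of that averaging step, in base R on simulated data. The matrix `y`, the `subject` labels and the two-replicates-per-subject layout are assumptions made purely for illustration. If limma is loaded, `avearrays(y, ID = subject)` is intended for the same job and should give an equivalent result.

```r
# Simulated normalised log-intensities: 5 peptides x 12 arrays,
# where each of 6 subjects was hybridised twice (technical replicates).
set.seed(2)
subject <- rep(c("d1", "d2", "d3", "n1", "n2", "n3"), each = 2)
y <- matrix(rnorm(5 * length(subject)), nrow = 5,
            dimnames = list(paste0("pep", 1:5),
                            paste0(subject, c("_a", "_b"))))

# Average the columns that belong to the same subject
y_avg <- sapply(unique(subject), function(s) {
  rowMeans(y[, subject == s, drop = FALSE])
})

print(dim(y_avg))       # one column per biological sample
print(colnames(y_avg))
```

The averaged matrix has one column per biological replicate, so it can be passed to `duplicateCorrelation` and `lmFit` with the within-array `ndups` setting exactly as in the script above.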

> 2. Other peptide array studies have removed false positives from the
> analysis. The false positives can be identified by incubating the array
> with just the secondary fluorescent antibodies and no patient sera. Thus
> any peptides that are identified will represent non-specific binding of
> the secondary to the peptide. My thoughts are that since I'm looking for
> differentially identified peptides the false positives should be positive
> between both groups... However I wonder whether I could analyse a
> dataset of secondary antibody only arrays and use the data from that to
> filter the above data? Has anyone an idea how I might do this?

Don't know.  Of course the limma statistical analysis is already designed
to give an estimate of the FDR.
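
For reference, the `adjust = "fdr"` option used in the `topTable` calls above corresponds to the Benjamini-Hochberg adjustment, which base R exposes through `p.adjust`. The p-values below are invented purely to show the arithmetic.

```r
# Hypothetical p-values for five peptides (illustration only)
p <- c(0.001, 0.008, 0.039, 0.041, 0.2)

# Benjamini-Hochberg adjusted p-values; "fdr" is an alias for "BH"
adj_bh  <- p.adjust(p, method = "BH")
adj_fdr <- p.adjust(p, method = "fdr")

print(adj_bh)
# Peptides that would be called at a 5% false discovery rate
print(which(adj_bh < 0.05))
```

Each adjusted value is the smallest false discovery rate at which that peptide would still be called significant.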

Best wishes
Gordon

> Finally I'd be interested to hear if you think I'm going about this
> analysis in a reasonable way. There isn't all that much written about
> analysing peptide arrays using R so any suggestions would be really
> welcome.
>
> Many thanks,
>
> Kate
>
>
>
> __________________________________________________
> Dr. Kate Nambiar
> Research Fellow in Infectious Diseases
> Brighton and Sussex University Hospitals NHS Trust
> Eastern Road, Brighton, BN2 5BE
>
> Email  k.z.nambiar at bsms.ac.uk
> Tel +44 (0) 1273 696955 Ext 3900
> Fax +44 (0) 1273 664375
