[BioC] Single-channel GenePix analysis with Limma (bg correction, normalization, weights)

Tue Nov 12 11:18:54 CET 2013

Hi Marcin,

It doesn't need to be quite so complicated.

limma reads the single-channel data directly, so there is no need to make 
fake double-channel data.  See Section 4.5 of the User's Guide, except 
that you will use source="genepix" instead of source="agilent".

There is no need to specify the read columns explicitly.  Just use 
source="genepix.custom" if you want to use the custom background 
correction column.

Section 16.4 gives one case study with Agilent arrays showing the sort of 
background correction and normalization that I would recommend.  There is 
no need to lose any annotation information.  There is usually no need to 
make complicated filters or weights.

There is no difficulty with setting spot types with single channel data. 
It works just the same as for two colour data.

If you start a new R session with limma only loaded, or else load limma 
last, then the need to prefix limma::plotMA will go away as well.

Hope this helps.
Gordon

> Date: Mon, 11 Nov 2013 01:39:19 +0100
> From: Marcin Jakub Kami?ski <marcinjakubkaminski at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Single-channel GenePix analysis with Limma (bg
> 	correction, normalization, weights)
>
> Dear experts,
> I'm trying to analyze raw data from GEO GSE34571 using limma.
> It's 8 single-channel Agilent microarrays scanned with Genepix software to
> .gpr files.
> It's a simple design of between-group comparison including two groups, 4
> replicates in each.
>
> With some help from limma guide and previous mailing list entries, my
> pipeline includes:
> - setting function for weights
> - reading .gpr files as fake double-channel data
> - normexp background correction
> - deleting duplicated channels (fake green)
> - vsn normalization
> - lm fitting
>
> There are two major issues (at least I find them difficult):
> A. background correction/data quality
> B. data I loose after performing vsn normalization or because of having
> single-channel data only
>
> A.
> 1. Imageplots indicate spots of high-background signal. Should I be
> concerned about such noise? (present on imageplot.1.png - odd rows depict
> signal, even rows are for background intensities)
> 2. Even after applying background correction, I'm left with high
> intensities in the mentioned spots. Is it how background correction should
> work, I've chosen the wrong method, or such spots can't be corrected?
> (present on imageplot.2.bgcor.png)
> 3. MAplots show big differences in within-group expression (technical or
> biological replicates), even after normalization. Did I choose a wrong
> method?
> (as seen in maplot.1.png - first column is for arrays c(1,4), second for
> c(5,8); consecutive rows are for: raw data, background-corrected data,
> normalized data)
> 4. I think the above leads to the MAplot of beetween-group difference being
> skewed upwards for high intensities, am I right? (maplot.2.final.png)
> 5. After filtering-out genes with weak SNR and flags (see code),
> within-group fold-changes are considerably smaller. How should i decide
> which genes should be left for analysis: are there any standards or should
> I try 'trial-and-error plotting' of MAplots with different functions for
> weights? (maplot.3.filtered.png - upper plot presents filtered data)
> 6. Is there any reason to perform background correction, if it worsens
> densities (plotdensities.png - upper plot was before background correction)
>
> B.
> 7. VSN normalization doesn't take $weights into account. Is there any
> convenient way to include them? It also trims genes names, so I set them as
> rownames to be processed.
> 8. Is there any convenient way to assign '0 weights' to certain location on
> the microarray (such as previously mentioned high-intensity
> spots/scratches)?
> 9. I'm unable to plotMA(RG) with spottypes included, because my data is
> single-channel. Now I think about transferring four last arrays to be
> treated as green channel, but won't it affect the further analysis?
> 10. Due to inability to produce imageplots, I needed to set
> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability?
>
> If you consider any of those questions too basic, could you please provide
> me with a reliable online materials for basic microarray analysis? I'm
> trying to figure it out by myself.
>
> Source: http://pastebin.com/ZF2NJc7G
> Plots: http://imgur.com/a/I9k5C
> Session info: http://pastebin.com/kVRf2NWf
>
> Thank you for your support,
> Marcin Kaminski, medical student
> Medical University of Bialystok
> -------------- next part --------------
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250
> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
> [5] LC_TIME=Polish_Poland.1250
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets
> [7] methods   base
>
> other attached packages:
> [1] vsn_3.30.0         Biobase_2.22.0     BiocGenerics_0.8.0
> [4] limma_3.18.2
>
> loaded via a namespace (and not attached):
> [1] affy_1.40.0           affyio_1.30.0         BiocInstaller_1.12.0
> [4] grid_3.0.2            lattice_0.20-24       preprocessCore_1.24.0
> [7] tools_3.0.2           zlibbioc_1.8.0
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: imageplot.1.png
> Type: image/png
> Size: 226759 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0006.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: imageplot.2.bgcor.png
> Type: image/png
> Size: 125957 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0007.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: maplot.1.png
> Type: image/png
> Size: 25500 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0008.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: maplot.2.final.png
> Type: image/png
> Size: 10400 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0009.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: maplot.3.filtered.png
> Type: image/png
> Size: 16349 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0010.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: plotdensities.png
> Type: image/png
> Size: 5469 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0011.png>
> -------------- next part --------------
> library(limma)
> library(vsn)
>
> # DATA PREPROCESSING
> # Download RAW data from GEO experiment
> setInternet2(use=FALSE)
> download.file('http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE34571&format=file', 'GSE34571_RAW.tar', mode='wb')
> untar('GSE34571_RAW.tar', exdir='gpr')
> setwd('gpr')
>
> # Set function to assign weights
> f <- function(x) {ok.flags <- x$Flags > (-99)
>                  ok.snr <- x[,'SNR 532'] > 3
>                  as.numeric(ok.snr & ok.flags)
> }
>
> # Read single-channel .gpr files as double-channel data and perform bg correction
> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix",
>                    columns=list(R="F532 Mean",G="F532 Mean",Rb="B532",Gb="B532"),
>                    wt.fun=f)
> RG.b = backgroundCorrect(RG=RG, method='normexp')
>
> # Delete fake second channel
> rownames(RG.b$R) <- RG.b$genes$Name
> RG.b$G <- NULL
> RG.b$Gb <- NULL
>
> # Specify contrast and design matrices
> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2)))
> colnames(design) <- c("group1", "group2")
> contrast.matrix <- makeContrasts(group1-group2, levels=design)
>
> # Peform normalisation and stats
> norm.vsn <- normalizeVSN(RG.b$R)
> fit.vsn <- lmFit(norm.vsn, design)
> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix)
> fit3.vsn <- eBayes(fit2.vsn)
> limma::volcanoplot(fit3.vsn)
>
>
> RG$printer$ngrid.r = 1
> imageplot(log2(RG.b$R[,2]), RG$printer)
>
> # DIAGNOSTIC PLOTS
>
> # maplot.1
> par(mfrow=c(3,2))
> limma::plotMA(log2(RG$R[,c(1:4)]))
> limma::plotMA(log2(RG$R[,c(5:8)]))
>
> limma::plotMA(log2(RG.b$R[,c(1:4)]))
> limma::plotMA(log2(RG.b$R[,c(5:8)]))
>
> limma::plotMA(norm.vsn[,c(1:4)])
> limma::plotMA(norm.vsn[,c(5:8)])
>
> # maplot.2.final
> par(mfrow=c(1,1))
> limma::plotMA(fit3.vsn)
>
> # maplot.3.filtered
> par(mfrow=c(2,1))
> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] == 1)),c(1:2)]))
> limma::plotMA(log2(RG.b$R[,c(1:2)]))
>
> boxplot(RG$R)
> boxplot(norm.vsn)
>
> # plotdensities
> plotDensities(RG, singlechannels=c(1:8))
> plotDensities(RG.b, singlechannels=c(1:8))
>
> ------------------------------

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}