[BioC] Single-channel GenePix analysis with Limma (bg correction, normalization, weights)

Tue Nov 12 11:42:13 CET 2013

Hi Marcin,

Normalizing of miRNA microarrays is quite a bit trickier than with mRNA 
microarrays, because of the smaller number of probes and the higher 
proportion that are differentially expressed.  See

   http://rnajournal.cshlp.org/content/early/2013/05/24/rna.035055.112

Best wishes
Gordon

On Tue, 12 Nov 2013, Gordon K Smyth wrote:

> Hi Marcin,
>
> It doesn't need to be quite so complicated.
>
> limma reads the single-channel data directly, so there is no need to make 
> fake double-channel data.  See Section 4.5 of the User's Guide, except that 
> you will use source="genepix" instead of source="agilent".
>
> There is no need to specify the read columns explicitly.  Just use 
> source="genepix.custom" if you want to use the custom background correction 
> column.
>
> Section 16.4 gives one case study with Agilent arrays showing the sort of 
> background correction and normalization that I would recommend.  There is no 
> need to lose any annotation information.  There is usually no need to make 
> complicated filters or weights.
>
> There is no difficulty with setting spot types with single channel data. It 
> works just the same as for two colour data.
>
> If you start a new R session with limma only loaded, or else load limma last, 
> then the need to prefix limma::plotMA will go away as well.
>
> Hope this helps.
> Gordon
>
>
>> Date: Mon, 11 Nov 2013 01:39:19 +0100
>> From: Marcin Jakub Kami?ski <marcinjakubkaminski at gmail.com>
>> To: bioconductor at r-project.org
>> Subject: [BioC] Single-channel GenePix analysis with Limma (bg
>> 	correction, normalization, weights)
>> 
>> Dear experts,
>> I'm trying to analyze raw data from GEO GSE34571 using limma.
>> It's 8 single-channel Agilent microarrays scanned with Genepix software to
>> .gpr files.
>> It's a simple design of between-group comparison including two groups, 4
>> replicates in each.
>> 
>> With some help from limma guide and previous mailing list entries, my
>> pipeline includes:
>> - setting function for weights
>> - reading .gpr files as fake double-channel data
>> - normexp background correction
>> - deleting duplicated channels (fake green)
>> - vsn normalization
>> - lm fitting
>> 
>> There are two major issues (at least I find them difficult):
>> A. background correction/data quality
>> B. data I loose after performing vsn normalization or because of having
>> single-channel data only
>> 
>> A.
>> 1. Imageplots indicate spots of high-background signal. Should I be
>> concerned about such noise? (present on imageplot.1.png - odd rows depict
>> signal, even rows are for background intensities)
>> 2. Even after applying background correction, I'm left with high
>> intensities in the mentioned spots. Is it how background correction should
>> work, I've chosen the wrong method, or such spots can't be corrected?
>> (present on imageplot.2.bgcor.png)
>> 3. MAplots show big differences in within-group expression (technical or
>> biological replicates), even after normalization. Did I choose a wrong
>> method?
>> (as seen in maplot.1.png - first column is for arrays c(1,4), second for
>> c(5,8); consecutive rows are for: raw data, background-corrected data,
>> normalized data)
>> 4. I think the above leads to the MAplot of beetween-group difference being
>> skewed upwards for high intensities, am I right? (maplot.2.final.png)
>> 5. After filtering-out genes with weak SNR and flags (see code),
>> within-group fold-changes are considerably smaller. How should i decide
>> which genes should be left for analysis: are there any standards or should
>> I try 'trial-and-error plotting' of MAplots with different functions for
>> weights? (maplot.3.filtered.png - upper plot presents filtered data)
>> 6. Is there any reason to perform background correction, if it worsens
>> densities (plotdensities.png - upper plot was before background correction)
>> 
>> B.
>> 7. VSN normalization doesn't take $weights into account. Is there any
>> convenient way to include them? It also trims genes names, so I set them as
>> rownames to be processed.
>> 8. Is there any convenient way to assign '0 weights' to certain location on
>> the microarray (such as previously mentioned high-intensity
>> spots/scratches)?
>> 9. I'm unable to plotMA(RG) with spottypes included, because my data is
>> single-channel. Now I think about transferring four last arrays to be
>> treated as green channel, but won't it affect the further analysis?
>> 10. Due to inability to produce imageplots, I needed to set
>> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability?
>> 
>> If you consider any of those questions too basic, could you please provide
>> me with a reliable online materials for basic microarray analysis? I'm
>> trying to figure it out by myself.
>> 
>> Source: http://pastebin.com/ZF2NJc7G
>> Plots: http://imgur.com/a/I9k5C
>> Session info: http://pastebin.com/kVRf2NWf
>> 
>> Thank you for your support,
>> Marcin Kaminski, medical student
>> Medical University of Bialystok
>> -------------- next part --------------
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> 
>> locale:
>> [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250
>> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
>> [5] LC_TIME=Polish_Poland.1250
>> 
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets
>> [7] methods   base
>> 
>> other attached packages:
>> [1] vsn_3.30.0         Biobase_2.22.0     BiocGenerics_0.8.0
>> [4] limma_3.18.2
>> 
>> loaded via a namespace (and not attached):
>> [1] affy_1.40.0           affyio_1.30.0         BiocInstaller_1.12.0
>> [4] grid_3.0.2            lattice_0.20-24       preprocessCore_1.24.0
>> [7] tools_3.0.2           zlibbioc_1.8.0
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: imageplot.1.png
>> Type: image/png
>> Size: 226759 bytes
>> Desc: not available
>> URL: 
>> <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0006.png>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: imageplot.2.bgcor.png
>> Type: image/png
>> Size: 125957 bytes
>> Desc: not available
>> URL: 
>> <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0007.png>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: maplot.1.png
>> Type: image/png
>> Size: 25500 bytes
>> Desc: not available
>> URL: 
>> <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0008.png>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: maplot.2.final.png
>> Type: image/png
>> Size: 10400 bytes
>> Desc: not available
>> URL: 
>> <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0009.png>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: maplot.3.filtered.png
>> Type: image/png
>> Size: 16349 bytes
>> Desc: not available
>> URL: 
>> <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0010.png>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: plotdensities.png
>> Type: image/png
>> Size: 5469 bytes
>> Desc: not available
>> URL: 
>> <https://stat.ethz.ch/pipermail/bioconductor/attachments/20131111/a7edc234/attachment-0011.png>
>> -------------- next part --------------
>> library(limma)
>> library(vsn)
>> 
>> # DATA PREPROCESSING
>> # Download RAW data from GEO experiment
>> setInternet2(use=FALSE)
>> download.file('http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE34571&format=file', 
>> 'GSE34571_RAW.tar', mode='wb')
>> untar('GSE34571_RAW.tar', exdir='gpr')
>> setwd('gpr')
>> 
>> # Set function to assign weights
>> f <- function(x) {ok.flags <- x$Flags > (-99)
>>                  ok.snr <- x[,'SNR 532'] > 3
>>                  as.numeric(ok.snr & ok.flags)
>> }
>> 
>> # Read single-channel .gpr files as double-channel data and perform bg 
>> correction
>> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix",
>>                    columns=list(R="F532 Mean",G="F532 
>> Mean",Rb="B532",Gb="B532"),
>>                    wt.fun=f)
>> RG.b = backgroundCorrect(RG=RG, method='normexp')
>> 
>> # Delete fake second channel
>> rownames(RG.b$R) <- RG.b$genes$Name
>> RG.b$G <- NULL
>> RG.b$Gb <- NULL
>> 
>> # Specify contrast and design matrices
>> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2)))
>> colnames(design) <- c("group1", "group2")
>> contrast.matrix <- makeContrasts(group1-group2, levels=design)
>> 
>> # Peform normalisation and stats
>> norm.vsn <- normalizeVSN(RG.b$R)
>> fit.vsn <- lmFit(norm.vsn, design)
>> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix)
>> fit3.vsn <- eBayes(fit2.vsn)
>> limma::volcanoplot(fit3.vsn)
>> 
>> 
>> RG$printer$ngrid.r = 1
>> imageplot(log2(RG.b$R[,2]), RG$printer)
>> 
>> # DIAGNOSTIC PLOTS
>> 
>> # maplot.1
>> par(mfrow=c(3,2))
>> limma::plotMA(log2(RG$R[,c(1:4)]))
>> limma::plotMA(log2(RG$R[,c(5:8)]))
>> 
>> limma::plotMA(log2(RG.b$R[,c(1:4)]))
>> limma::plotMA(log2(RG.b$R[,c(5:8)]))
>> 
>> limma::plotMA(norm.vsn[,c(1:4)])
>> limma::plotMA(norm.vsn[,c(5:8)])
>> 
>> # maplot.2.final
>> par(mfrow=c(1,1))
>> limma::plotMA(fit3.vsn)
>> 
>> # maplot.3.filtered
>> par(mfrow=c(2,1))
>> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2] == 
>> 1)),c(1:2)]))
>> limma::plotMA(log2(RG.b$R[,c(1:2)]))
>> 
>> boxplot(RG$R)
>> boxplot(norm.vsn)
>> 
>> # plotdensities
>> plotDensities(RG, singlechannels=c(1:8))
>> plotDensities(RG.b, singlechannels=c(1:8))
>> 
>> ------------------------------
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}