[BioC] Single-channel GenePix analysis with Limma (bg correction, normalization, weights)

Gordon K Smyth smyth at wehi.EDU.AU
Sat Nov 16 08:18:54 CET 2013


On Fri, 15 Nov 2013, Marcin Jakub Kamiński wrote:

> Hello Gordon,

> thank you for your support. Somehow I didn't pay enough attention and 
> thought that 'green.only' parameter shouldn't be used in my case. Now, 
> with Elist, the analysis is much easier, thank you!
>
> However, there's a small inconvenience: I cannot simply plotDensities() 
> for EList, because it expects $R. There's no derivative function, such 
> as plotDensities.EList.

True.

> I workaround it easily by $R <- $E, but maybe 
> that's done on purpose and I shouldn't use these plots in my case?

No, there is no good reason why plotDensities() isn't implemented for 
EList objects.  We just haven't done so as it the need wasn't urgent.

> I've followed your suggestion about bg correction and normalization method
> and normexp+quantile gives me the lowest variance between replicates, as
> well as produces definitely more symmetric volcanoplot. I guess I'll stick
> to these methods, thank you.
> What's interesting, from the other hand, microRNA that has been reported
> functional in vivo based on the consecutive research (hsa-miR-150), was
> ranked higher by vsn method (although it's also present among the first 20
> results from quantile normalization).
>
> Shame to admit, but I didn't perform avereps() before. Now I'm averaging
> across probes, as you've suggested to do in the other mailing list topic.

Actually my intention was to advise against routine use of avereps():

https://www.stat.math.ethz.ch/pipermail/bioconductor/2013-November/056061.html

Best wishes
Gordon

> Also, thank you for the paper, I believe it's of a great value. Unluckily,
> my university doesn't provide free access to the most recent RNA articles,
> we have 1yr embargo.
>
> Best regards,
> Marcin
>
> On Tue, Nov 12, 2013 at 11:42 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Hi Marcin,
>>
>> Normalizing of miRNA microarrays is quite a bit trickier than with mRNA
>> microarrays, because of the smaller number of probes and the higher
>> proportion that are differentially expressed.  See
>>
>>   http://rnajournal.cshlp.org/content/early/2013/05/24/rna.035055.112
>>
>> Best wishes
>> Gordon
>>
>>
>>
>> On Tue, 12 Nov 2013, Gordon K Smyth wrote:
>>
>>  Hi Marcin,
>>>
>>> It doesn't need to be quite so complicated.
>>>
>>> limma reads the single-channel data directly, so there is no need to make
>>> fake double-channel data.  See Section 4.5 of the User's Guide, except that
>>> you will use source="genepix" instead of source="agilent".
>>>
>>> There is no need to specify the read columns explicitly.  Just use
>>> source="genepix.custom" if you want to use the custom background correction
>>> column.
>>>
>>> Section 16.4 gives one case study with Agilent arrays showing the sort of
>>> background correction and normalization that I would recommend.  There is
>>> no need to lose any annotation information.  There is usually no need to
>>> make complicated filters or weights.
>>>
>>> There is no difficulty with setting spot types with single channel data.
>>> It works just the same as for two colour data.
>>>
>>> If you start a new R session with limma only loaded, or else load limma
>>> last, then the need to prefix limma::plotMA will go away as well.
>>>
>>> Hope this helps.
>>> Gordon
>>>
>>>
>>>  Date: Mon, 11 Nov 2013 01:39:19 +0100
>>>> From: Marcin Jakub Kami?ski <marcinjakubkaminski at gmail.com>
>>>> To: bioconductor at r-project.org
>>>> Subject: [BioC] Single-channel GenePix analysis with Limma (bg
>>>>         correction, normalization, weights)
>>>>
>>>> Dear experts,
>>>> I'm trying to analyze raw data from GEO GSE34571 using limma.
>>>> It's 8 single-channel Agilent microarrays scanned with Genepix software
>>>> to
>>>> .gpr files.
>>>> It's a simple design of between-group comparison including two groups, 4
>>>> replicates in each.
>>>>
>>>> With some help from limma guide and previous mailing list entries, my
>>>> pipeline includes:
>>>> - setting function for weights
>>>> - reading .gpr files as fake double-channel data
>>>> - normexp background correction
>>>> - deleting duplicated channels (fake green)
>>>> - vsn normalization
>>>> - lm fitting
>>>>
>>>> There are two major issues (at least I find them difficult):
>>>> A. background correction/data quality
>>>> B. data I loose after performing vsn normalization or because of having
>>>> single-channel data only
>>>>
>>>> A.
>>>> 1. Imageplots indicate spots of high-background signal. Should I be
>>>> concerned about such noise? (present on imageplot.1.png - odd rows depict
>>>> signal, even rows are for background intensities)
>>>> 2. Even after applying background correction, I'm left with high
>>>> intensities in the mentioned spots. Is it how background correction
>>>> should
>>>> work, I've chosen the wrong method, or such spots can't be corrected?
>>>> (present on imageplot.2.bgcor.png)
>>>> 3. MAplots show big differences in within-group expression (technical or
>>>> biological replicates), even after normalization. Did I choose a wrong
>>>> method?
>>>> (as seen in maplot.1.png - first column is for arrays c(1,4), second for
>>>> c(5,8); consecutive rows are for: raw data, background-corrected data,
>>>> normalized data)
>>>> 4. I think the above leads to the MAplot of beetween-group difference
>>>> being
>>>> skewed upwards for high intensities, am I right? (maplot.2.final.png)
>>>> 5. After filtering-out genes with weak SNR and flags (see code),
>>>> within-group fold-changes are considerably smaller. How should i decide
>>>> which genes should be left for analysis: are there any standards or
>>>> should
>>>> I try 'trial-and-error plotting' of MAplots with different functions for
>>>> weights? (maplot.3.filtered.png - upper plot presents filtered data)
>>>> 6. Is there any reason to perform background correction, if it worsens
>>>> densities (plotdensities.png - upper plot was before background
>>>> correction)
>>>>
>>>> B.
>>>> 7. VSN normalization doesn't take $weights into account. Is there any
>>>> convenient way to include them? It also trims genes names, so I set them
>>>> as
>>>> rownames to be processed.
>>>> 8. Is there any convenient way to assign '0 weights' to certain location
>>>> on
>>>> the microarray (such as previously mentioned high-intensity
>>>> spots/scratches)?
>>>> 9. I'm unable to plotMA(RG) with spottypes included, because my data is
>>>> single-channel. Now I think about transferring four last arrays to be
>>>> treated as green channel, but won't it affect the further analysis?
>>>> 10. Due to inability to produce imageplots, I needed to set
>>>> RG$printer$ngrid.r = 1 . Wouldn't it affect the plots reliability?
>>>>
>>>> If you consider any of those questions too basic, could you please
>>>> provide
>>>> me with a reliable online materials for basic microarray analysis? I'm
>>>> trying to figure it out by myself.
>>>>
>>>> Source: http://pastebin.com/ZF2NJc7G
>>>> Plots: http://imgur.com/a/I9k5C
>>>> Session info: http://pastebin.com/kVRf2NWf
>>>>
>>>> Thank you for your support,
>>>> Marcin Kaminski, medical student
>>>> Medical University of Bialystok
>>>> -------------- next part --------------
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250
>>>> [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
>>>> [5] LC_TIME=Polish_Poland.1250
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>>> [7] methods   base
>>>>
>>>> other attached packages:
>>>> [1] vsn_3.30.0         Biobase_2.22.0     BiocGenerics_0.8.0
>>>> [4] limma_3.18.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affy_1.40.0           affyio_1.30.0         BiocInstaller_1.12.0
>>>> [4] grid_3.0.2            lattice_0.20-24       preprocessCore_1.24.0
>>>> [7] tools_3.0.2           zlibbioc_1.8.0
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: imageplot.1.png
>>>> Type: image/png
>>>> Size: 226759 bytes
>>>> Desc: not available
>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/
>>>> attachments/20131111/a7edc234/attachment-0006.png>
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: imageplot.2.bgcor.png
>>>> Type: image/png
>>>> Size: 125957 bytes
>>>> Desc: not available
>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/
>>>> attachments/20131111/a7edc234/attachment-0007.png>
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: maplot.1.png
>>>> Type: image/png
>>>> Size: 25500 bytes
>>>> Desc: not available
>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/
>>>> attachments/20131111/a7edc234/attachment-0008.png>
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: maplot.2.final.png
>>>> Type: image/png
>>>> Size: 10400 bytes
>>>> Desc: not available
>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/
>>>> attachments/20131111/a7edc234/attachment-0009.png>
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: maplot.3.filtered.png
>>>> Type: image/png
>>>> Size: 16349 bytes
>>>> Desc: not available
>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/
>>>> attachments/20131111/a7edc234/attachment-0010.png>
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: plotdensities.png
>>>> Type: image/png
>>>> Size: 5469 bytes
>>>> Desc: not available
>>>> URL: <https://stat.ethz.ch/pipermail/bioconductor/
>>>> attachments/20131111/a7edc234/attachment-0011.png>
>>>> -------------- next part --------------
>>>> library(limma)
>>>> library(vsn)
>>>>
>>>> # DATA PREPROCESSING
>>>> # Download RAW data from GEO experiment
>>>> setInternet2(use=FALSE)
>>>> download.file('http://www.ncbi.nlm.nih.gov/geo/download/
>>>> ?acc=GSE34571&format=file', 'GSE34571_RAW.tar', mode='wb')
>>>> untar('GSE34571_RAW.tar', exdir='gpr')
>>>> setwd('gpr')
>>>>
>>>> # Set function to assign weights
>>>> f <- function(x) {ok.flags <- x$Flags > (-99)
>>>>                  ok.snr <- x[,'SNR 532'] > 3
>>>>                  as.numeric(ok.snr & ok.flags)
>>>> }
>>>>
>>>> # Read single-channel .gpr files as double-channel data and perform bg
>>>> correction
>>>> RG <- read.maimages(dir(pattern='*.gpr.gz'), source="genepix",
>>>>                    columns=list(R="F532 Mean",G="F532
>>>> Mean",Rb="B532",Gb="B532"),
>>>>                    wt.fun=f)
>>>> RG.b = backgroundCorrect(RG=RG, method='normexp')
>>>>
>>>> # Delete fake second channel
>>>> rownames(RG.b$R) <- RG.b$genes$Name
>>>> RG.b$G <- NULL
>>>> RG.b$Gb <- NULL
>>>>
>>>> # Specify contrast and design matrices
>>>> design <- model.matrix(~ 0+factor(c(1,1,1,1,2,2,2,2)))
>>>> colnames(design) <- c("group1", "group2")
>>>> contrast.matrix <- makeContrasts(group1-group2, levels=design)
>>>>
>>>> # Peform normalisation and stats
>>>> norm.vsn <- normalizeVSN(RG.b$R)
>>>> fit.vsn <- lmFit(norm.vsn, design)
>>>> fit2.vsn <- contrasts.fit(fit.vsn, contrast.matrix)
>>>> fit3.vsn <- eBayes(fit2.vsn)
>>>> limma::volcanoplot(fit3.vsn)
>>>>
>>>>
>>>> RG$printer$ngrid.r = 1
>>>> imageplot(log2(RG.b$R[,2]), RG$printer)
>>>>
>>>> # DIAGNOSTIC PLOTS
>>>>
>>>> # maplot.1
>>>> par(mfrow=c(3,2))
>>>> limma::plotMA(log2(RG$R[,c(1:4)]))
>>>> limma::plotMA(log2(RG$R[,c(5:8)]))
>>>>
>>>> limma::plotMA(log2(RG.b$R[,c(1:4)]))
>>>> limma::plotMA(log2(RG.b$R[,c(5:8)]))
>>>>
>>>> limma::plotMA(norm.vsn[,c(1:4)])
>>>> limma::plotMA(norm.vsn[,c(5:8)])
>>>>
>>>> # maplot.2.final
>>>> par(mfrow=c(1,1))
>>>> limma::plotMA(fit3.vsn)
>>>>
>>>> # maplot.3.filtered
>>>> par(mfrow=c(2,1))
>>>> limma::plotMA(log2(RG.b$R[((RG.b$weights[,1] == 1) & (RG.b$weights[,2]
>>>> == 1)),c(1:2)]))
>>>> limma::plotMA(log2(RG.b$R[,c(1:2)]))
>>>>
>>>> boxplot(RG$R)
>>>> boxplot(norm.vsn)
>>>>
>>>> # plotdensities
>>>> plotDensities(RG, singlechannels=c(1:8))
>>>> plotDensities(RG.b, singlechannels=c(1:8))
>>>>
>>>> ------------------------------
>>>>
>>>
>>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list