[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene

Gordon K Smyth smyth at wehi.EDU.AU
Sat Jul 13 01:20:51 CEST 2013


Dear Zhengyu,

Please keep questions on the list.

Results show that pair 6 is different from the others, and that there 
isn't a consistent treatment effect.  I suggest that you explore your data 
data more:

   plotMDS(eset)

The SvsA plot shows that the log-expression values are less variable at 
both low and high intensities.  This has presumably arisen from the 
microarray platform and from the way that you have pre-processed the data, 
which you don't tell us much about.  In any case, limma has taken the 
trend into account.

Since this a single-channel platform, your comment "take log ratios" is a 
bit mysterious.  Do you just mean log-intensity?

Best wishes
Gordon

On Fri, 12 Jul 2013, zhengyu jiang wrote:

> Hi Gordon,
> Thank you very much for your reply. Sorry for the delay.
>
> Following your instructions, I started with original matrix (M) of raw
> intensities & made the codes below.
> If everything is correct, my questions are (1) does the figure (below) log
> residual standard deviation vs. log expression make sense? - how should I
> decide if it is normal?
> (2) does the summary(decideTest) tell me that only pair 6 has great changes
> in expression?
>
> Thanks,
>
> eset<-as.matrix(log(M)) #### take log ratios
> eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile
> normalization for all arrays
> design<-model.matrix(~Pair+Treat)
> targets<-readTargets("targets.txt",row.names=1)  ## or row name =1
> Pair<-factor(targets$Pair)
> Treat<-factor(targets$Treatment,levels=c("N","T"))
> fit_pair<-lmFit(eset,design)
> plotSA(fit_pair)
> fit= eBayes(fit_pair, trend=TRUE)
>> summary(decideTests(fit))
>   (Intercept)  Pair2  Pair3  Pair4  Pair5  Pair6  Pair7  Pair8 TreatT
> -1           0      0      0      0      0   2770      0      0      0
> 0            0 257279 257279 257278 257275 254372 257275 257273 257279
> 1       257279      0      0      0      0    102      0      2      0
>> R=topTable(fit, coef="TreatT", adjust="BH",number=30
> + )
>> head(R)
>            logFC  AveExpr         t      P.Value adj.P.Val         B
> 207370  0.6232462 6.473489  5.990300 6.818613e-05 0.8997989 -3.715276
> 45481   0.5559283 6.477330  4.960299 3.494802e-04 0.8997989 -3.822746
> 178158  0.4404808 6.148200  4.872321 4.046088e-04 0.8997989 -3.833704
> 172530 -0.3670701 5.793056 -4.800323 4.564952e-04 0.8997989 -3.842908
> 165527  0.4721218 6.324581  4.740798 5.046498e-04 0.8997989 -3.850680
> 80386  -0.5450653 6.551971 -4.723036 5.200275e-04 0.8997989 -3.853028
>
>
>
>
> [image: Inline image 1]
>
>
> On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Zhengyu,
>>
>>  Date: Mon, 8 Jul 2013 20:40:14 -0400
>>> From: zhengyu jiang <zhyjiang2006 at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] limma for homemade microarray - question on NAs and
>>>         multiple probes for one gene
>>>
>>> Dear Bioconductor experts,
>>>
>>> We have data from a homemade one-channel microarray that I tried to apply
>>> limma for differential expression analysis between matched paired Normal
>>> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech
>>> replicate
>>> has been averaged after normalization). All samples are formatted in one
>>> matrix (M).
>>>
>>
>> This is a standard problem, well covered in the limma User's Guide.
>>
>>  Signals have been quantile normalized between each paired normal and
>>> tumor.
>>>
>>
>> I don't know what you mean by this.  I recommend ordinary quantile
>> normalization of all the arrays together, without regard to which sample is
>> paired with which.
>>
>>  Signal values below 5 (log scale) have been replaced by "NA" since they
>>> are
>>> potentially noises. So there are many NAs in M.
>>>
>>
>> Please don't do this.  limma can deal with low intensities perfectly weel,
>> and can figure out whether they are noise or not.  You are simply removing
>> valid data.
>>
>> If you are concerned about high variability of low intensity probes, then
>> look at
>>
>>   plotSA(fit_pair)
>>
>> to examine the mean-variance trend, and use
>>
>>   eBayes(fit_pair, trend=TRUE)
>>
>> if there is a strong trend.
>>
>>  I followed the user manual and made the codes below.
>>>
>>> I think the code is correct?
>>>
>>
>> Looks ok.
>>
>>  My questions are (1) how to deal with NAs - as I did a search but no
>>> clear idea
>>>
>>
>> Don't create them in the first place.  This has been said many times
>> before!
>>
>>  (2) how do people do the statistics at the gene level for one gene having
>>> multiple probes - averaging or taking median?
>>>
>>
>> Usually we do not find it necessary to aggregate the multiple probes. The
>> multiple probes might represent different transcripts or variants, and some
>> of the probes might not work.  Why do you need to take any special action?
>>
>> However, if you feel that you must, the best method to aggregate the
>> multiple probes is probably to retain the probe for each gene with highest
>> fit_pair$Amean value.  We have used this strategy in a number of published
>> papers.  The rationale for this is to choose the probe corresponding to the
>> most highly expressed transcript for the gene.  Alternatively, you could
>> average the probes using the avereps() function in limma.
>>
>> Best wishes
>> Gordon
>>
>>  Thanks,
>>> Zhengyu
>>>
>>>
>>>> head(M)
>>>         N1       N2       N3       N4       N5       N6       N7
>>> N8       T1        T2       T3
>>> 2  8.622724 7.423568       NA       NA 7.487174       NA 8.516293       NA
>>> 7.876259  7.856707       NA
>>>         T4       T5       T6       T7       T8
>>> 2        NA 7.720018       NA 7.752550       NA
>>>
>>>  eset<-as.matrix(M)
>>>> Pair=factor(targets$Pair)
>>>>     Treat=factor(targets$**Treatment,levels=c("N","T")) # compared
>>>> matched
>>>>
>>> normal to tumors
>>>
>>>>               design<-model.matrix(~Pair+**Treat)
>>>> targets
>>>>
>>>   FilenName Pair Treatment
>>> 1         N1    1         N
>>> 2         N2    2         N
>>> 3         N3    3         N
>>> 4         N4    4         N
>>> 5         N5    5         N
>>> 6         N6    6         N
>>> 7         N7    7         N
>>> 8         N8    8         N
>>> 9         T1    1         T
>>> 10        T2    2         T
>>> 11        T3    3         T
>>> 12        T4    4         T
>>> 13        T5    5         T
>>> 14        T6    6         T
>>> 15        T7    7         T
>>> 16        T8    8         T
>>> fit_pair<-lmFit(eset,design)
>>>             fit_pair<-eBayes(fit_pair)
>>>
>>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top
>>> 30

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list