[BioC] limma for homemade microarray - question on NAs and multiple probes for one gene

Gordon K Smyth smyth at wehi.EDU.AU
Thu Jul 25 01:21:39 CEST 2013


Dear Zhengyu,

On Mon, 22 Jul 2013, zhengyu jiang wrote:

> Hi Gordon,
>
> Sorry to bother you again.  I have two questions.
>
> (1) I started the "eset<-as.matrix(log(M)) #### take log intensities" 
> with a matrix and I have a separate annotation file which contains 
> columns of "ID", "Sample Name", "Chr", "Coordinate", ""GeneSymbol". 
> The following code to top list is below.  How can I add the annotation 
> to the final toplist output ?

The first and best way to answer questions is to read the documentation. 
If you type ?topTable, you will see that there is an argument for a 
data.frame containing gene annotation.  So you read the annotation into a 
data.frame, then use

   topTable(fit, ..., genelist=yourdatannotation)

> eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile
> normalization for all arrays
> design<-model.matrix(~Pair+Treat)
> targets<-readTargets("targets.txt",row.names=1)  ## or row name =1
> Pair<-factor(targets$Pair)
> Treat<-factor(targets$Treatment,levels=c("N","T"))
> fit_pair<-lmFit(eset,design)
> plotSA(fit_pair)
> fit= eBayes(fit_pair, trend=TRUE)
> R=topTable(fit, coef="TreatT", adjust="BH",number=30
>
> (2) I have 6 sample data (3 control and 3 treatment) in one file.

Is this an Illumina BeadChip.  You don't say but, if not, using read.ilmn 
is not appropriate.

> It contains columns of "ID", "Sample Name", "Chr", "Coordinate", 
> ""GeneSymbol", "XRaw" (expression raw data). I don't have control 
> probes. If I want to use the Spot quality weights function "wt.fun" to 
> read in these data as described in the limma manual to define all XRaw 
> values below 1000 as 0 weight,

And yet I have already asked you please not to do this.

> is that possible to modify the code? I can change all column names if 
> needed. But I don't know what columns are required for read.illmn?

Again, please read the documentation.  Type ?read.ilmn and you will see 
that there is no argument called wt.fun.  You can only use read.ilmn for 
Illumina BeadChips.  This type of array does not have spots, so trying to 
set spot weights would obviously make no sense.

The Biconductor posting guide says

"Read the relevant R documentation ... If you are having trouble with 
function somefunc, try ?somefunc."

The limma User's Guide says

"Mailing list etiquette requires that you read the relevant help page 
carefully before posting a problem to the list."

Best wishes
Gordon

> myfun <- function(x) as.numeric(x$XRaw <1000)
> read.ilmn("062813_data.txt",probeid="ID",annotation=c("Chr","Coordinate","GeneSymbol",expr="XRaw"),
> wt.fun=myfun)
> Thanks,
> Zhengyu
>
>
> On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Zhengyu,
>>
>>  Date: Mon, 8 Jul 2013 20:40:14 -0400
>>> From: zhengyu jiang <zhyjiang2006 at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] limma for homemade microarray - question on NAs and
>>>         multiple probes for one gene
>>>
>>> Dear Bioconductor experts,
>>>
>>> We have data from a homemade one-channel microarray that I tried to apply
>>> limma for differential expression analysis between matched paired Normal
>>> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech
>>> replicate
>>> has been averaged after normalization). All samples are formatted in one
>>> matrix (M).
>>>
>>
>> This is a standard problem, well covered in the limma User's Guide.
>>
>>  Signals have been quantile normalized between each paired normal and
>>> tumor.
>>>
>>
>> I don't know what you mean by this.  I recommend ordinary quantile
>> normalization of all the arrays together, without regard to which sample is
>> paired with which.
>>
>>  Signal values below 5 (log scale) have been replaced by "NA" since they
>>> are
>>> potentially noises. So there are many NAs in M.
>>>
>>
>> Please don't do this.  limma can deal with low intensities perfectly weel,
>> and can figure out whether they are noise or not.  You are simply removing
>> valid data.
>>
>> If you are concerned about high variability of low intensity probes, then
>> look at
>>
>>   plotSA(fit_pair)
>>
>> to examine the mean-variance trend, and use
>>
>>   eBayes(fit_pair, trend=TRUE)
>>
>> if there is a strong trend.
>>
>>  I followed the user manual and made the codes below.
>>>
>>> I think the code is correct?
>>>
>>
>> Looks ok.
>>
>>  My questions are (1) how to deal with NAs - as I did a search but no
>>> clear idea
>>>
>>
>> Don't create them in the first place.  This has been said many times
>> before!
>>
>>  (2) how do people do the statistics at the gene level for one gene having
>>> multiple probes - averaging or taking median?
>>>
>>
>> Usually we do not find it necessary to aggregate the multiple probes. The
>> multiple probes might represent different transcripts or variants, and some
>> of the probes might not work.  Why do you need to take any special action?
>>
>> However, if you feel that you must, the best method to aggregate the
>> multiple probes is probably to retain the probe for each gene with highest
>> fit_pair$Amean value.  We have used this strategy in a number of published
>> papers.  The rationale for this is to choose the probe corresponding to the
>> most highly expressed transcript for the gene.  Alternatively, you could
>> average the probes using the avereps() function in limma.
>>
>> Best wishes
>> Gordon
>>
>>  Thanks,
>>> Zhengyu
>>>
>>>
>>>> head(M)
>>>         N1       N2       N3       N4       N5       N6       N7
>>> N8       T1        T2       T3
>>> 2  8.622724 7.423568       NA       NA 7.487174       NA 8.516293       NA
>>> 7.876259  7.856707       NA
>>>         T4       T5       T6       T7       T8
>>> 2        NA 7.720018       NA 7.752550       NA
>>>
>>>  eset<-as.matrix(M)
>>>> Pair=factor(targets$Pair)
>>>>     Treat=factor(targets$**Treatment,levels=c("N","T")) # compared
>>>> matched
>>>>
>>> normal to tumors
>>>
>>>>               design<-model.matrix(~Pair+**Treat)
>>>> targets
>>>>
>>>   FilenName Pair Treatment
>>> 1         N1    1         N
>>> 2         N2    2         N
>>> 3         N3    3         N
>>> 4         N4    4         N
>>> 5         N5    5         N
>>> 6         N6    6         N
>>> 7         N7    7         N
>>> 8         N8    8         N
>>> 9         T1    1         T
>>> 10        T2    2         T
>>> 11        T3    3         T
>>> 12        T4    4         T
>>> 13        T5    5         T
>>> 14        T6    6         T
>>> 15        T7    7         T
>>> 16        T8    8         T
>>> fit_pair<-lmFit(eset,design)
>>>             fit_pair<-eBayes(fit_pair)
>>>
>>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top
>>> 30

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list