[BioC] limma for identifying differentially expressed genes from illumina data

J.Oosting at lumc.nl J.Oosting at lumc.nl
Fri Aug 28 09:49:23 CEST 2009


With Illumina arrays I usually filter out all the un-annotated probes
before doing the statistical analysis. Especially in the 48k versions of
these arrays about half the probes are not annotated, and for most
purposes these are not interesting.

Furthermore you should have a look at the quality of the arrays. 
-The average expression seems a bit low, but this could also be caused
by the normalization method. 
-Check if the order of samples in the lumi object is the same as you
define in the script. It is usually better to incorporate information
like this in the phenoData slot of your object, and pull it out when
defining groups for the analysis.


Jan

In your case filtering the probes would look something like this:

library(illuminaHumanv3BeadID.db)
# get annotations
x <- illuminaHumanv3BeadIDSYMBOL
annotated_ids <- mappedkeys(x)
d_Matrix <- exprs(norm_object)
probeList <- rownames(d_Matrix)
idx<- probeList %in% annotated_ids
d_matrix<-d_matrix[idx,]


## 32 samples from each group without any pair
design <-model.matrix(~0+pData(norm_object)$sampleType)
colnames(design) <- c('X','Y','Z')
fit1 <- lmFit(d_Matrix,design)
constrast.matrix <- makeContrasts (Y-X , Z-Y , Z-X, levels=design)
fit1_2 <- contrasts.fit(fit1,contrast.matrix)
fit1_2 <- eBayes(fit1_2)
topTable(fit1_2,coef=1, adjust="BH") 


> Dear Md.Mamunur Rashid:
> 
>     Perhaps you can try filter out probes which do not express in all
> the arrays. You can use the detection p value to do this. The
detection
> p value of a probe is the proportion of negative control probes which
> have intensities larger than that probe. You can filter out those
probes
> which have detection p values larger than a cutoff (e.g. 0.01) in all
> the arrays. This might help you find some differentially expressed
genes.
> 
> Cheers,
> Wei
> 
> Md.Mamunur Rashid wrote:
> > Dear All,
> >
> > I am working with a set of illumina microarray data (96 samples)
from
> > three groups (i.e. group-1(X) group-2 (Y),  group-3(Z)). I have read
> > the data using lumiR methoda and normalized the data using lumi
> > Methods. Now I need to identify the differentially expressed genes
by
> > comparing each of these groups with each other. I am using linear
> > model fit in limma package and topTable method to identify top N
> > differentially
> > expressed genes.
> >
> >
> > 1. When I am adjusting the p value  using "BH" method in the
topTable
> > method the adj.p.value is getting too high
> >   as a result none of the genes are getting selected with threshold
> > p.value = 0.05 . 2.* *The logfold change values are very low.
> > I have tried comparing all the 3 combination and the situation is
more
> > or less similar.
> > Does this indicate that none of the genes are not differentially
> > expressed at all!!!
> > (Which might be a odd) or I am doing something wrong???!!!
> > Please I will really appreciate if any body can give any advice.
> >
> > ****************************************************
> > I have attached the code and the result below.
> > ******************************************************
> >
> > ## norm_object is the normalized object
> >
> > d_Matrix <- exprs(norm_object)
> > probeList <- rownames(d_Matrix)
> >
> > ## 32 samples from each group without any pair
> > sampleType <- c("X","X","Y","Y",..........96  samples..........
> > ,"Z","X","Y","Y","X","X","Z","I","X") design <-
> > model.matrix(~0+sampleType)
> > colnames(design_norm_test) <- c('X','Y','Z')
> > fit1 <- lmFit(d_Matrix,design)
> > constrast.matrix <- makeContrasts (Y-X , Z-Y , Z-X, levels=design)
> > fit1_2 <- contrasts.fit(fit1,contrast.matrix)
> > fit1_2 <- eBayes(fit1_2)
> > topTable(fit1_2,coef=1, adjust="BH")
> >
> >>
> >
> >                ID       logFC  AveExpr         t      P.Value
adj.P.Val
> > 6284  ILMN_1111111  0.11999169 6.341387  4.828711 5.237786e-06
> > 0.2325975  12919 ILMN_2222222 -0.05966259 6.187268 -4.678886
> > 9.532099e-06 0.2325975
> > 6928  ILMN_3333333 -0.31283278 6.881315 -4.561366 1.513503e-05
0.2462115
> > 42428 ILMN_4444444 -0.13036276 6.815443 -4.288051 4.321272e-05
0.3964163
> > 36153 ILMN_5555555  0.25070344 6.487644  4.190735 6.220719e-05
0.3964163
> > 36152 ILMN_6666666  0.21502145 6.470917  4.158153 7.019901e-05
0.3964163
> > 28506 ILMN_7777777  0.13918530 6.616036  4.158140 7.020219e-05
0.3964163
> > 11763 ILMN_8888888 -0.17331384 7.322021 -4.154668 7.110990e-05
0.3964163
> > 38906 ILMN_9999999  0.05532714 6.224477  4.093623 8.903425e-05
0.3964163
> > 4728  ILMN_0000000  0.05371882 6.177268  4.081921 9.293339e-05
0.3964163
> >                 B
> >
> > 12919 3.236579
> > 6928  2.801832
> > 42428 1.818263
> > 36153 1.477781
> > 36152 1.364969
> > 28506 1.364927
> > 11763 1.352940
> > 38906 1.143329
> > 4728  1.103392



More information about the Bioconductor mailing list