[BioC] result of linear model

Gordon K Smyth smyth at wehi.EDU.AU
Wed Sep 9 01:42:32 CEST 2009


Dear Md.Mamunur Rashid,

I, and others on this list, try to help with problems or issues with the 
code, but analysing your own data is your responsibility.

I make the following points:

1. All your fold changes are very small, so you'd hardly expect the 
changes to be highly significant.  Have you thought at all about why they 
are small?  Is there any biological reason to think there should be 
differences between the groups?

2. All your DE genes seem to be expressed at low levels.  Again, have you 
throught about this?

3. You can't take a turn-key approach to microarray analysis.  You need to 
look at your data, assess quality, and so on.  Have you done the usual 
plots (boxplots, MA-plots)?  Have you tried a MDS plot or cluster plot of 
your samples to see how consistent they are within groups?  Have you 
checked whether some samples are outliers or of poor quality?  Have you 
filtered non-expressed probes?  Etc, etc.  (I'm not asking you to show me 
these things, but you should go over them yourself.)

Finally, the BH adjusted p-values are not p-values, but are FDR rates. 
You are not required to use a 5% cutoff for these.

Hope this helps.

Best wishes
Gordon

On Tue, 8 Sep 2009, Md.Mamunur Rashid wrote:

> Dear Gordon,
>
> I am working with a set of illumina microarray data (96 samples) from 
> three groups (i.e. group-1(X),group-2(Y),group-3(Z)). 32 samples from 
> each group.
>
> I have read the data using lumiR method and processed the data using lumi 
> Methods(lumiExpresso).
>
> Now I need to identify the differentially expressed genes by comparing 
> each of these groups with each other. I am using linear model in limma 
> package and topTable method to identify top N differentially expressed 
> genes. Below is the code that I have used to design the test in linear 
> model. If you look at the result of the top-table all the adjusted 
> p-values are very high. as a result none of the genes passes through the 
> cut-off p-value 0.05. I have tried all the tree combination and in all 
> cases adjusted p-values are.
>
> Now I have tested the same code on another set of data that has 20 
> samples that has been analyzed before (i.e I have the results before 
> hand). In that case also adjusted p-values are very high but genes in 
> the top-table are correct( i.e matches with the result of the previous 
> analysis).
>
>
> so in this situation, I will be really grateful if you can give me some 
> suggestion on
>
> 1. Is there anything wrong in my code which is making the adjusted p-value so 
> high.???
> 2. Or it might be a problem in the data pre-process phase.???
>
> *** I have attached the code , the result and the array-weight in a text 
> file.
>    Please have a look.
>
> thanks in advance. If anybody else want to have a suggestion, you are most 
> welcome.
>
> regards,
> Md.Mamunur Rashid
>
>
>
> ########   code #############
>
>
> ## norm_object is the normalized object
>
> d_Matrix<- exprs(norm_object)
> probeList<- rownames(d_Matrix)
> library(illuminaHumanv3BeadID.db)
>
> ## filtering out the un-annotated genes
>
> x<- illuminaHumanv3BeadIDSYMBOL
> annotated_ids<- mappedkeys(x)
> d_Matrix<- exprs(norm_object)
> probeList<- rownames(d_Matrix)
> idx<- probeList %in% annotated_ids
> d_matrix<-d_matrix[idx,]
>
>
> ## 32 samples from each group without any pair
> sampleType<- c("X","X","Y","Y",..........96 
> samples..........,"Z","X","Y","Y","X","X","Z","I","X")
> design<-model.matrix(~0+sampleType)
> colnames(design_norm_test)<- c('X','Y','Z')
>
> fit1<- lmFit(d_Matrix,design)
> constrast.matrix<- makeContrasts (Y-X , Z-Y , Z-X, levels=design)
> fit1_2<- contrasts.fit(fit1,contrast.matrix)
> fit1_2<- eBayes(fit1_2)
> topTable(fit1_2,coef=1, adjust="BH")
>
>
>
> Result of the toptable method:
>
> 6284  ILMN_1111111  0.11999169 6.341387  4.828711 5.237786e-06 0.2325975
> 12919 ILMN_2222222 -0.05966259 6.187268 -4.678886 9.532099e-06 0.2325975
> 6928  ILMN_3333333 -0.31283278 6.881315 -4.561366 1.513503e-05 0.2462115
> 42428 ILMN_4444444 -0.13036276 6.815443 -4.288051 4.321272e-05 0.3964163
> 36153 ILMN_5555555  0.25070344 6.487644  4.190735 6.220719e-05 0.3964163
> 36152 ILMN_6666666  0.21502145 6.470917  4.158153 7.019901e-05 0.3964163
> 28506 ILMN_7777777  0.13918530 6.616036  4.158140 7.020219e-05 0.3964163
> 11763 ILMN_8888888 -0.17331384 7.322021 -4.154668 7.110990e-05 0.3964163
> 38906 ILMN_9999999  0.05532714 6.224477  4.093623 8.903425e-05 0.3964163
> 4728  ILMN_0000000  0.05371882 6.177268  4.081921 9.293339e-05 0.3964163
>
> 	*B*
>
> 12919 3.236579
> 6928  2.801832
> 42428 1.818263
> 36153 1.477781
> 36152 1.364969
> 28506 1.364927
> 11763 1.352940
> 38906 1.143329
> 4728  1.103392



More information about the Bioconductor mailing list