[BioC] normalisation + sam

Wolfgang Huber whuber at embl.de
Thu Apr 22 21:41:56 CEST 2010


Hi Mike

can you provide a reproducible example (R script) and the output of 
sessionInfo() for the 2-groups comparison? This would include a pointer 
to the 6(?) CEL files on the web.

For the 28-classes case, I doubt that hypothesis testing of the null 
hypothesis "mean in all classes is the same" is the most useful data 
analytic approach. Perhaps you can precisize your question.

	Best wishes
	Wolfgang

> Hi,
> 
> I'm trying to use siggenes - sam() to look at differentially expressed genes of data taken from the Immunological Genome project (http://www.immgen.org/). A problem with this is that I have to perform the preprocessing of the original CEL files as they do not make the processed data available on GEO. 
> 
> To do this I'm using the aroma suite of packages to perform quantile normalisation of this data set (so far 128 arrays) in fixed memory (i.e. my laptop). This is a good thing, as it has forced me to learn a little about array preprocessing, and a bad thing as I'm new to all this and might be going horribly wrong. 
> 
> When it comes time to look for differentially expressed genes, I'm using siggenes - sam() and I'm getting some strange results. I'm using (what I think would be considered) many classes (28), where each class has at least 3 examples, and thus throwing out some of the arrays. The results I'm getting look like:
> 
> SAM Analysis for the Multi-Class Case with 28 Classes
> 
>    Delta    p0    False Called      FDR
> 1    0.1 0.014 23355.05  23532 0.013997
> 2    0.2 0.014 21805.98  23498 0.013087
> 3    0.3 0.014 15923.83  23169 0.009693
> 4    0.4 0.014  9637.47  22343 0.006083
> 5    0.5 0.014  5527.75  21330 0.003655
> 6    0.6 0.014  3060.73  20221 0.002135
> 7    0.7 0.014  1703.82  19205 0.001251
> 8    0.8 0.014   953.02  18256 0.000736
> 9    0.9 0.014   536.81  17382 0.000436
> 10   1.0 0.014   307.19  16730 0.000259
> 11   1.1 0.014      176  16143 0.000154
> 
> which I think implies that many many genes are differentially expressed. Using plot(sam.out, 1.2) shows a pretty much vertical line starting at the origin, with no genes observably behaving like the null model. Even if I only try this on 2 classes, and hence throwing out most of the data, I'm still not getting sensible results.
> 
> Now I'm hoping that I'm doing something wrong, and that not 16K of my genes are differentially expressed. However, I'm having difficulty figuring out what it might be. The one striking thing between my data set and the golub example set is that golub seems to be zero-mean - is this a requirement for sam()?
> 
> Any other ideas of what to look for, or what other information I could provide to help this question make sense, would be greatly appreciated.
> 
> Thanks in advance,
> 
> Mike Dewar
> 
> - - -
> Dr Michael Dewar
> Postdoctoral Research Scientist 
> Applied Mathematics
> Columbia University
> http://www.columbia.edu/~md2954/
> 
> 
> 
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioconductor mailing list