[BioC] computing SD using a list of gene expressions

Thu Apr 21 17:31:31 CEST 2005

Dear Lana,

You could try a recently released BioC package called 'plgem' (Power 
Law Global Error Model), available from the development repository at 
<http://www.bioconductor.org/repository/devel/package/html/plgem.html>. 
Briefly, it implements a method that estimates the SD of single genes 
from the global behavior of all genes in a dataset of replicated 
samples. It takes advantage of the fact that the SD of a given gene 
depends on the average expression level of that gene, following a 
power law.
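
To make the power-law idea concrete, here is a minimal base-R sketch. 
It is not the plgem implementation (the package has its own fitting 
procedure); it only simulates replicated data whose SD follows a power 
law of the mean and recovers the exponent by a straight-line fit on 
the log-log scale, ln(SD) = k * ln(mean) + c. The simulated matrix, 
the parameter values and all variable names are made up for 
illustration.

```r
set.seed(1)
n.genes <- 500
n.rep   <- 4

# Hypothetical "true" expression levels and a power-law SD (k = 0.8)
true.mean <- exp(runif(n.genes, log(50), log(5000)))
true.sd   <- 0.5 * true.mean^0.8

# One row per gene, one column per replicate
x <- matrix(rnorm(n.genes * n.rep, mean = true.mean, sd = true.sd),
            nrow = n.genes)

# Per-gene mean and SD across replicates
gene.mean <- rowMeans(x)
gene.sd   <- apply(x, 1, sd)

# Ordinary least squares on the log-log scale
fit <- lm(log(gene.sd) ~ log(gene.mean))
coef(fit)   # intercept (c) and slope (k); the slope should be near 0.8
```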

After installing the package (which depends on MASS), you can use a 
fairly straightforward wrapper that fits the model to the data, computes 
model-based differential expression statistics and outputs a list of 
significantly changing genes, based on a set of random resamplings of 
the data used for fitting the model. If the dataset does not have enough 
replicates to perform the resampling step, the first n (default is 100) 
genes are selected instead.

You first need to create from your data an object of class ‘exprSet’ 
with a ‘phenodata’ slot that contains a covariate called 
‘conditionName’, in which you provide some coding of your classes (e.g. 
‘treated’, ‘ctrl’, etc.). The only important thing here is that you 
give the same value to samples you wish to be treated as replicates. 
Other covariates in addition to ‘conditionName’ are allowed, but will 
be ignored.
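
As an illustration of the coding described above, the covariate table 
could look like the following. This is only a sketch of the table 
itself, built with base R; the sample names and the extra 'batch' 
column are hypothetical, and wrapping the table into the 'exprSet' 
object should follow the Biobase documentation.

```r
# Samples sharing the same 'conditionName' value are treated as replicates.
pd <- data.frame(
  conditionName = c("ctrl", "ctrl", "ctrl", "treated", "treated"),
  batch         = c(1, 2, 3, 1, 2),   # extra covariate, allowed but not used
  row.names     = c("s1", "s2", "s3", "s4", "s5"),  # hypothetical sample names
  stringsAsFactors = FALSE
)

# Number of replicates per condition
table(pd$conditionName)
```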

Then simply type:
 > list.of.significant.genes <- run.plgem(esdata)
where ‘esdata’ is an object of class ‘exprSet’ as described above. By 
default this assumes that your baseline samples are the first ones 
encountered in your ‘phenodata’ and that you want to perform the 
selection at an overall significance level of 0.001. To change these 
or other defaults, please refer to the help pages and to the vignette 
provided in the package.

Of course you will need at least one condition with 3 replicates in 
order to fit the model, but in the remaining experimental conditions 
the SD can be estimated even from single samples.
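
The reasoning here can be sketched in base R: once the power law has 
been fitted on a replicated condition, an SD can be predicted for any 
expression value, including values from a single unreplicated sample. 
Again, this is a simplified illustration on simulated data with 
made-up names, not what plgem does internally.

```r
set.seed(2)
n.genes <- 300

# Hypothetical expression levels; SD follows a power law of the mean
mu <- exp(runif(n.genes, log(100), log(10000)))

# Three simulated replicates of the baseline condition
ctrl <- matrix(rnorm(n.genes * 3, mean = mu, sd = 0.5 * mu^0.8),
               nrow = n.genes)

# Fit ln(SD) = k * ln(mean) + c on the replicated condition
d   <- data.frame(m = rowMeans(ctrl), s = apply(ctrl, 1, sd))
fit <- lm(log(s) ~ log(m), data = d)

# A single, unreplicated sample from another condition
treated <- rnorm(n.genes, mean = mu, sd = 0.5 * mu^0.8)

# Model-based SD for each gene in the single sample,
# back-transformed from the log scale
est.sd <- exp(predict(fit, newdata = data.frame(m = treated)))
head(est.sd)
```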

Reference article: <http://www.biomedcentral.com/1471-2105/5/203>
I will be happy to help if you encounter any difficulties.

Good luck!
Norman

Norman Pavelka
Department of Biotechnology and Bioscience
University of Milano-Bicocca
Piazza della Scienza, 2
20126 Milan, Italy

Phone: +39 02 6448 3556
Fax: +39 02 6448 3552

> Date: Wed, 20 Apr 2005 09:34:50 -0700
> From: "Lana Schaffer" <schaffer at scripps.edu>
> Subject: [BioC] computing SD using a list of gene expressions
> To: <bioconductor at stat.math.ethz.ch>
> Message-ID: <002a01c545c6$e00089e0$54508389 at menton>
> Content-Type: text/plain
>
> Hi,
> I would like to know if there is a way to estimate standard deviations 
> for all genes, using
> noise information of genes with similar intensity levels?
> This would be helpful when trying to obtain significant fold change 
> from experiments without
> replicates.
> Thanks for your ideas.
> Lana
>
