[BioC] Ranking genes according to tissue specificity

Mon Apr 9 17:21:48 CEST 2007

Hello all,

I'm new to R/BioC, but I've been trying to use them for the following
analysis. I apologize if this email is a bit long, but I bet someone on
this list could point me in the right direction.

I have the GNF dataset with Affy expression data from 61 mouse tissues
(each with 2 biological replicates, 122 CEL files in total).

In the end I would like to obtain, for each tissue, the gene list sorted
according to the specificity of their expression in that tissue. That is,
genes whose expression is highest in that tissue, relative to the other
tissues (although their absolute expression levels could be low) at the
top, and genes whose expression is lowest in that tissue (although their
absolute expression levels could be high) at the bottom. Ideally, I would
like to have some confidence value (p-value?) associated with each gene as
well.

Initially, I downloaded the pre-normalized (with MAS or gcRMA) files, and
did all the manipulation with Perl scripts. For each probe X, I took its
expression values Xi (i = 1..61), one per tissue, and replaced each value
with (Xi - mean(X)) / std_dev(X), essentially a Z-score. In
this way, the "Z-score" represents how specifically expressed a particular
gene is in a particular tissue, considering the std_dev of the expression
levels of that gene.
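
A minimal sketch of the per-probe Z-score described above, written in
Python rather than the original Perl scripts (which are not shown). The
probe IDs, tissue names, and expression values are made up for
illustration, and using the sample standard deviation (n - 1) is an
assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev


def zscores_by_tissue(expression):
    """Convert one probe's per-tissue expression values to Z-scores.

    `expression` maps tissue name -> expression value for a single probe.
    """
    values = list(expression.values())
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1); an assumption
    if sd == 0:
        # A probe with identical values everywhere has no tissue specificity.
        return {tissue: 0.0 for tissue in expression}
    return {tissue: (x - mu) / sd for tissue, x in expression.items()}


def rank_probes_for_tissue(data, tissue):
    """Return probe IDs sorted by Z-score in `tissue`, highest first."""
    z = {probe: zscores_by_tissue(expr)[tissue] for probe, expr in data.items()}
    return sorted(z, key=z.get, reverse=True)


# Hypothetical data: three probes measured in three tissues.
data = {
    "probe_A": {"liver": 10.0, "brain": 2.0, "heart": 3.0},
    "probe_B": {"liver": 5.0, "brain": 5.5, "heart": 4.5},
    "probe_C": {"liver": 1.0, "brain": 8.0, "heart": 9.0},
}

ranking = rank_probes_for_tissue(data, "liver")
print(ranking)
```

Note that this sketch treats each tissue as a single value per probe; how
the replicates should enter the calculation is left open here.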

One of the first problems with this is that I am only processing a subset
of the probes, since I only use those with a RefSeq transcript. So I
thought it would be better to re-normalize everything considering only the
subset of the transcripts that I will be analyzing. Is this correct?

I think for my particular case I'm better off with an RMA/gcRMA summary. I
can see that I can use the "subset" parameter to select only the probesets
I want. Also, I can't make much use of the A/M/P calls of MAS analysis,
since I don't want the low-expression values to be cut off. I read a
couple of papers where they compared these and other methods, and decided
to initially try gcRMA.

I guess my main questions are, other than trying to get general
suggestions:

1) at what point do I use the biological replicates?

2) is there a package that I can use to obtain "relative" expression
levels, among all the tissues? I can find many examples of how to get
relative expression levels when comparing two cases (or a few more, but
always comparing in pairs). How can I best compare each tissue to "all the
rest"?

Thanks for your time,

Cei