[BioC] Tissue specific genes with limma

Gordon Smyth smyth at wehi.edu.au
Fri Jan 14 02:35:38 CET 2005


>Date: Thu, 13 Jan 2005 10:49:20 +0200
>From: "Ron Ophir" <ron.ophir at weizmann.ac.il>
>Subject: [BioC] Tissue specific genes with limma
>To: <bioconductor at stat.math.ethz.ch>
>Message-ID: <s1e6524f.098 at wisemail.weizmann.ac.il>
>Content-Type: text/plain; charset=US-ASCII
>
>Dear Limma users,
>In our study we would like to identify tissue-specific genes, i.e.,
>genes that are differentially expressed in a specific tissue.
>For practical reasons each RNA extract
>is a mixture of tissues. These RNA samples were hybridized to Affymetrix
>chips. I thought that a linear model would be a good way to extract the
>relative contribution of each tissue to each gene's expression (correct me
>if I am wrong up to here). Therefore I prepared a design matrix as follows:
>
>             LP   ML   ADL  ABL  T    S    B
>   LER       1    1    1    1    1    1    1
>   LER       1    1    1    1    1    1    1
>   M7        0    1    1    1    1    0    0
>   M5        1    0    1    1    1    1    1
>   M7        0    1    1    1    1    0    0
>   M5        1    0    1    1    1    1    1
>   AD        1    0    1    0    1    0    0
>   M2        1    0    1    1    1    1    1
>   Trichom   1    0    1    1    0    1    1
>   Stipuls   1    0    1    1    1    0    1
>   Stipuls   1    0    1    1    1    0    1
>   AB        1    0    0    1    0    1    0
>   AB        1    0    0    1    0    1    0
>   AD        1    0    1    0    1    0    0
>   LER       1    1    1    1    1    1    1
>   M2        1    0    1    1    1    1    1

>Here LER, for example, is the RNA sample that contains a mixture of all
>tissues, LER = LP+ML+ADL+ABL+T+S, and each of the other rows is an RNA
>mixture of the set of tissues marked by 1. We also assume no
>interaction and that the tissues are present in equal amounts; we
>therefore expect the linear model to give the relative contribution of
>each tissue to the gene expression.
>First, is the above matrix the right one, or should I set the
>replicates to their proportions so as not to violate the assumption that
>the tissues are present in equal amounts in all mixtures, like this:
>             LP    ML    ADL   ABL   T     S     B
>   LER       0.3   0.3   0.3   0.3   0.3   0.3   0.3
>   LER       0.3   0.3   0.3   0.3   0.3   0.3   0.3
>   M7        0     0.5   0.5   0.5   0.5   0     0
>   M5        0.5   0     0.5   0.5   0.5   0.5   0.5
>   M7        0     0.5   0.5   0.5   0.5   0     0
>   M5        0.5   0     0.5   0.5   0.5   0.5   0.5
>   AD        0.5   0     0.5   0     0.5   0     0
>   M2        0.5   0     0.5   0.5   0.5   0.5   0.5
>   Trichom   1     0     1     1     0     1     1
>   Stipuls   0.5   0     0.5   0.5   0.5   0     0.5
>   Stipuls   0.5   0     0.5   0.5   0.5   0     0.5
>   AB        0.5   0     0     0.5   0     0.5   0
>   AB        0.5   0     0     0.5   0     0.5   0
>   AD        0.5   0     0.5   0     0.5   0     0
>   LER       0.3   0.3   0.3   0.3   0.3   0.3   0.3
>   M2        0.5   0     0.5   0.5   0.5   0.5   0.5

I can't see how you've obtained the entries in this matrix.

>Second, to identify tissue-specific genes we would like to have the
>summation for a specific tissue over all mixtures. In detail,
>as a result of the linear model fit we expect to get, for each gene, a
>matrix of expression values in which, as in the design matrix, rows are
>RNA samples and columns are tissues. The observed value of the LER
>mixture, for example, equals the sum of the relative contributions of
>each tissue: LER= 0.5(from LP)+4(from ML)+3(from ADL)+1.2(from
>ABL)+0.3(from T)+1(from S)=10 where 10 is the observed expression value
>for a given mixture for a given gene and 0.5,4,3,1.2,0.3,1 are the
>expression values deduced from the linear fit for each tissue. What we
>are interested in is the summation for each gene over the columns,
>i.e., LP = 0.5(relative LP contribution in
>LER)+0.6(M2)+1.2(M5)+0(M7)+1(Trichom)+3(AB)+2(AD) for each tissue. In
>limma, if we set one of the tissues in the design as a reference (a tissue
>that exists in all mixtures) we will get the differential expression of
>all other tissues relative to it; however, we are looking for the absolute
>expression. In other words, I am looking for the absolute expression of
>each gene in each tissue rather than the differential expression,
>which is usually the final result in limma.
>Is it possible to do that?

In principle linear modeling can do this, but you need to ensure that 
you've pre-processed the data in an appropriate way and that the model that 
you're fitting matches the data. I am not sure that either holds here.
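
As a rough illustration of the "in principle" part, the following Python sketch fits the first (0/1) design matrix from the question to one gene by ordinary least squares with no intercept, so that each coefficient is an absolute per-tissue value rather than a contrast against a reference tissue. It is not limma code. The helper names (`solve`, `fit_tissues`) are made up for this sketch, and the per-tissue values are invented: the first six are the numbers used in the question, while the value for B is purely hypothetical. The data are noise-free and assumed to be exactly additive across tissues, which is the assumption stated in the question and may not hold for real, pre-processed chip data.

```python
from fractions import Fraction

TISSUES = ["LP", "ML", "ADL", "ABL", "T", "S", "B"]

# Rows of the 0/1 design matrix from the question, keyed by sample type.
DESIGN = {
    "LER":     [1, 1, 1, 1, 1, 1, 1],
    "M7":      [0, 1, 1, 1, 1, 0, 0],
    "M5":      [1, 0, 1, 1, 1, 1, 1],
    "M2":      [1, 0, 1, 1, 1, 1, 1],   # same row as M5 in the question
    "AD":      [1, 0, 1, 0, 1, 0, 0],
    "Trichom": [1, 0, 1, 1, 0, 1, 1],
    "Stipuls": [1, 0, 1, 1, 1, 0, 1],
    "AB":      [1, 0, 0, 1, 0, 1, 0],
}

# The 16 chips in the order given in the question.
SAMPLES = ["LER", "LER", "M7", "M5", "M7", "M5", "AD", "M2", "Trichom",
           "Stipuls", "Stipuls", "AB", "AB", "AD", "LER", "M2"]
X = [DESIGN[s] for s in SAMPLES]


def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination.

    Raises ValueError if a is singular, i.e. the tissue coefficients
    cannot all be estimated from this design.
    """
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_tissues(rows, y):
    """Least squares without intercept via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented per-tissue expression for one gene (value for B is hypothetical).
TRUE_EXPR = [Fraction(1, 2), Fraction(4), Fraction(3), Fraction(6, 5),
             Fraction(3, 10), Fraction(1), Fraction(2)]

# Noise-free observed values: each chip is the sum of the tissues it contains.
y = [sum(x * t for x, t in zip(row, TRUE_EXPR)) for row in X]

estimates = fit_tissues(X, y)
for name, est in zip(TISSUES, estimates):
    print(name, float(est))
```

Because the fit succeeds without raising, the seven tissue columns of this particular 0/1 design are linearly independent, so all seven absolute coefficients are estimable. With no noise the invented values are recovered exactly; with real data the estimates are only as good as the additivity assumption and the pre-processing.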

The design matrices, together with the fact that expression can't be 
negative in any tissue, imply that the overall expression is higher in 
some target samples than in others. For example, 'M7' is the same as 'LER' 
but without the contributions of tissues LP, S and B. You are asserting 
that expression is lower in M7 than in LER for all genes expressed in LP, 
S or B, and that all other genes have equal expression in M7 and LER. Is 
this what you intend? If it is, then you can't use quantile normalization 
across chips as done for example by rma(). You would need specialist 
assistance. Really you should collaborate with someone who can go through 
your experiment in more detail.
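
The conflict with quantile normalization can be seen in a small Python sketch. All numbers below are invented for illustration, and `quantile_normalize` is a plain textbook version written for this example, not the routine inside rma(). Under the additive model, M7 can never exceed LER for any gene, but after quantile normalization both chips are forced to share one distribution, so that ordering is no longer preserved.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns.

    Each value is replaced by the mean, across columns, of the values that
    share its rank within their own column (ties broken by position).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[k] for sc in sorted_cols) / len(columns)
                  for k in range(n)]
    out = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        out.append(normalized)
    return out


# Hypothetical non-negative expression of four genes in each tissue.
# Columns: LP, ML, ADL, ABL, T, S, B. Values are invented.
TISSUE_EXPR = [
    [5.0, 1.0, 2.0, 1.0, 0.5, 3.0, 4.0],   # g1: expressed in LP, S and B
    [0.0, 2.0, 3.0, 1.0, 1.0, 0.0, 0.0],   # g2: not expressed in LP, S or B
    [1.0, 0.5, 0.5, 0.5, 0.5, 6.0, 0.0],   # g3
    [0.0, 8.0, 1.0, 1.0, 2.0, 0.0, 0.5],   # g4
]
LER_ROW = [1, 1, 1, 1, 1, 1, 1]   # design row for LER (all tissues)
M7_ROW = [0, 1, 1, 1, 1, 0, 0]    # design row for M7 (no LP, S or B)


def mixture(row):
    """Additive model: a chip's value is the sum of the tissues it contains."""
    return [sum(x * e for x, e in zip(row, gene)) for gene in TISSUE_EXPR]


ler = mixture(LER_ROW)
m7 = mixture(M7_ROW)
ler_qn, m7_qn = quantile_normalize([ler, m7])

print("before:", ler, m7)
print("after: ", ler_qn, m7_qn)
```

Before normalization the second gene, which is not expressed in LP, S or B, has the same value on both chips, and every other gene is lower in M7 than in LER. After normalization the two chips have identical sorted values, and the second gene appears higher in M7 than in LER, which the design matrix says cannot happen.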

Gordon

>Ron


