[BioC] bicluster

Ramon Diaz-Uriarte rdiaz at cnio.es
Fri Sep 22 11:12:13 CEST 2006


On Thursday 21 September 2006 20:39, Weiwei Shi wrote:
> hi again:
>
> I modified some code from
> http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html#R_clustering
>
> to do a simple biclustering. On both dimensions, I used Pearson
> distance. When x is a square matrix, it works; when x is not square,
> there is an error. So I am wondering if there is a way to do the job
> for the latter, since in microarray analysis the matrix is usually
> not square. Please correct me if
> 1. I should not use the same Pearson distance in both directions
> 2. there are any other problems causing the error.

Dear Weiwei,

I am not sure I can answer your question, but my concerns are more "basic": I 
am not sure that the approach you are taking really takes care of the 
biclustering business. Briefly:

I am not convinced that "biclustering" is something well defined. I mean, when 
you say you are doing biclustering, what is it that you want to accomplish 
exactly? (Contrast this with saying something well defined such as "minimize 
the residual sum of squares of this"). 

That there are many interpretations of what biclustering is/should be/should 
do is evidenced by the variety of approaches. For instance, the Plaid model 
of Lazzeroni and Owen is very different from the COSA approach of Friedman 
and Meulman (I say "COSA approach" because I am not sure it is correct to 
call COSA a model, whereas Plaid, I think, does qualify as a model). These are 
the two biclustering approaches I found most attractive (or 
understandable), but there are a whole bunch of others, each with a slightly 
different interpretation of what you ought to want to do when you do 
biclustering. (There are a few cases where biclustering method B is 
developed as an improvement on biclustering method A, where both A and B have 
similar "objectives", but that is more the exception than the rule in the 
biclustering literature.)
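
To make the contrast with a "well defined" objective concrete, here is a rough 
sketch of the plaid model as it is usually written. The notation is a 
paraphrase and an assumption of this note, not a quotation from Lazzeroni and 
Owen; check the original paper before relying on the details. Each of K layers 
contributes a layer mean plus row and column effects, switched on only for the 
rows and columns that belong to that layer, and the fit minimizes a residual 
sum of squares:

```latex
% Plaid model sketch: Y_{ij} is the expression of row (gene) i in column (array) j.
% \rho_{ik} and \kappa_{jk} are 0/1 indicators of row and column membership in layer k.
Y_{ij} = \mu_0 + \sum_{k=1}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik} \, \kappa_{jk} + \varepsilon_{ij},
\qquad \rho_{ik}, \kappa_{jk} \in \{0, 1\}

% Objective: choose the memberships and effects to minimize the residual sum of squares.
\min \; \sum_{i} \sum_{j} \left[ Y_{ij} - \mu_0 - \sum_{k=1}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik} \, \kappa_{jk} \right]^2
```

Clustering rows and columns separately, by contrast, optimizes no joint 
criterion over rows and columns.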


I haven't followed the biclustering literature recently, but I think there are 
a couple of reviews somewhere in the last 2 (?) years. I'd suggest starting 
there, and trying to find a model/method/algorithm/recipe/whatever that you 
think is reasonable/understandable/esthetically pleasing/whatever, and going 
from there. If at all possible, I'd strongly advise against reinventing the 
wheel, especially in this case: I find it unlikely that a shortcut is easy to 
find.

Sorry for the unhelpful comments.

Best,

R.

P.S. Disclaimer: I tend to find most clustering adventures ill-defined; 
thus, I find most biclustering adventures utterly purposeless, ill-defined 
diversions.



>
> > x <- matrix(rnorm(1000,10,2),100,100)
> > hc <- as.dendrogram(hclust(as.dist(cor(t(x), method="pearson"))))
> > hr <- as.dendrogram(hclust(as.dist(cor(x, method="pearson"))))
> > hv <- heatmap(x, Rowv=hr, Colv=hc)
> >
> > x <- matrix(rnorm(1000,10,2),100,200)
> > hc <- as.dendrogram(hclust(as.dist(cor(t(x), method="pearson"))))
> > hr <- as.dendrogram(hclust(as.dist(cor(x, method="pearson"))))
> > hv <- heatmap(x, Rowv=hr, Colv=hc)
>
> Error in heatmap(x, Rowv = hr, Colv = hc) :
>         row dendrogram ordering gave index of wrong length
>
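
A note on the error quoted above, offered as a likely explanation rather than 
a definitive diagnosis. cor() correlates the columns of its argument, so 
cor(t(x)) is a row-by-row matrix and cor(x) is column-by-column. In the quoted 
code the dendrogram built from cor(t(x)) is named hc and passed as Colv, and 
the one built from cor(x) is named hr and passed as Rowv, so the two appear to 
be swapped. With a square matrix both have the same number of leaves and 
heatmap() cannot notice; with 100 x 200 the lengths disagree. Two further 
points: as.dist(cor(...)) treats the correlation itself as a distance, where 
1 - cor is the more usual choice, and rnorm(1000, ...) supplies fewer values 
than the matrix has cells, so they are recycled. A minimal sketch with these 
three things changed (the matrix size, the seed and the names row_dist and 
col_dist are assumptions made for illustration):

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# 100 rows (e.g. genes) by 20 columns (e.g. arrays); one random value per cell
x <- matrix(rnorm(100 * 20, mean = 10, sd = 2), nrow = 100, ncol = 20)

# cor() works on columns, so transpose to get row-by-row correlations
row_dist <- as.dist(1 - cor(t(x), method = "pearson"))  # between rows
col_dist <- as.dist(1 - cor(x, method = "pearson"))     # between columns

hr <- as.dendrogram(hclust(row_dist))  # row dendrogram, one leaf per row
hc <- as.dendrogram(hclust(col_dist))  # column dendrogram, one leaf per column

# Rowv must have nrow(x) leaves and Colv must have ncol(x) leaves
hv <- heatmap(x, Rowv = hr, Colv = hc)
```

This only removes the error. It still clusters rows and columns 
independently and then reorders the heatmap, which is not the same thing as 
fitting a biclustering model such as Plaid or COSA.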
> On 9/21/06, Ramon Diaz-Uriarte <rdiaz at cnio.es> wrote:
> > On Wednesday 20 September 2006 23:29, Weiwei Shi wrote:
> > > Dear listers:
> > >
> > > I searched the literature and failed to find some public tools used
> > > for biclustering analysis in microarray application. I am wondering if
> > > there is one in bioconductor or somewhere else?
> >
> > Dear Weiwei,
> >
> > I don't think there are BioC tools for this. But there is R/S code for at
> > least:
> >
> > - the Plaid model, in a set of functions from Heather Turner (don't have
> > the URL here, but you'll find it googling); this is a reimplementation of
> > the Lazzeroni & Owen approach, with some differences.
> >
> > - the COSA approach of Friedman & Meulman (google for "COSA Friedman")
> >
> >
> > Best,
> >
> > R.
> >
> > > thanks
> >
> > --
> > Ramón Díaz-Uriarte
> > Bioinformatics
> > Centro Nacional de Investigaciones Oncológicas (CNIO)
> > (Spanish National Cancer Center)
> > Melchor Fernández Almagro, 3
> > 28029 Madrid (Spain)
> > Fax: +-34-91-224-6972
> > Phone: +-34-91-224-6900
> >
> > http://ligarto.org/rdiaz
> > PGP KeyID: 0xE89B3462
> > (http://ligarto.org/rdiaz/0xE89B3462.asc)
> >
> >
> >
> > **CONFIDENTIALITY NOTICE** This email communication and any
> > attachments may contain confidential and privileged information for
> > the sole use of the designated recipient named above. Distribution,
> > reproduction or any other use of this transmission by any party other
> > than the intended recipient is prohibited. If you are not the intended
> > recipient please contact the sender and delete all copies.



