[BioC] problem with data processing in R

Fri Dec 11 22:53:49 CET 2009

The example I send before works but there was a typo in the heatmap.2 command
where mysort needs to replaced by mydata. Like this:

heatmap.2(mydata, dendrogram="none", Rowv=F, Colv=F, trace="none", key=T, cellnote=mysamples) 

In heatmap.2 you have the option to include a color bar on the left side that
can be used to highlight clusters. See the help documentation for more details.

The dimensions of heatmap.2 plots can be controlled like for any other plot in 
R, using the hight and width arguments, e.g. x11(height=6, width=2) or pdf(...).

To provide more specific help, you may want to send a simple sample data set that 
illustrates what you are trying to do exactly. Without this it is really hard 
to understand your problem. 

If I were you then I would perform the entire clustering procedure in R. Is there any
good reason not use R for this? For hierarchical clustering you can use the hclust function.
A relatively complete list of clustering algorithms available in R can be found on 
the cluster task view page: http://cran.at.r-project.org/web/views/Cluster.html

Thomas

On Fri, Dec 11, 2009 at 10:17:22PM +0100, Maxim wrote:
> Hi,
> 
> what you suggested sounds interesting but actually I do not understand where
> it is going to. Actually I'll get an error doing it like this as mysort in
> heatmap.2 is not defined (yet).
> 
> What did work for me in meantime, is simply to plot the complete heatmap at
> once. What is missing in this approach is a label on the left or right side
> of the heatmap, I'd love to have a colorcoded block that allows me to see,
> where different IDs were plotted (different IDs are actually clusters coming
> from hierarchical clustering not performed in R).
> 
> Second I can do it by generating individual heatmaps for each ID loaded from
> individual files, unfortunately for some IDs there are thousand rows of
> data, for others only 50. But R's heatmap always produces similarly sized
> maps. I'd prefer to have the height of the individual heatmaps according to
> the corresponding number of rows rather than automatic scaling.
> 
> Is there a way to do this in R? I found an old mail in the mailing list
> discussing this point, the result was to use TreeView/Cluster, but I cannot
> get this to work without doing clustering (the data is clustered already),
> additionally I do not know how to do batch processing in TreeView.
> 
> Maxim
> 2009/12/10 Thomas Girke <thomas.girke at ucr.edu>
> 
> > I am not sure if I understand every part of your problem correctly,
> > but here is an example how something like this could be done in R.
> > Its main idea is to keep the entire data set in one matrix and use
> > the cell note feature of heatmap.2 for sample tracking.
> >
> > ## Sample matrix for demo purpose. If your
> > y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""),
> > paste("t", 1:5, sep="")))
> >
> > ## Sort each row by its values
> > mydata <- t(apply(y, 1, sort))
> >
> > ## Obtain sample labels (column titles) for sorted rows
> > mysamples <-  t(apply(y, 1, function(x) names(sort(x))))
> >
> > ## Plot heatmap where the sample labels are given as cell notes for
> > tracking purposes
> > library(gplots)
> > heatmap.2(mydata, dendrogram="none", Rowv=F, Colv=F, col=redgreen(75),
> > scale="row", trace="none", key=T, cellnote=mysamples)
> >
> > Thomas
> >
> >
> > On Thu, Dec 10, 2009 at 03:13:31PM +0100, Maxim wrote:
> > > Hi,
> > >
> > >
> > > I'm stuck with parsing data into R for heatmap representation.
> > >
> > >
> > > The data looks like:
> > >
> > > 1 id1 x1 x2 x3 .... x20
> > >
> > > 2 id1 x1 x2 x3 .... x20
> > >
> > > 3 id1 x1 x2 x3 .... x20
> > >
> > > 4 id1 x1 x2 x3 .... x20
> > >
> > > .........
> > >
> > > 348 id2 x1 x2 x3 .... x20
> > >
> > > 349 id2 x1 x2 x3 .... x20
> > >
> > > 350 id2 x1 x2 x3 .... x20
> > >
> > > 351 id2 x1 x2 x3 .... x20
> > >
> > > .........
> > >
> > >
> > >
> > > The data is sorted for the IDs (id1,id2 .....id40) and I like to produce
> > 40
> > > heatmaps thereof, 1 heatmap per data corresponding to a single ID.  The
> > data
> > > that has to be plotted is 20 values (x1 to x20). There is different
> > amounts
> > > of data for respective IDs. In the end I'd like to have the 40 heatmaps
> > > stacked on top of each other sorted by ID and heatmap heights according
> > to
> > > the amount (number of rows) of data. Unfortunately the individual data
> > lines
> > > have to be sorted with respect to the maximum of the values X1 to x20 in
> > > individual rows. Actually this not that important as I guess this might
> > be
> > > easier to realize in upstream Perl scripts producing the data.
> > >
> > >
> > > The data is available as data per ID in individual files or as a sorted
> > file
> > > with the complete dataset (as shown above).
> > >
> > >
> > > Is it possible in R to break a file as above into distinct blocks
> > (depending
> > > on ID) and then to process it (sorting according to maximum, heatmap)?
> > >
> > >
> > > Which commands do I have to issue for the manipulation of the data.frame?
> > I
> > > tried the
> > >
> > >
> > > I'd be glad  if someone could help me finding the correct direction to
> > solve
> > > my problem!
> > >
> > >
> > > Best regards
> > >
> > >
> > > Maxim
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
i> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> >