[R] Loading matrices and other things

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jun 1 03:59:26 CEST 2005


On 5/31/05, Mike Schuler <schulerm at bc.edu> wrote:
> Hi all,
> 
> I'm new to R, so needless to say I have a couple of questions (which I
> hope I haven't missed in the documentation).
> I have several files in lower triangular matrix form. I want to perform
> some form of hierarchical clustering on each of these matrices and
> capture the output of the clustering.
> 
> The first problem I run into is actually loading the matrix file into R.
> I've attempted using the read.table function but to no avail. What is
> the best way to read in a matrix?
>    Note: matrices are in a form like so, a space between each value,
> then a newline. There is also a diagonal of 0's stripped out. (Matrices
> are the output of RNAdistance if that's helpful)
>    Let's say it's stored in a file called 'rtest'
>                21
>                34 55
>                55 34 21
>                27 10 61 44
>                59 42 25 8 40
>                61 44 27 10 34 6
>                73 64 57 48 66 44 50
>                78 69 62 53 71 49 55 5
>                77 68 103 94 70 94 96 88 89
>                77 68 103 94 70 90 96 84 85 10
>                31 24 53 46 30 50 52 72 73 74 74
> 
> Second, I've searched through the web and it seems hclust
> <http://www.maths.lth.se/help/R/.R/library/mva/html/hclust.html> is the
> appropriate function. From what I can tell from here
> <http://stat.ethz.ch/R-manual/R-devel/library/stats/html/dist.html> the
> above matrix should be a valid format (even without the 0s), but
> confirmation would be nice. And with hclust, does this produce a tree
> with the output, or would that be the plclust function? I haven't been
> able to experiment with this because of my inability to solve the
> previous problem.

Here is something to try:

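# get the matrix dimension and read in the values
# (the longest line of the file has n - 1 entries)
n <- max(count.fields("myfile.dat")) + 1
x <- scan("myfile.dat")

# create matrix from x: the lower triangle read row by row is the
# upper triangle filled column by column
x.mat <- matrix(0, n, n)
x.mat[upper.tri(x.mat)] <- x
x.mat <- x.mat + t(x.mat)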

# convert to distance matrix
x.dist <- as.dist(x.mat)

# run hclust
x.hclust <- hclust(x.dist)

# plot the dendrogram and outline 5 clusters
plot(x.hclust, cex = 0.6)
rect.hclust(x.hclust, k = 5, border = "red")
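
As a quick check of the fill order used above, here is a small sketch on
a made-up 4 x 4 case (the first three rows of the example in the
question). The names lines, con and groups are only for illustration.
It also shows cutree as one possible way to capture the cluster
memberships as a vector rather than only drawing them.

```r
# Hypothetical input: lower triangle without the diagonal, one row per line
lines <- c("21", "34 55", "55 34 21")

x <- scan(text = lines, quiet = TRUE)   # values in file order, row by row

# the longest row has n - 1 values
con <- textConnection(lines)
n <- max(count.fields(con)) + 1
close(con)

x.mat <- matrix(0, n, n)
x.mat[upper.tri(x.mat)] <- x            # upper triangle, column by column
x.mat <- x.mat + t(x.mat)               # mirror into the lower triangle

# the lower triangle now reproduces the rows of the input
x.mat[2, 1]      # 21
x.mat[3, 1:2]    # 34 55
x.mat[4, 1:3]    # 55 34 21

# cluster and keep the group membership of each object
x.hclust <- hclust(as.dist(x.mat))
groups <- cutree(x.hclust, k = 2)
```

The number of groups (k = 2) is arbitrary here; with the 12 x 12 matrix
from the question one might use k = 5 to match the rect.hclust call.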

> And last, I want to be able to run R on many different files of the same
> matrix type. Is it possible to write a (Python) script run through the
> appropriate tasks and save the visual output as a postscript file?

You don't need another language.  It can all be done from R.  Suppose
we want to read in each .dat file in the current directory, plot it and
save the plot:

for (f in dir(pattern = "[.]dat$")) {
  x <- read.table(f)
  plot(x)
  savePlot(f, "ps")
}

savePlot, used above, is specific to Windows. See ?dev.print
if you are not on Windows.
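
For a version that does not depend on Windows, one option is to open a
postscript device directly instead of copying the screen plot. The
following is only a sketch: read_lower_tri and cluster_all are made-up
helper names, and it assumes every matching file is a lower triangular
matrix in the format from the question.

```r
# Hypothetical helper: read one lower triangular file, return a dist object
read_lower_tri <- function(file) {
  x <- scan(file, quiet = TRUE)
  n <- max(count.fields(file)) + 1      # longest line has n - 1 values
  m <- matrix(0, n, n)
  m[upper.tri(m)] <- x
  as.dist(m + t(m))
}

# Hypothetical driver: write one PostScript dendrogram per matching file
cluster_all <- function(path = ".", pattern = "[.]dat$") {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  for (f in files) {
    postscript(paste0(f, ".ps"))        # e.g. a.dat -> a.dat.ps
    plot(hclust(read_lower_tri(f)), main = basename(f), cex = 0.6)
    dev.off()                           # close the device to finish the file
  }
  invisible(files)
}
```

Calling cluster_all() with no arguments would process the .dat files in
the current directory. The output name simply appends ".ps" to the
input name so that an input file can never be overwritten.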



