[R] Summing up histograms and misc.

Kjetil Kjernsmo kjetil.kjernsmo at astro.uio.no
Tue Mar 21 23:26:34 CET 2000


Dear all!

I have a few questions. I'm running Version 1.0.0 on Digital UNIX.

My problem is as follows: I have hundreds of megabytes in images stored in
a few hundred files, and I want to extract some basic characteristics
about the images in these files. For a start, I need to find the density
function of all the files combined. Since it's several hundred MBs, I
haven't enough RAM to just read the files and get a histogram (I've tried 
with some data, though, with 2GB heap memory).
I do know dimensions of the images however, and fortunately, I also know
the histogram bin size, but I don't know what maximum I can expect from
the files. My strategy is now to read one file, make a histogram  
equidistant breaks, store it in an object, read another one, another
histogram, and attempt to add these two, and so on. I failed to find a
simple way of doing this (does it exist?), so I hacked this function up
myself:

addhists <- function(h1, h2)
{
  if(length(h1$breaks) >= length(h2$breaks))
  {
    bl <- h1$breaks
    bs <- h2$breaks
    cl <- h1$counts
    cs <- h2$counts
    mi <- h1$mids
  } else {
    bs <- h1$breaks
    bl <- h2$breaks
    cs <- h1$counts
    cl <- h2$counts
    mi <- h2$mids
  }
  ind <- sub.vector(bs, bl)
  if(! is.vector(ind)) stop("Incompatible breaks")
  c0 <- rep(0, length(cl))
  c0[ind[1:length(cs)]] <- cs
  ct <- c0 + cl
  int <- ct/(sum(ct)*diff(bl)) # This works for non-equidistant breaks?
  return(list(breaks=bl,
              counts=ct,
              intensities=int,
              mids=mi))
}

Now, the problem is that I don't know what maximum values I can expect
(nor minimum, ideally) for other files than the one I have open, so the
breaks vector of one of the histograms may be shorter than the other. The
first few lines is just there to get short and long vectors into objects
of their own. Then, I thought, I must check whether a sequence of breaks
in the shorter vector occurs in the longer vector, and if so find
the indices of that sequence.
Here, I think I must have overlooked something... I didn't find an elegant
way of finding the indices of a subset _sequence_ of an array... My own
solution, the sub.vector-function looks like this:
sub.vector <- function(b1, b2)
{
  i <- 0
  while(! all(b2[(i+1):(length(b1)+i)] == b1))
  {   
    i<-i+1
    if(i+length(b1) > length(b2)) return(NULL)
  }
  return((i+1):(length(b1)+i))
}
Ugly, eh...? :-) Any suggestions?

A more general solution could of course attempt to recompute the breaks,
but since I know where the breaks should be, this function seems to work
for my purpose. It's just a bit slow, and not very elegant. Therefore, if
anybody has suggestions for improvement, it would be greatly appreciated. 

One thing I came to think of: Is it possible to use the object-orientation
of R to define my function as the operator '+', so that for histogram
objects, I could simply add the two objects as with any other objects?
 
Finally, when I have a histogram object, how can I most easily plot it as
a proper histogram? 

There are a couple of other issues I have also been struggling with
lately: I try to set par(), it is in particular setting type='l' I'm
interested in. Starting from scratch, the default says:
> par()$type
[1] "p"
and indeed the plot is a point-type plot. I go
> par(type='l')
> par()$type
[1] "l"
But doing e.g. 
> plot(x,1/x)
imidiately afterwards still produces a point-type plot. I have to say
> plot(x,1/x, type='l')
to get a line-type plot. What am I doing wrong?

Finally, I get a warning when trying to compile a c-routine (that was
provided to me by Brain Ripley to read the abovementioned images
off-list), it doesn't seem to mean anything, but I figured I might as well
ask since I (finally) have upgraded to R-1.0.0. It goes:
rukbat:~/hfag/src/c> R CMD SHLIB readraw.c
cc -I/local/R/lib/R/include   -ieee_with_inexact -std1   -O3 -tune ev56 -c
readraw.c -o readraw.o
f77 -shared  -o readraw.so readraw.o  -ldxml  -lUfor -lfor -lFutil -lm
-lots -lmld:
Warning: Unresolved:
Rf_error

Is it anything to worry about?

Best,

Kjetil
-- 
Kjetil Kjernsmo
Graduate astronomy-student                    Problems worthy of attack
University of Oslo, Norway            Prove their worth by hitting back
E-mail: kjetikj at astro.uio.no                                - Piet Hein
Homepage <URL:http://www.astro.uio.no/~kjetikj/>
Webmaster at skepsis.no 



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list