[R] regression methods for circular(?) data.

Mon Sep 26 22:02:37 CEST 2005

On 26-Sep-05 Witold Eryk Wolski wrote:
> Ted,
> 
> I agree with you that if you unwrap the data you can use lm.
> And you can separate the data in the way you describe. However, if you 
> have thousands of such datasets I do not want to do it by "looking at 
> the graph".
> 
> Yes, the scatter may be larger than in the example and range(y) may be
> larger than 2.
> 
> And as you said, in order to unwrap the data you have to separate them
> first. It would be easy to do this using, for example, single linkage
> clustering if there were no overlaps (but sometimes there are). So I was
> just wondering whether there are fancier methods to do this.

OK, the problems are now clearer! So we cannot rely on separation
(though there would be ways to detect this automatically if it could
be relied on).
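
For the case where the branches really are separated, a minimal sketch
of automatic unwrapping might look like the following. It is only an
illustration and not part of the suggestion further down: the smaller
scatter (sd=0.05), the seed, and the names xs, ys, k and yu are all
assumptions made for the example. It sorts by x and treats any jump of
more than half the modulus between neighbouring points as a wrap.

```r
set.seed(1)                                  # assumed seed, for reproducibility only
x  <- runif(300, min = 1, max = 230)
# Assumed smaller scatter than in the thread, so that the branches separate
y0 <- (0.005 * x + 0.2 + rnorm(300, mean = 0, sd = 0.05)) %% 1

o  <- order(x)                               # walk through the data in x order
xs <- x[o]
ys <- y0[o]

# diff(ys) lies in (-1, 1); rounding it gives -1, 0 or +1, i.e. the
# number of moduli jumped between neighbours. Accumulate the correction.
k  <- c(0, cumsum(-round(diff(ys))))
yu <- ys + k                                 # unwrapped response

fit <- lm(yu ~ xs)                           # ordinary regression now applies
print(coef(fit))
```

This breaks down as soon as the scatter is large enough for neighbouring
points to differ by more than 0.5 without a wrap, which is exactly the
overlapping situation described in the quoted message.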

This is where real experts on unwrapping circular data should step in,
but my immediate suggestion is that something useful could be developed
out of the following.

First, generate the data so that we have something to work with:

  x <- runif(300,min=1,max=230)
  y <- x*0.005 + 0.2
  y <- y+rnorm(length(x),mean=0,sd=0.1) # one independent error per point
  y0 <- y%%1 #  <------- modulo operation

(I've called the wrapped data "y0").

Now, assume

A. That we know the modulus is 1.0

B. That we are looking for a model y0 = (a*x + b)%%1.0

C. That we do have some idea about a range of values for a and b,
   say 0 < a < 0.01 and 0 < b < 1.0

Now try the following and inspect what you get:

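  M <- numeric(101)

  # First pass: offset fixed at 0, slope v scanned over 0 to 0.01
  for(i in (0:100)){
    v <- (i*0.01/100)
    M[i+1] <- max(Mod(fft((y0-v*x-0.0)%%1)*2*pi))
  }
  plot(0.01*(0:100)/100,M,ylim=c(0,1000))

  # Repeat for offsets j = 0, 0.05, ..., 0.5, overlaying each curve
  for(j in 0.5*(0:10)/10){
    for(i in (0:100)){
      v <- (i*0.01/100)
      M[i+1] <- max(Mod(fft((y0-v*x-j)%%1)*2*pi))
    }
    points(0.01*(0:100)/100,M)
  }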

This gives an indication that a good value for 'a' ('v' in the code,
the horizontal axis in the plots) is about 0.005 (or slightly larger),
i.e. about halfway along the range scanned, for some value of 'b'
('j' in the code), from which, conditioning on this, a value
for b could be obtained similarly. The plots from the above do
not distinguish between the curves for different values of 'b';
a method of indicating this would be useful.
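
One possible way of telling the curves apart, sketched under some
assumptions: compute the same statistic into a matrix with one column
per value of b, then draw the columns as coloured lines with a legend.
The seed, the helper name stat, and the names a.grid, b.grid and cols
are inventions for this example; the ylim restriction is dropped so
that no curve is clipped.

```r
set.seed(1)                          # assumed seed, for reproducibility only
x  <- runif(300, min = 1, max = 230)
y0 <- (0.005 * x + 0.2 + rnorm(300, mean = 0, sd = 0.1)) %% 1

a.grid <- 0.01 * (0:100) / 100       # same slope grid as in the loops
b.grid <- 0.5 * (0:10) / 10          # same offsets as in the loops

# The statistic from the loops, written as a function of (a, b)
stat <- function(a, b) max(Mod(fft((y0 - a * x - b) %% 1) * 2 * pi))

# 101 x 11 matrix: rows index a, columns index b
M <- sapply(b.grid, function(b) sapply(a.grid, stat, b = b))

# One coloured line per value of b, identified in the legend
cols <- rainbow(length(b.grid))
matplot(a.grid, M, type = "l", lty = 1, col = cols,
        xlab = "a", ylab = "max Mod(fft)")
legend("topright", legend = paste("b =", b.grid),
       col = cols, lty = 1, cex = 0.7)
```

With the whole surface held in M, which(M == max(M), arr.ind = TRUE)
or the corresponding minimum can also be used to read off the grid
cell of interest instead of judging it by eye.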

Just a suggestion. There may be, in some R package, a function
which implements this approach in a better way.
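
As a compact variant of the same grid search (this is a different
statistic from the fft-based one above, and not a function from any
package), one could treat the residuals as angles on the unit circle
and maximise their mean resultant length. This is a sketch under
assumptions: the seed, the finer grid, and the names resultant, a.hat
and b.hat are invented for the example. Since subtracting b only
rotates every residual by the same angle, the resultant length does
not depend on b, so 'a' can be found first and 'b' read off afterwards
as the circular mean of the residuals.

```r
set.seed(1)                          # assumed seed, for reproducibility only
x  <- runif(300, min = 1, max = 230)
y0 <- (0.005 * x + 0.2 + rnorm(300, mean = 0, sd = 0.1)) %% 1

# Mean resultant length of the residuals (y0 - a*x), mapped to angles
# 2*pi*residual. Values near 1 mean the residuals cluster tightly
# around a single point of the circle.
resultant <- function(a) Mod(mean(exp(2i * pi * (y0 - a * x))))

a.grid <- seq(0, 0.01, length.out = 1001)   # assumption C: 0 < a < 0.01
R      <- sapply(a.grid, resultant)
a.hat  <- a.grid[which.max(R)]

# Conditioning on a.hat, b is the circular mean direction of the
# residuals, converted back from an angle to the scale of the modulus.
b.hat <- (Arg(mean(exp(2i * pi * (y0 - a.hat * x)))) / (2 * pi)) %% 1

print(c(a = a.hat, b = b.hat))
plot(a.grid, R, type = "l", xlab = "a", ylab = "mean resultant length")
```

The grid only needs to cover 'a' because of the rotation argument
above; a local optimiser such as optimize() could refine a.hat around
the grid maximum if more precision is wanted.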

Over to the gurus at this point!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 26-Sep-05                                       Time: 20:54:50
------------------------------ XFMail ------------------------------