[R] Three questions about a model for possibly periodic data with varying amplitude

Mon Jul 31 14:36:39 CEST 2006

Hi dear R community,

I have up to 12 measures of a protein for each of 6 patients, taken
every two or three days.  The pattern of the protein looks periodic,
but the height of the peaks is highly variable.  It's something like
this:

patient <- data.frame(
	day = c(1, 3, 5, 8, 10, 12, 15, 17, 19, 22, 24, 26),
	protein =  c(5, 3, 10, 7, 2, 8, 25, 12, 7, 20, 10, 5)
	)
plot(patient$day, patient$protein, type="b") 

My goal is two-fold: firstly, I need to test for periodicity, and
secondly, I need to try to predict the temporal location of future
peaks.  Of course, the peaks might be occurring on unmeasured days.

I have been looking at this model:

wave.form <- 
  deriv3( ~ sin(2*pi*((day-offset)/period + 0.25)) * amplitude + mean,
         c("period", "offset", "amplitude", "mean"),
         function(day, period, offset, amplitude, mean){})

curve(wave.form(x, period=7, offset=2, mean=5, amplitude=4), 
		   from=1, to=30)

So, for my purposes, the mean and the amplitude seem to be irrelevant;
I want to estimate the location and the offset.  To get going I've
been using the following approach:

require(nlme)

wave.1 <- gnls(protein ~ wave.form(day, period, offset, amplitude, mean),
               start=list(period=7, offset=0, amplitude=10, mean=0),
               weights=varPower(), data=patient)

 ... which seems to fit the wave form pretty nicely.  

Question 1) I'm wondering, however, if anyone can suggest any
            improvements to my model or fitting procedure, or general
            approach.

Generalizing to a non-linear mixed effects model is proving difficult.
For example,

#################################################################

set.seed(1234)

patients <- expand.grid(patient.no = 1:6,
			day = patient$day)

patient.params <- data.frame(patient.no = 1:6,
			     period = c(5,6,7,8,7,6),
			     offset = c(1,2,3,1,2,3),
			     amplitude = c(10,6,10,6,10,6),
			     mean = c(22,14,22,14,22,14))

patients <- merge(patients, patient.params)

patients$protein <- with(patients, 
		 wave.form(day, period, offset, amplitude, mean)) +
		 rnorm(n=dim(patients)[1], mean=5, sd=2)

patients <- groupedData(protein ~ day | patient.no, data=patients)

protein.nlme <- nlme(protein ~ 
	     wave.form(day, period, offset, amplitude, mean),
                     fixed = period + offset + mean + amplitude ~ 1,
                     random = period + offset ~ 1 | patient.no,
                     start = c(period=2*pi, offset=0, mean=10,
			   amplitude=5),
                     data = patients)

random.effects(protein.nlme) 

#################################################################

I get the following values for the BLUPS:

  period        offset
2      0 -5.738510e-09
4      0 -6.426104e-08
6      0  6.847430e-09
1      0  6.275570e-07
5      0 -1.486590e-06
3      0  9.221848e-07

It seems clear to me that these BLUPS are quite unsuitable.  

Question 2) Can anyone suggest how I might improve my use of nlme?
            Other than using more data :) 

Question 3) Finally, I'm also wondering what would be a suitable test for
            periodicity for these data. I'd like to test the null
            hypothesis that the data are not periodic.

All suggestions, discussion, welcome!

Best wishes

Andrew
-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au