[R] Approximate Entropy?

Tue Dec 23 19:45:47 CET 2008

Ben,

thanks a lot for that! I have a (reasonably) good idea about what ApEn 
should be, and I'll try to understand your translation of Kaplan's 
Matlab code.

Best,
Stephan

Ben Bolker schrieb:
> 
> 
> Stephan Kolassa wrote:
>> Dear guRus,
>>
>> is there a package that calculates the Approximate Entropy (ApEn) of a 
>> time series?
>>
>> RSiteSearch only gave me a similar question in 2004, which appears not 
>> to have been answered:
>> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/28830.html
>>
>> RSeek.org didn't yield any results at all.
>>
>> Happy holidays (where appropriate),
>> Stephan
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> 
> I went ahead and translated D. Kaplan's MATLAB code for this purpose into R.
> Warning: it's barely tested, and I'm not sure I understand what it's doing
> -- you're on your own from here ...
> (it won't work if you simply cut and paste what's here, since the examples
> are above the definitions
> of the utility functions etc.)
> 
> 
> ## http://www.macalester.edu/~kaplan/hrv/doc/funs/apen.html
> 
> ## Approximate Entropy
> ## Syntax
> 
> ## entropy = apen( pre, post, r );
> 
> ## Arguments
> ## pre 	An embedding of data.
> ## post 	The images of the data in the embedding.
> ## r 	The filter factor, which sets the length scale over which to compute
> the approximate entropy.
> ## Returned Values
> ## entropy 	The numerical value of the approximate entropy.
> ## Description
> 
> ## The "approximate entropy" was introduced by Pincus to quantify the
> ## creation of information in a time series. A low value of the
> ## entropy indicates that the time series is deterministic; a high
> ## value indicates randomness.
> 
> ## The "filter factor" r is an important parameter. In principle, with
> ## an infinite amount of data, it should approach zero. With finite
> ## amounts of data, or with measurement noise, it is not always clear
> ## what is the best value to choose. Past work on heart rate
> ## variability has suggested setting r to be 0.2 times the standard
> ## deviation of the data.
> 
> ## Another important parameter is the "embedding dimension." Again,
> ## there is know precise means of knowing the best such dimension, but
> ## previous work has used a dimension of 2. The final parameter is the
> ## embedding lag, which is often set to 1, but perhaps more
> ## appropriately is set to be the smallest lag at which the
> ## autocorrelation function of the time series is close to zero.
> 
> ## The apen function expects the data to be presented in a specific
> ## format. Working with a time series tseries, the following steps
> ## will compute the approximate entropy, with an embedding dimension
> ## of 2 and a lag of 1.
> 
> ## edim = 2;
> ## lag = 1;
> ## edata = lagembed(tseries,edim,lag);
> ## [pre,post] = getimage(edata,lag);
> ## r = 0.2*std(tseries);
> ## apen(pre,post,r);
> 
> edim <- 2
> lag <- 1
> edata <- lagembed(tseries,edim,lag)
> im <- getimage(edata,lag)
> r <- 0.2*sd(tseries)
> apen(im$pre,im$post,r)
> 
> apenembed <- function(tseries,edim,lag=1,relr=0.2,r) {
>   edata <- lagembed(tseries,edim,lag)
>   im <- getimage(edata,lag)
>   if (missing(r)) r <- relr*sd(tseries)
>   apen(im$pre,im$post,r)
> }
> 
> ## References
> 
> ##     * SM Pincus (1991) Proc. Natl. Acad. Sci. USA 88:2297-2301
> ##     * D Kaplan, MI Furman, SM Pincus, SM Ryan, LA Lipsitz, AL Goldberger
> (1991) "Aging and the complexity of cardiovascular dynamics," Biophys.J.
> 59:945-949 
> 
> ## See Also
> ## apenhr. lagembed. getimage.
> ## Examples
> 
> 
> lagembed <- function(ts,d,lag) {
>   z <- embed(ts,d)
>   z[seq(1,nrow(z),by=lag),]
> }
> 
> getimage <- function(x,pred) {
>   pre <- x[1:(nrow(x)-pred),]
>   post <- x[(pred+1):nrow(x),1]
>   list(pre=pre,post=post)
> }
> 
> ## tseries = randn(500,1);
> ## should have a large approximate entropy.
> 
> set.seed(1001)
> apenembed(rnorm(500),edim=3)
> ## 0.3240493
> 
> apenembed(sin(seq(1,100,by=0.2)),edim=3)
> 
> ## should have a small approximate entropy since it is deterministic. 
> ## 0.1001811
> 
> ## examples from utils web page:
> ts <- 1:5
> x <- lagembed(ts,3,1)
> im <- getimage(x,1)
> 
> apen <- function(pre,post,r) {
> 
>   ## translated from Kaplan D (1998) HRV software.
>   ## http://www.macalester.edu/~kaplan/hrv/doc/Feb3snap.tar
>   ## computer approximate entropy a la Steve Pincus
> 
>   N <- nrow(pre)
>   p <- ncol(pre)
>   
>   ## number of pairs of points closer than r in pre/post space
>   phiM <- phiMplus1 <- 0
>   for (k in 1:N) {
>     ## replicate the current point
>     foo <- matrix(rep(pre[k,],N),byrow=TRUE,nrow=N)
>     ## calculate distance (max norm)
>     goo <- abs(foo-pre)<=r
>     ## which ones of them are closer than r using the max norm?
>     closerpre <- if (p==1) goo else apply(goo,1,all)
>     precount <- sum(closerpre)
>     phiM <- phiM+log(precount)
>     
>     ## of the ones that were closer in the pre space, how many are closer
>     ## in post also ?
>     postcount <- sum(abs(post[closerpre] - post[k])<r)
>     phiMplus1 <- phiMplus1 + log(postcount)
>     ## cat(k,precount,phiM,postcount,phiMplus1,"\n")
>   }
>   (phiM-phiMplus1)/N
> }
> 
> ## also see:
> ## http://www.cbi.dongnocchi.it/glossary/ApEn.html
> ## http://www.physionet.org/physiotools/ApEn/
> 
> ts <- rep(61:65,10)
> apenembed(ts,d=5,r=2) ## get 0 instead of 0.00189?
> 
>