[R] Merging outbreak data

Jim Lemon jim at bitwrit.com.au
Fri Aug 1 11:56:33 CEST 2014


On Fri, 1 Aug 2014 07:25:05 AM barbara tornimbene wrote:
> HI.
> I have a set of disease outbreak data. Each observation have a
> location (spatial coordinates) and a start date. Outbreaks that occur in
> the same location within a two week periods have to be merged. Basically I
> need to delete duplicates that have same spatial coordinated and start
> dates comprised in a two weeks range. I am ok with the first bit
> (coordinates), but It is the date range that I am not sure how to define. I
> thought about creating a dummy variable for observations within a date
> range, but those might have different locations. Any help would be greatly
> appreciated. Thanks

Hi barbara,
I assume that the spatial coordinates have to be within a certain 
distance to be considered the same, unless they are based on 
something like cities or health administration districts. If your 
observations can be ordered by date, the problem is not too difficult.
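
If the coordinates only need to be close rather than identical, one possible test is a great-circle distance against a cutoff. This is a minimal sketch, assuming latitude and longitude are in decimal degrees; the names haversine_km, same_place and max_km, and the 10 km cutoff, are illustrative assumptions and not something from the original question.

```r
# Great-circle (haversine) distance in km between points in decimal degrees.
# radius = 6371 km is a mean Earth radius, so distances are approximate.
haversine_km<-function(lat1,lon1,lat2,lon2,radius=6371) {
 to_rad<-pi/180
 dlat<-(lat2-lat1)*to_rad
 dlon<-(lon2-lon1)*to_rad
 a<-sin(dlat/2)^2+
  cos(lat1*to_rad)*cos(lat2*to_rad)*sin(dlon/2)^2
 # pmin guards against rounding pushing sqrt(a) slightly above 1
 2*radius*asin(pmin(1,sqrt(a)))
}

# Hypothetical cutoff: outbreaks within 10 km count as the same place.
max_km<-10
same_place<-function(lat1,lon1,lat2,lon2)
 haversine_km(lat1,lon1,lat2,lon2) <= max_km
```

Because the function is vectorised over the second point, a call such as same_place(lat[1],lon[1],lat[2:5],lon[2:5]) returns one logical value per comparison and could stand in for an exact equality test on lat and lon.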

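# simulate 100 outbreaks with distinct onset dates on a coarse lat/lon grid
date_range<-as.Date(c("1/1/2014","1/8/2014"),"%d/%m/%Y")
disease.df<-data.frame(
 onset=sample(seq(date_range[1],date_range[2],by=1),100),
 lat=sample(seq(-33,-35,by=-1),100,TRUE),
 lon=sample(seq(148,151,by=1),100,TRUE))
# order by onset so that each two week window is a run of consecutive rows
disease.df<-disease.df[order(disease.df$onset),]
# drop > 0 marks rows that share a location with another row within 14 days
disease.df$drop<-0
nobs<-dim(disease.df)[1]
for(start in 1:(nobs-1)) {
 # progress indicator
 cat(start,"\n")
 # advance end to the last row whose onset is within 14 days of start
 end<-start
 while(end < nobs &&
  disease.df$onset[end+1] - disease.df$onset[start] <= 14) end<-end+1
 # skip when no later row falls in the window, otherwise (start+1):end
 # would count backwards and compare the row with itself
 if(end > start) {
  sameplace<-
   disease.df$lat[start] == disease.df$lat[(start+1):end] &
   disease.df$lon[start] == disease.df$lon[(start+1):end]
  if(any(sameplace)) {
   disease.df$drop[start]<-1
   disease.df$drop[(start+1):end]<-
    disease.df$drop[(start+1):end]+sameplace
  }
 }
}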

Caution - I haven't checked this exhaustively and I have assumed that 
locations must be equal, not within some distance.
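
The loop above only flags rows, and it flags every member of a group, the earliest one included. If what is wanted is one row per merged outbreak, a different technique is to sort by location and date and start a new episode whenever the gap to the previous onset at the same place is more than two weeks. This is only a sketch under that assumption: it chains outbreaks together (each one is compared with the one before it, not with the first in the group), and merge_outbreaks, window and episode are illustrative names.

```r
# Keep the earliest row of each episode, where an episode is a run of
# outbreaks at the same lat/lon with no gap longer than `window` days.
merge_outbreaks<-function(df,window=14) {
 df<-df[order(df$lat,df$lon,df$onset),]
 # TRUE where the location differs from the previous row
 newplace<-c(TRUE,diff(df$lat) != 0 | diff(df$lon) != 0)
 # days since the previous row; Inf for the first row
 gap<-c(Inf,as.numeric(diff(df$onset)))
 # a new episode starts at a new place or after a long gap
 df$episode<-cumsum(newplace | gap > window)
 df[!duplicated(df$episode),]
}

# Small example: three outbreaks at one place, one at another
example.df<-data.frame(
 onset=as.Date(c("2014-01-01","2014-01-10","2014-02-15","2014-01-05")),
 lat=c(-33,-33,-33,-34),
 lon=c(148,148,148,148))
merged.df<-merge_outbreaks(example.df)
```

In the small example the outbreaks of 1 and 10 January at (-33,148) collapse into one row, while the 15 February outbreak and the one at (-34,148) are kept, giving three rows. The function expects at least one row in df.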

Jim


