[R] spsurvey analysis

Tim Howard tghoward at gw.dec.state.ny.us
Fri Nov 1 15:49:18 CET 2013


All,
I've used the excellent package, spsurvey, to create spatially balanced samples many times in the past. I'm now attempting to use the analysis portion of the package, which compares CDFs among sub-populations to test for differences in sub-population metrics. 
 
- My data (count data) have many zeros, following a negative binomial or even zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest  throws a warning and does not complete for most (but not all) species sampled. Warning message: "The combined number of values in at least one class is less than five. Action: The user should consider using a smaller number of classes."

- There are plenty of samples in my two time periods (the dummy set below: Yr1=27, Yr2=31 non-zero values). 
 
My Question:
Why is it throwing this error and is there a way to get around it?



Reproduceable example (change path to spsurvey sample data), requires us to use spsurvey to generate sample points:

### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")

set.seed(4447864)

# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), seltype="Equal"),
                  "Colorado Plateaus"=list(panel=c(PanelOne=25), seltype="Equal"),
                  "Mojave Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"),
                  "Northern Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"),
                  "Southern Rockies"=list(panel=c(PanelOne=14), seltype="Equal"),
                  "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), seltype="Equal"),
                  "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))

# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
                   type.frame="area", src.frame="sp.object",
                   sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE)

Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
                   type.frame="area", src.frame="sp.object",
                   sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE)

#extract the core information, add year as a grouping variable, add a plot ID to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1 at data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2 at data[,c(1,2,3,5)], grp = "Yr2")         
sitedat <- rbind(Yr1, Yr2)

# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100)
bn.b <- rnbinom(size = 0.06, mu = 20.15, n=100)
dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1",count = bn.a)
dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2",count = bn.b)
dat <- rbind(dat.a, dat.b)

####
## Analysis begins here
####
data.cont <- data.frame(siteID = dat$pltID, Density=dat$count)
sites <- data.frame(siteID = dat$pltID, Use=rep(TRUE, nrow(dat)))
subpop <- data.frame(siteID = dat$pltID, 
                        All_years=(rep("allYears",nrow(dat))),
                        Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
                        wgt = sitedat$wgt,
                        xcoord = sitedat$xcoord,
                        ycoord = sitedat$ycoord)
framesize <- c("Yr1"=888081202000, "Yr2"=888081202000)                        
                        
## There seem to be pretty good estimates
CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont, 
                        popsize = list(All_years=sum(framesize),
                        Year = as.list(framesize)))

print(CDF_Estimates$Pct)

## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,
   popsize=list(Year=as.list(framesize)))
warnprnt()

## how many records have values greater than zero, by year?   Probably irrelevant!
notZero <- dat[dat$count > 0,]
table(notZero$grp)

### end

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
 
locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

Thanks in advance.
Tim



More information about the R-help mailing list