[R] Subsetting data for split-sample validation, then repeating 1000x

Angela Boag Angela.Boag at colorado.edu
Thu Aug 21 23:45:38 CEST 2014


Hi all,

I'm doing some within-dataset model validation. I would like to split a
dataset 70/30, fit a model to 70% of the data (the training data), and
validate it by predicting the remaining 30% (the testing data). I would
like to repeat this split-sample validation 1000 times and average the
correlation coefficient and r2 between the predicted and observed values
of the testing data.

I have the following working for a single iteration, and would like to know
how to use either replicate() or a for loop to repeat it and average the
1000 'r2' and 'cor' outputs.

--

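# create 70% training sample (without replacement, so the training set has
# no duplicated rows and the testing set is exactly the remaining 30%)
A.samp <- sample(1:nrow(A), floor(0.7*nrow(A)), replace = FALSE)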

# Fit model (I'm modeling native plant richness, 'nat.r')
library(glmmADMB)  # provides glmmadmb()
A.model <- glmmadmb(nat.r ~ isl.sz + nr.mead, random = ~ 1 | site,
                    family = "poisson", data = A[A.samp,])

# Use the model to predict the remaining 30% of the data
A.pred <- predict(A.model, newdata = A[-A.samp,], type = "response")

# Correlation between predicted 30% and actual 30%
cor <- cor(A[-A.samp,]$nat.r, A.pred, method = "pearson")

# r2 between predicted and observed
lm.A <- lm(A.pred ~ A[-A.samp,]$nat.r)
r2 <- summary(lm.A)$r.squared

# print values
r2
cor

--
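
For reference, the single iteration above could be wrapped in a function and
repeated with replicate(), roughly as in the sketch below. This is only an
illustration under stated assumptions: the data frame is simulated, the
helper name validate.once() is hypothetical, and base R glm() stands in for
glmmadmb() (so the random effect of 'site' is left out) to keep the example
runnable without extra packages.

```r
set.seed(42)  # reproducible splits

# ASSUMPTION: simulated stand-in data; replace with the real data frame 'A'
n <- 200
A <- data.frame(isl.sz = runif(n, 1, 10), nr.mead = runif(n, 0, 5))
A$nat.r <- rpois(n, lambda = exp(0.5 + 0.15 * A$isl.sz + 0.1 * A$nr.mead))

# Hypothetical helper: one split-sample validation, returns c(cor, r2)
validate.once <- function(dat, p = 0.7) {
  # sample row indices without replacement: training and testing are disjoint
  samp  <- sample(nrow(dat), floor(p * nrow(dat)))
  train <- dat[samp, ]
  test  <- dat[-samp, ]
  # ASSUMPTION: glm() is a stand-in; substitute the glmmadmb() call here
  fit  <- glm(nat.r ~ isl.sz + nr.mead, family = poisson, data = train)
  pred <- predict(fit, newdata = test, type = "response")
  # correlation and r2 between predicted and observed testing values
  r  <- cor(test$nat.r, pred, method = "pearson")
  r2 <- summary(lm(pred ~ test$nat.r))$r.squared
  c(cor = r, r2 = r2)
}

# replicate() simplifies the 1000 results to a 2 x 1000 matrix
# with rows named 'cor' and 'r2'
res <- replicate(1000, validate.once(A))

# average each statistic over the 1000 splits
rowMeans(res)
```

Keeping all 1000 values in 'res' rather than only the running means also
allows looking at their spread, e.g. apply(res, 1, sd). A for loop that fills
a pre-allocated 1000 x 2 matrix would give the same result.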

Thanks for your time!

Cheers,
Angela

--
Angela E. Boag
Ph.D. Student, Environmental Studies
CAFOR Project Researcher
University of Colorado, Boulder
Mobile: 720-212-6505
