[R] Random Seed Location

Tue Feb 27 01:30:47 CET 2018

In case you don't get an answer from someone more knowledgeable:

1. I don't know.
2. But it is possible that other packages loaded after set.seed()
change the state of the RNG.
3. So, to be safe, I would call set.seed() immediately before each step
that generates random numbers.
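
A minimal sketch of point 3, using made-up stand-ins rather than the
poster's data: `n` mirrors the 5000 in-sample rows in the quoted code, and
the `runif(1)` call is only a hypothetical placeholder for any intervening
code that happens to draw random numbers. It is an illustration, not a
diagnosis of the actual problem.

```r
n <- 5000  # assumed size, matching the in-sample rows in the quoted code

# Seed set immediately before the random partition: repeatable.
set.seed(654321)
train_a <- sample(1:n, size = 0.6 * n, replace = FALSE)

set.seed(654321)
train_b <- sample(1:n, size = 0.6 * n, replace = FALSE)

identical(train_a, train_b)  # TRUE: same seed, same RNG state at the draw

# Seed set early, then something else consumes a random number first.
# runif(1) is a hypothetical stand-in for such intervening code.
set.seed(654321)
invisible(runif(1))
train_c <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Compare with the partition drawn right after seeding; the RNG stream
# has advanced, so the two partitions should no longer match.
identical(train_a, train_c)
```

If the first comparison is TRUE on every run but the downstream results
still vary, the partition is probably not the source of the variation.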

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack001 at sbcglobal.net>
wrote:

> Hi all,
>
> For some odd reason when running naïve bayes, k-NN, etc., I get slightly
> different results (e.g., error rates, classification probabilities) from run
> to run even though I am using the same random seed.
>
> Nothing else (input-wise) is changing, but my results are somewhat different
> from run to run.  The only randomness should be in the partitioning, and I
> have set the seed before this point.
>
> My question simply is:  should the location of the set.seed command matter,
> provided that it is applied before any commands which involve randomness
> (such as partitioning)?
>
> If you need to see the code, it is below:
>
> Thank you,
> Gary
>
>
> A.      Separate the original (in-sample) data from the new (out-of-sample)
> data.  Set a random seed.
>
> > InvestTech <- as.data.frame(InvestTechRevised)
> > outOfSample <- InvestTech[5001:nrow(InvestTech), ]
> > InvestTech <- InvestTech[1:5000, ]
> > set.seed(654321)
>
> B.      Install and load the caret, ggplot2 and e1071 packages.
>
> > install.packages("caret")
> > install.packages("ggplot2")
> > install.packages("e1071")
> > library(caret)
> > library(ggplot2)
> > library(e1071)
>
> C.      Bin the predictor variables with approximately equal counts using
> the cut_number function from the ggplot2 package.  We will use 20 bins.
>
> > InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
> > InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
> > outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
> > outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)
>
> D.      Partition the original (in-sample) data into 60% training and 40%
> validation sets.
>
> > n <- nrow(InvestTech)
> > train <- sample(1:n, size = 0.6 * n, replace = FALSE)
> > InvestTechTrain <- InvestTech[train, ]
> > InvestTechVal <- InvestTech[-train, ]
>
> E.      Use the naiveBayes function in the e1071 package to fit the model.
>
> > model <- naiveBayes(`Purchase (1=yes, 0=no)` ~ ., data = InvestTechTrain)
> > prob <- predict(model, newdata = InvestTechVal, type = "raw")
> > pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
>
> F.      Use the confusionMatrix function in the caret package to output the
> confusion matrix.
>
> > confMtr <- confusionMatrix(pred, unlist(InvestTechVal[, 3]),
> >                            mode = "everything", positive = "1")
> > accuracy <- confMtr$overall[1]
> > valError <- 1 - accuracy
> > confMtr
>
> G.      Classify the 18 new (out-of-sample) readers using the following
> code.
>
> > prob <- predict(model, newdata = outOfSample, type = "raw")
> > pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
> > cbind(pred, prob, outOfSample[, -3])
>
