[R] Random Seed Location

Tue Feb 27 02:51:54 CET 2018

If your computations involve the parallel package, then set.seed(seed)
may not produce repeatable results, because it seeds only the master R
session and not the worker processes.  E.g.,

> cl <- parallel::makeCluster(3)  # Create cluster with 3 nodes on local host
> set.seed(100); runif(2)
[1] 0.3077661 0.2576725
> parallel::parSapply(cl, 101:103, function(i)runif(2, i, i+1))
         [,1]     [,2]     [,3]
[1,] 101.7779 102.5308 103.3459
[2,] 101.8128 102.6114 103.9102
>
> set.seed(100); runif(2)
[1] 0.3077661 0.2576725
> parallel::parSapply(cl, 101:103, function(i)runif(2, i, i+1))
         [,1]     [,2]     [,3]
[1,] 101.1628 102.9643 103.2684
[2,] 101.9205 102.6937 103.7907

Bill Dunlap
TIBCO Software
wdunlap tibco.com
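A minimal sketch of one way to make the example above repeatable (an editorial addition, not something proposed in this thread): parallel::clusterSetRNGStream() seeds each worker with its own L'Ecuyer-CMRG stream derived from a single iseed, so re-seeding the workers with the same iseed before each parSapply() call should reproduce their draws. This assumes the same number of workers and the same task-to-worker assignment on every run; parSapply() splits the tasks statically, so that holds here.

```r
library(parallel)

cl <- makeCluster(3)  # 3 worker processes on the local host, as in the example above

# Seed every worker's RNG stream from one seed, then run the parallel job
clusterSetRNGStream(cl, iseed = 100)
r1 <- parSapply(cl, 101:103, function(i) runif(2, i, i + 1))

# Re-seeding the workers with the same iseed gives the same draws
clusterSetRNGStream(cl, iseed = 100)
r2 <- parSapply(cl, 101:103, function(i) runif(2, i, i + 1))

stopCluster(cl)
identical(r1, r2)  # TRUE
```

Note that set.seed() in the master session is not needed for this; the worker streams depend only on iseed.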

On Mon, Feb 26, 2018 at 4:30 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

> In case you don't get an answer from someone more knowledgeable:
>
> 1. I don't know.
> 2.  But it is possible that other packages that are loaded after set.seed()
> fool with the RNG.
> 3. So I would call set.seed just before you invoke each random number
> generation to be safe.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
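A small base-R illustration of Bert's third point. The runif(1) call is a hypothetical stand-in for any code that draws random numbers between set.seed() and the partition: it shifts the random stream, whereas calling set.seed() immediately before sample() gives the same partition no matter what ran earlier.

```r
n <- 5000

# Seed set early, then some other code consumes a random number
set.seed(654321)
invisible(runif(1))          # hypothetical stand-in for intervening code
train_a <- sample(1:n, size = 0.6 * n, replace = FALSE)

# Seed set immediately before the partition, as Bert suggests
set.seed(654321)
train_b <- sample(1:n, size = 0.6 * n, replace = FALSE)

invisible(runif(10))         # unrelated random draws elsewhere in the script
set.seed(654321)
train_c <- sample(1:n, size = 0.6 * n, replace = FALSE)

identical(train_b, train_c)  # TRUE: same seed, set right before sample()
identical(train_a, train_b)  # almost certainly FALSE: runif(1) shifted the stream
```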
> On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack001 at sbcglobal.net> wrote:
>
> > Hi all,
> >
> > For some odd reason when running naïve Bayes, k-NN, etc., I get slightly
> > different results (e.g., error rates, classification probabilities) from
> > run to run even though I am using the same random seed.
> >
> > Nothing else (input-wise) is changing, but my results are somewhat
> > different from run to run.  The only randomness should be in the
> > partitioning, and I have set the seed before this point.
> >
> > My question simply is: should the location of the set.seed command
> > matter, provided that it is applied before any commands which involve
> > randomness (such as partitioning)?
> >
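One way to test Gary's question directly (a diagnostic sketch, not taken from the thread): snapshot .Random.seed right after set.seed() and compare it with the current value just before partitioning. If the two differ, something in between (a package load, a helper function) drew random numbers, and the location of set.seed() does matter. The helper rng_unchanged() is hypothetical.

```r
set.seed(654321)
seed_after_set <- .Random.seed   # snapshot of the RNG state right after seeding

# ... intervening code would go here (library() calls, data preparation) ...
x <- cut(c(1.5, 2.5, 3.5), breaks = 0:4)   # deterministic: draws no random numbers

# Hypothetical helper: TRUE if no random numbers were drawn since the snapshot
rng_unchanged <- function(snapshot) {
  identical(get(".Random.seed", envir = globalenv()), snapshot)
}

rng_unchanged(seed_after_set)    # TRUE: cut() did not touch the RNG
invisible(runif(1))
rng_unchanged(seed_after_set)    # FALSE: runif() advanced the RNG state
```

Placing a check like this immediately before the sample() call in step D would show whether the intervening steps consumed random numbers.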
> > If you need to see the code, it is below:
> >
> > Thank you,
> > Gary
> >
> >
> > A.      Separate the original (in-sample) data from the new (out-of-sample)
> > data.  Set a random seed.
> >
> > > InvestTech <- as.data.frame(InvestTechRevised)
> > > outOfSample <- InvestTech[5001:nrow(InvestTech), ]
> > > InvestTech <- InvestTech[1:5000, ]
> > > set.seed(654321)
> >
> > B.      Install and load the caret, ggplot2 and e1071 packages.
> >
> > > install.packages("caret")
> > > install.packages("ggplot2")
> > > install.packages("e1071")
> > > library(caret)
> > > library(ggplot2)
> > > library(e1071)
> >
> > C.      Bin the predictor variables with approximately equal counts using
> > the cut_number function from the ggplot2 package.  We will use 20 bins.
> >
> > > InvestTech[, 1] <- cut_number(InvestTech[, 1], n = 20)
> > > InvestTech[, 2] <- cut_number(InvestTech[, 2], n = 20)
> > > outOfSample[, 1] <- cut_number(outOfSample[, 1], n = 20)
> > > outOfSample[, 2] <- cut_number(outOfSample[, 2], n = 20)
> >
> > D.      Partition the original (in-sample) data into 60% training and 40%
> > validation sets.
> >
> > > n <- nrow(InvestTech)
> > > train <- sample(1:n, size = 0.6 * n, replace = FALSE)
> > > InvestTechTrain <- InvestTech[train, ]
> > > InvestTechVal <- InvestTech[-train, ]
> >
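A sketch of step D restructured along the lines Bert suggests, with the seed set inside the partitioning step so the split does not depend on anything that ran before it. The helper partition_60_40() and the toy data frame are hypothetical stand-ins (the InvestTech data are not available here); note that the helper also resets the global RNG state as a side effect.

```r
# Hypothetical helper: 60/40 split that seeds the RNG itself
partition_60_40 <- function(df, seed = 654321, p = 0.6) {
  set.seed(seed)                         # seed immediately before the random step
  n <- nrow(df)
  train <- sample(seq_len(n), size = floor(p * n), replace = FALSE)
  list(train = df[train, , drop = FALSE], val = df[-train, , drop = FALSE])
}

# Synthetic stand-in for InvestTech (hypothetical data)
toy <- data.frame(x1 = 1:5000, x2 = rnorm(5000), y = rbinom(5000, 1, 0.3))

s1 <- partition_60_40(toy)
s2 <- partition_60_40(toy)
identical(rownames(s1$train), rownames(s2$train))  # TRUE
c(nrow(s1$train), nrow(s1$val))                    # 3000 2000
```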
> > E.      Use the naiveBayes function in the e1071 package to fit the model.
> >
> > > model <- naiveBayes(`Purchase (1=yes, 0=no)` ~ ., data = InvestTechTrain)
> > > prob <- predict(model, newdata = InvestTechVal, type = "raw")
> > > pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
> >
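> > F.      Use the confusionMatrix function in the caret package to output the
> > confusion matrix.
> >
> > > # caret expects factors with the same levels for predictions and reference
> > > confMtr <- confusionMatrix(factor(pred, levels = c(0, 1)),
> > >                            factor(unlist(InvestTechVal[, 3]), levels = c(0, 1)),
> > >                            mode = "everything", positive = "1")
> > > accuracy <- confMtr$overall[1]
> > > valError <- 1 - accuracy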
> > > confMtr
> >
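As a cross-check on caret's output (a base-R sketch, assuming the predictions and actual classes are 0/1 codes), the validation error is the share of off-diagonal entries in the 2x2 table of predicted vs. actual classes. The vectors below are made-up toy values, not Gary's results.

```r
# Toy 0/1 predictions and actual classes (hypothetical values)
pred   <- c(1, 0, 1, 1, 0)
actual <- c(1, 0, 0, 1, 1)

# Fixing the levels keeps the table 2 x 2 even if a class is never predicted
tab <- table(Predicted = factor(pred, levels = c(0, 1)),
             Actual    = factor(actual, levels = c(0, 1)))

accuracy <- sum(diag(tab)) / sum(tab)   # 3 of 5 correct -> 0.6
valError <- 1 - accuracy                # 0.4
```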
> > G.      Classify the 18 new (out-of-sample) readers using the following code.
> >
> > > prob <- predict(model, newdata = outOfSample, type = "raw")
> > > pred <- ifelse(prob[, 2] >= 0.3, 1, 0)
> > > cbind(pred, prob, outOfSample[, -3])
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>