[R] How important is set.seed

Neha gupta neha.bologna90 using gmail.com
Tue Mar 22 17:18:59 CET 2022


Two days ago I read a paper that used interpretable machine learning, which
is why I posted here about set.seed.

According to the authors, the explanations produced for black-box ML models
differ when different seeds are used, or when no seed is set at all.
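
As a rough illustration of that claim (not the paper's method), here is a
minimal base-R sketch. It assumes permutation importance as a stand-in for
an "explanation"; the toy data and the helper names perm_importance and
importance_for_seed are hypothetical and only serve the example.

```r
# Toy data (hypothetical): y depends strongly on x1, weakly on x2, not on x3.
set.seed(1)  # fixes the toy data only
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + 0.3 * dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Permutation importance of one feature: increase in MSE after shuffling it.
perm_importance <- function(fit, data, feature) {
  base_mse <- mean((data$y - predict(fit, data))^2)
  shuffled <- data
  shuffled[[feature]] <- sample(shuffled[[feature]])  # this step uses the RNG
  mean((data$y - predict(fit, shuffled))^2) - base_mse
}

# Compute the importance of every feature after setting a given seed.
importance_for_seed <- function(seed) {
  set.seed(seed)
  sapply(c("x1", "x2", "x3"), function(f) perm_importance(fit, dat, f))
}

# Treat the seed as a factor and compare the explanations across seeds.
seeds <- c(11, 22, 33)
results <- sapply(seeds, importance_for_seed)
colnames(results) <- paste0("seed_", seeds)
print(round(results, 3))
```

The columns differ slightly from seed to seed, while calling
importance_for_seed() twice with the same seed returns identical values.
What matters is whether the differences across seeds are large enough to
change the conclusions.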



On Tue, Mar 22, 2022 at 5:12 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless.
> But...
>
> 1. set.seed() is used to make any procedure that uses R's
> pseudo-random number generator -- including, for example, sampling
> from a distribution, random data splitting, etc. -- "reproducible".
> That is, if the procedure is repeated *exactly,* by invoking
> set.seed() with its original argument values (once!) *before* the
> procedure begins, exactly the same results should be produced by the
> procedure. Full stop. It does not matter how many times random number
> generation occurs within the procedure thereafter -- R preserves the
> state of the rng between invocations (but see the notes in ?set.seed
> for subtle qualifications of this claim).
>
> 2. Hence, if no (pseudo-) random number generation is used, set.seed()
> is irrelevant. Full stop.
>
> 3. Hence, if you don't care about reproducibility (you should! -- if
> for no other reason than debugging), you don't need set.seed().
>
> 4. The "randomness" of any sequence of results from any particular
> set.seed() arguments (including further calls to the rng) is a complex
> issue. ?set.seed has some discussion of this, but one needs
> considerable expertise to make informed choices here. As usual, we
> untutored users should be guided by the expert recommendations of the
> Help file.
>
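
Point 1 above can be checked with a small base-R sketch. run_procedure is a
hypothetical stand-in for "any procedure that uses the RNG"; it uses the
built-in mtcars data for a random split and then makes a further RNG call.

```r
# A procedure that calls the RNG more than once.
run_procedure <- function() {
  idx <- sample(nrow(mtcars), 20)   # random selection of training rows
  train <- mtcars[idx, ]
  noise <- rnorm(3)                 # a later, unrelated RNG call
  list(rows = idx, coef = coef(lm(mpg ~ wt, data = train)), noise = noise)
}

set.seed(123)             # seed set once, before the procedure begins
first <- run_procedure()

set.seed(123)             # same seed, same procedure
second <- run_procedure()

third <- run_procedure()  # no reseeding: continues from the current RNG state

print(identical(first, second))  # the two seeded runs agree exactly
print(identical(second, third))  # the unseeded continuation does not
```

The seed is set once per run, not before each RNG call inside the procedure.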
> *** If anything I have said above is wrong, I would greatly appreciate
> a public response here showing my error.***
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>
> On Tue, Mar 22, 2022 at 7:48 AM Neha gupta <neha.bologna90 using gmail.com>
> wrote:
> >
> > Hello Tim
> >
> > In some of the examples I see in tutorials, the random seed is set
> > just before model training, e.g. before the train() function in the
> > caret library. Should I follow this?
> >
> > Best regards
> > On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> >
> > > Ah, so maybe what you need is to think of “set.seed()” as a treatment in
> > > an experiment. You could use a random number generator to select an
> > > appropriate number of seeds, then use those seeds repeatedly in the
> > > different models to see how seed selection influences outcomes. I am not
> > > quite sure how many seeds would constitute a good sample. For me that would
> > > depend on what I find and how long a run takes.
> > >
> > >   In parallel processing you set the seed in the master process and then
> > > use a random number generator to set the seeds in each worker.
> > >
> > > Tim
> > >
> > >
> > >
> > > *From:* Neha gupta <neha.bologna90 using gmail.com>
> > > *Sent:* Tuesday, March 22, 2022 6:33 AM
> > > *To:* Ebert,Timothy Aaron <tebert using ufl.edu>
> > > *Cc:* Jeff Newmiller <jdnewmil using dcn.davis.ca.us>; r-help using r-project.org
> > > *Subject:* Re: How important is set.seed
> > >
> > > Thank you all.
> > >
> > >
> > >
> > > Actually I need set.seed because I have to evaluate the consistency of
> > > the feature selection produced by different models, so I think it is
> > > recommended to use a seed for this.
> > >
> > >
> > >
> > > Warm regards
> > >
> > > On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> > >
> > > If you are using the program for data analysis then set.seed() is not
> > > necessary unless you are developing a reproducible example. In a standard
> > > analysis it is mostly counter-productive because one should then ask if
> > > your presented results are an artifact of a specific seed that you selected
> > > to get a particular result. However, when you need a reproducible example,
> > > are debugging a program, or otherwise need the same result with every run
> > > of the program, set.seed() is an essential tool.
> > > Tim
> > >
> > > -----Original Message-----
> > > From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff Newmiller
> > > Sent: Monday, March 21, 2022 8:41 PM
> > > To: r-help using r-project.org; Neha gupta <neha.bologna90 using gmail.com>;
> > > r-help mailing list <r-help using r-project.org>
> > > Subject: Re: [R] How important is set.seed
> > >
> > > First off, "ML models" do not all use random numbers (for prediction I
> > > would guess very few of them do). Learn and pay attention to what the
> > > functions you are using do.
> > >
> > > Second, if you use random numbers properly and understand the precision
> > > that your specific use case offers, then you don't need to use set.seed.
> > > However, in practice, using set.seed can allow you to temporarily avoid
> > > chasing precision gremlins, or set up specific test cases for testing code,
> > > not results. It is your responsibility to not let this become a crutch... a
> > > randomized simulation that is actually sensitive to the seed is unlikely to
> > > offer an accurate result.
> > >
> > > Where to put set.seed depends a lot on how you are performing your
> > > simulations. In general each process should set it once uniquely at the
> > > beginning, and if you use parallel processing then use the features of your
> > > parallel processing framework to ensure that this happens. Beware of
> > > setting all worker processes to use the same seed.
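
One way to give each worker its own random stream is sketched below, using
the L'Ecuyer-CMRG generator and nextRNGStream() from the parallel package.
No cluster is started here: n_workers and worker_task are hypothetical
stand-ins, and the "workers" run one after another in the same session. With
a real cluster, parallel::clusterSetRNGStream() distributes such streams to
the workers.

```r
library(parallel)

old_kind <- RNGkind("L'Ecuyer-CMRG")  # generator that supports separate streams
set.seed(2022)                         # set once, in the master process

# Derive one stream per (hypothetical) worker from the master seed.
n_workers <- 3
streams <- vector("list", n_workers)
s <- .Random.seed
for (i in seq_len(n_workers)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)  # advance to the next stream
}

# Stand-in for the work a worker would do using its own stream.
worker_task <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(2)
}

draws <- lapply(streams, worker_task)
print(draws)

RNGkind(old_kind[1])  # restore the previous generator kind
```

Each element of draws comes from a different stream, so the workers do not
repeat each other's random numbers, which is the risk when every worker is
given the same seed.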
> > >
> > > On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 using gmail.com>
> > > wrote:
> > > >Hello everyone
> > > >
> > > >I want to know
> > > >
> > > >(1) In which cases do we need to use set.seed while building ML models?
> > > >
> > > >(2) Where exactly should we put the set.seed call, i.e. when we split
> > > >the data into train/test sets, or just before we train a model?
> > > >
> > > >Thank you
> > > >
> > > >       [[alternative HTML version deleted]]
> > > >
> > > >______________________________________________
> > > >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > >https://stat.ethz.ch/mailman/listinfo/r-help
> > > >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > >and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > Sent from my phone. Please excuse my brevity.
>
