[R] How important is set.seed

Jorgen Harmse JH@rm@e @end|ng |rom roku@com
Tue Mar 22 15:20:19 CET 2022


Jeff Newmiller makes an interesting point about distributed processing, but I don't know how to use the usual pseudo-random processes to obtain deterministic results when I don't know how the data will be sharded. You might have to replace pseudo-random sampling with deterministic sampling using a hash of something involving the unique key. Then the selection of a salt is the equivalent of a call to set.seed in non-parallel processing. The results should be the same as long as you fix the data set & the salt, and then you can test sensitivity to changes in the salt.
Jorgen Harmse
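
A minimal sketch of the salted-hash idea above, in base R only. The helper names (hash_unit, in_sample), the key format and the salt strings are made up for illustration, and the polynomial hash is a toy stand-in: a real pipeline would more likely use a proper hash function, for example from the digest package. What matters is that each row's decision depends only on its own key and the salt, never on row order or on how the data are sharded.

```r
# Hypothetical helper: map a string to a number in [0, 1) with a simple
# polynomial rolling hash. Intermediate values stay below 2^53, so the
# double-precision arithmetic is exact.
hash_unit <- function(s) {
  m <- 2147483647                      # modulus: the prime 2^31 - 1
  h <- 0
  for (code in utf8ToInt(s)) {
    h <- (h * 1000003 + code) %% m
  }
  h / m
}

# Hypothetical helper: TRUE for keys whose salted hash falls below `fraction`.
# The salt plays the role that the seed plays in set.seed().
in_sample <- function(keys, salt, fraction) {
  salted <- paste(salt, keys, salt, sep = "|")
  vapply(salted, hash_unit, numeric(1), USE.NAMES = FALSE) < fraction
}

keys <- sprintf("user%04d", 1:1000)    # assumed unique keys
selected <- in_sample(keys, salt = "salt-A", fraction = 0.2)

# Sharding does not matter: process two shards separately and recombine.
shard1 <- in_sample(keys[1:400], "salt-A", 0.2)
shard2 <- in_sample(keys[401:1000], "salt-A", 0.2)
same_as_unsharded <- identical(c(shard1, shard2), selected)

# Sensitivity check: repeat the selection with a different salt.
selected_b <- in_sample(keys, salt = "salt-B", fraction = 0.2)
```

Comparing downstream results for selected and selected_b is the analogue of rerunning an analysis with a different seed.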


From: Neha gupta <neha.bologna90 using gmail.com>
To: "Ebert,Timothy Aaron" <tebert using ufl.edu>
Cc: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>, "r-help using r-project.org"
        <r-help using r-project.org>
Subject: Re: [R] How important is set.seed
Message-ID:
        <CA+nrPnurAqBUgbrP-Oq4a8eo4Y7CO-k5xfH8c3EK-DGNCscidw using mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thank you all.

Actually, I need set.seed because I have to evaluate the consistency of
the feature selection produced by different models, so I think a fixed
seed is recommended for this.

Warm regards
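
A rough sketch of how a seed can serve both purposes discussed in this thread: fixing the train/test split so that every model is compared on the same rows, and then varying the seed to see whether the selected features are stable. The helper names (split_rows, top_features), the use of mtcars, and the correlation-based ranking are assumptions chosen only to keep the example self-contained; they stand in for whatever models and selection method are actually used.

```r
# Hypothetical helper: reproducible training-row indices for a given seed.
split_rows <- function(n, seed, train_frac = 0.7) {
  set.seed(seed)
  sample.int(n, size = floor(train_frac * n))
}

# Hypothetical stand-in for a feature selector: rank predictors by absolute
# correlation with the target on the training rows and keep the top k.
top_features <- function(data, target, rows, k = 3) {
  predictors <- setdiff(names(data), target)
  score <- vapply(predictors,
                  function(p) abs(cor(data[rows, p], data[rows, target])),
                  numeric(1))
  names(sort(score, decreasing = TRUE))[seq_len(k)]
}

# Same seed, same split: every model can be handed identical training rows.
train_idx <- split_rows(nrow(mtcars), seed = 42)

# Different seeds: how often is each feature selected across splits?
seeds <- 1:20
picks <- lapply(seeds, function(s) {
  top_features(mtcars, "mpg", split_rows(nrow(mtcars), s))
})
selection_rate <- sort(table(unlist(picks)) / length(seeds), decreasing = TRUE)
print(selection_rate)
```

A feature that is selected under only a few of the seeds is a sign that the selection depends on the particular split rather than on the data.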

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> If you are using the program for data analysis then set.seed() is not
> necessary unless you are developing a reproducible example. In a standard
> analysis it is mostly counter-productive because one should then ask if
> your presented results are an artifact of a specific seed that you selected
> to get a particular result. However, when you need a reproducible
> example, are debugging a program, or otherwise need the same result
> with every run of the program, set.seed() is an essential tool.
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff Newmiller
> Sent: Monday, March 21, 2022 8:41 PM
> To: r-help using r-project.org; Neha gupta <neha.bologna90 using gmail.com>; r-help
> mailing list <r-help using r-project.org>
> Subject: Re: [R] How important is set.seed
>
> First off, "ML models" do not all use random numbers (for prediction I
> would guess very few of them do). Learn and pay attention to what the
> functions you are using do.
>
> Second, if you use random numbers properly and understand the precision
> that your specific use case offers, then you don't need to use set.seed.
> However, in practice, using set.seed can allow you to temporarily avoid
> chasing precision gremlins, or set up specific test cases for testing code,
> not results. It is your responsibility to not let this become a crutch... a
> randomized simulation that is actually sensitive to the seed is unlikely to
> offer an accurate result.
>
> Where to put set.seed depends a lot on how you are performing your
> simulations. In general each process should set it once uniquely at the
> beginning, and if you use parallel processing then use the features of your
> parallel processing framework to ensure that this happens. Beware of
> setting all worker processes to use the same seed.
>
> On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 using gmail.com>
> wrote:
> >Hello everyone
> >
> >I want to know
> >
> >(1) In which cases do we need to use set.seed while building ML models?
> >
> >(2) Where exactly should we put the set.seed call, i.e. when we split
> >the data into train/test sets, or just before we train a model?
> >
> >Thank you
> >
> >______________________________________________
> >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
>
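
On the quoted advice about parallel processing (seed each process once and uniquely, and do not give all workers the same seed), here is a small sketch using the stream facilities in R's parallel package. It runs in a single process so that it stays self-contained: the "workers" are simulated by a hypothetical function, simulate_worker, and the number of workers and the seed value are arbitrary choices.

```r
library(parallel)

# The L'Ecuyer-CMRG generator supports multiple independent streams.
RNGkind("L'Ecuyer-CMRG")
set.seed(2022)

# Derive one stream per worker from the single master seed.
n_workers <- 4
worker_seeds <- vector("list", n_workers)
stream <- .Random.seed
for (i in seq_len(n_workers)) {
  worker_seeds[[i]] <- stream
  stream <- nextRNGStream(stream)
}

# Hypothetical stand-in for the work done by one worker: install the
# worker's stream as the RNG state, then draw random numbers.
simulate_worker <- function(seed_vector, n = 3) {
  assign(".Random.seed", seed_vector, envir = globalenv())
  runif(n)
}

draws <- lapply(worker_seeds, simulate_worker)

# Rerunning with the same streams reproduces every worker's draws.
reproducible <- identical(draws, lapply(worker_seeds, simulate_worker))
```

With a real cluster created by makeCluster(), clusterSetRNGStream(cl, iseed = 2022) sets up per-worker streams in the same spirit, which is one way of letting the framework handle the seeding.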
