[R] can you tell what .Random.seed *was*?
warren at etr-usa.com
Fri May 15 20:39:22 CEST 2009
G. Jay Kerns wrote:
> I want it to be *difficult* for students to figure out the seed and
> automatically generate solutions on their own.
Hmmm.... Would it really be a bad thing if someone reverse engineered
this to generate answers given the problem set? If it's hard enough to
do that, it'd be more worth solving than the given problem set. I call
that "extra credit".
> a brute force search of set.seed() is really
> pretty easy and fast... even for students at this level.
Either you're misunderstanding Stavros' benchmark results, or I am.
Could easily be the latter...I'm an R newbie.
As far as I can tell, the inner part of the loop does very little. If
that's right, Stavros is saying it will take 18 hours to try every
possible seed when the algorithm based on that seed takes almost no time
to run. But, if generating each problem set takes, say, a minute, it
will take 4.7 million years to generate a complete rainbow table when
there are 2^32 possible seeds.
> what if the Instructor
> inserted an *unknown* very large number of calls to the RNG near the
> beginning of the .Rnw (but after the set.seed)... and did not
> distribute this information to the students... that would make it
> much harder, yes?
There are better ways.
As above, one key to making rainbow tables impractical is making the
per-iteration time long enough. Even if it only takes a second to
generate each possible problem set, that's enough when multiplied by
high enough powers of 2.
The other key is using big enough powers of 2.
I hadn't looked into R's random number generation before, but it appears
quite robust. Seeding it with the current wall clock time (a 32-bit
integer on most systems) is an insult to its capability.
The default pseudo-random number generator (PRNG) in my copy of R is the
Mersenne Twister, a truly awesome algorithm. It's capable of very high
quality results, as long as you give it a good seed. It will take a
vector of *many* integers as a seed, not just one. It's not clear to me
from the R docs if you can pass an arbitrary array of integers with any
value, or if it needs something special.
Assuming you can give it any old passel of randomness as a seed, you
just have to find a good source of randomness to create that seed. On a
Linux box, you could concatenate several dozen bytes read from
/dev/random, the current wall clock time in microseconds, the inode of
the R script being run, the process ID of the R interpreter, and the
current mouse cursor position into a single string. Feed all that into
a hash algorithm, and break off pieces of that 4 bytes long, cast them
to integers, and send that array of ints to set.seed().
If you use SHA-256 as the hash algorithm, that scheme should give you
enough input randomness to get any of the possible 2^256 hash outputs,
making that the amount of possible problem sets. That's more than a
rainbow table buster...there aren't enough atoms in the visible universe
to construct a computer big enough to cope with 2^256 possible outputs.
That said, the quality of the PRNG just *allows* you to avoid screwing
up. It doesn't make it impossible make a weak algorithm.
More information about the R-help