The limits of Rnd

Adrian_Platts · August 11, 2016, 7:02pm

Sorry if my Math is going awry here, but I’m trying to use Rnd to model a relatively rare event (it occurs once in ~every 100 million instances), but I’m getting rather unexpected results.

The test scenario I have is below:

[quote]
dim d as double
dim x,y as Int64

y=0
for x =1 to 1e10
d=rnd()*1e9
if d<10 then y=y+1
next

print format(y,"#")[/quote]

What I’m expecting here is that d can range randomly from ~0 to ~1 billion. So d being <10 should occur approx once every hundred million iterations. So if I run the loop 10 billion times, I should see y count up to around 100. Perhaps a range of 50-200 might seem reasonable.

What I actually got the first time I ran this was a value for y of 6,159. So, approaching 100 times what I was expecting. This seems consistent with oddness in my actual simulation application.

Could anyone spot a trivial math error, or perhaps guide me towards a better solution? I suspect I’m just pushing Rnd beyond its design limits.

Many thanks!

Adrian

Robert_Weaver · August 11, 2016, 7:10pm

I found that when I used the Rnd function, the results seemed to become skewed after several million had been generated. I ended up implementing The Wichmann-Hill random number generator

[code]
//Global Properties
dim ix as int64
dim iy as int64
dim iz as int64

Function Rnd2() As Double
// A random number generator based on
// Wichmann and Hill, Algorithm AS 183:
// ‘An Efficient and Portable Pseudo-Random Number Generator’, Applied Statistics, 31, 188-190, 1982
// According to authors, more than 10^13 numbers will be generated before repetition
Dim temp As Double
ix=ix171 mod 30269
iy=iy172 mod 30307
iz=iz*170 mod 30323
temp = (0.0+ix)/30269.0 + (0.0+iy)/30307.0 + (0.0+iz)/30323.0
return temp-Floor(temp)
End Function[/code]

Will_Shank · August 11, 2016, 7:40pm

Nice.

fyi the 0.0+ is unnecessary.

Robert_Weaver · August 11, 2016, 7:49pm

[quote=281837:@Will Shank]
fyi the 0.0+ is unnecessary.[/quote]

Right. When I first implemented it, I was being overly cautious.

Paul_Lefebvre · August 11, 2016, 7:58pm

Do you get different results if you the Random class’ Number method instead?

Jeff_Tullin · August 11, 2016, 8:10pm

Any reason why the ix,iy,iz etc variables aren’t declared in the function as static?

Robert_Weaver · August 11, 2016, 8:23pm

I normally assign a seed value to ix, iy, and iz in either the window open event or the app open event. I wasn’t sure whether declaring them as static variables in the Rnd2 function would work.

Adrian_Platts · August 12, 2016, 12:56pm

Thanks Robert … great suggestion. I think I came across this once in the Intel algorithm book, but had completely forgotten about it.

As a test I tweaked the above code to use i. rnd(); ii. the Wichmann-Hill approach; and iii. the random class as Paul suggested. The results (below) suggest that the WH randomization hits into the expected range at almost exactly the expected frequency (100) albeit with a slight slow down, possibly due to the repeated type conversion between int64 and double? The random class sits between the extremes of WH and rnd.

$ time ./test 1
9236

real 7m19.800s
user 7m20.144s
sys 0m0.007s

$time ./test 2
122

real 14m47.826s
user 14m48.526s
sys 0m0.007s

time ./test 3
568

real 10m27.590s
user 10m28.086s
sys 0m0.004s

[quote]
dim d as double
dim e as new Random
dim x,y as Int64

ix=rnd()*1000000
iy=rnd()*1000000
iz=rnd()*1000000

select case val(args(1))

case 1
y=0
for x =1 to 1e10
d=rnd()*1e9
if d<10 then y=y+1
next

print  format(y,"#")

case 2

y=0
for x =1 to 1e10
  d=rnd2()*1e9
  if d<10 then y=y+1
next

print  format(y,"#")

case 3
y=0
for x =1 to 1e10
d=e.Number*1e9
if d<10 then y=y+1
next

print  format(y,"#")

end select[/quote]

Ramon_SASTRE · August 12, 2016, 5:43pm

I’ve been playing a moment with this code

[code]dim d as double
dim x,y as Int64

y=0
for x =1 to 1e10
d=rnd()*1e9
if d<10 then
y=y+1
end if
next[/code]
And it seems that the problem is not rnd() but precision. If you put a break at y=y+1 you will notice that d is always 0.
You will never get 4.5678 or 8.9944 for example.
if rnd() was 0.00000001234, then rnd()*1e9 should be 12.34 which is > 10.
However precision transforms 0.00000001234 into 0, so rnd()*1e9 = 0 which is < 0.
That’s why you get too many values < 10.

Adrian_Platts · August 12, 2016, 6:01pm

Hi Ramon. I’m not sure precision per se is the full story as rnd returns a double data type which can hold values between 10e-308 and 10e308? It seems that the algorithm used to generate the random numbers is perhaps just not very finely grained - when I look for values less than 1e-6 being returned by the random number generator, I see only two values 0.0000009536743 (2e-20) and 0.

For my purposes I think Robert’s solution works well.

Ramon_SASTRE · August 12, 2016, 7:11pm

Adrian,

You are right. There is an error in rnd() returning very small values.
Generally when I need random numbers I use to take profit of trigonometry.

abs(sin(x)) is always between 0 and 1 and returns (at least up to now) very random numbers. Probably “sin” can be more time consuming than other functions but the code is so simple…

[code]dim dd as double
dim x,y,d as Int64

y=0
for x =1 to 1e10
dd = abs(sin(x))
d=dd*1e9
if d<10 then y=y+1
next[/code]

In this case, after a very, very long for-next loop we get y=64
Its is not y=100, but as you said it seems reasonable.

Robert_Weaver · August 12, 2016, 9:44pm

It seems to me that the low distribution of small values was the problem that I experienced. I never really spent any time analyzing the problem, and just implemented the W-H algorithm.

The W-H algorithm was published in the 1980’s and was designed to work with 16 bit processors. I found that it still produces very uniformly distributed random numbers suitable for my work with Monte Carlo simulations. However, the authors published a new paper in 2005 upgrading the algorithm to meet current more stringent requirements. The new algorithm adds a 4th cycle, and to be complete, here it is:

Function Rnd3() As Double //A random number generator based on //Wichmann and Hill, Algorithm AS 183: //An Efficient and Portable Pseudo-Random Number Generator, Applied Statistics, 31, 188-190, 1982 //According to authors, more than 10^13 numbers will be generated before repetition // Revised to a 4 cycle generator per their 2005 paper: Generating good pseudo-random numbers // Generates 2^121 numbers before repeating Dim temp As Double ix = ix * 11600 mod 2147483579 iy = iy * 47003 mod 2147483543 iz = iz * 23000 mod 2147483423 iw = iw * 33000 mod 2147483123 temp = ix/2147483579.0 + iy/2147483543.0+iz/2147483423.0 + iw/2147483123.0 return temp-Floor(temp) End Function

One thing that I recommend when developing software that uses a pseudo random number generator, is to use the same seed values while debugging. That way, if you encounter problems your results should be repeatable and it’s then easier to locate bugs. Once you know that it’s working you can include some additional code to randomize the seed value for each run.

Robert_Weaver · August 13, 2016, 8:07am

A discussion of the random number generator came up earlier here:
https://forum.xojo.com/34087-how-random-is-random
In that discussion, Joe Ranieri mentioned that a comment in the random class [source] file states “the code was adapted from ‘Algorithms in C++’ (second edition) by Robert Sedgwick.”

I tried to follow this up, but was unable to locate a copy of Sedgwick’s book. It appears that it dates back to the early 1990’s or likely earlier (the 3rd edition is early 1990’s; the 2nd edition is obviously earlier but I couldn’t find a publication date for the 2nd ed.).

I decided that some further investigation was in order. So, I did some comparisons between the Xojo built-in RNG and the Wichmann-Hill algorithm (the earlier 3 cycle one). For these tests, A billion (1e9) random numbers were generated, and histograms plotted. Each bar on the histogram indicates the number of random numbers that fell into a particular interval. The interval for the leftmost bar is 0.0<=n<0.5e-5, and the second bar is for the interval 0.5e-5<=n<1.0e-5, the third bar for 1.0e-5<=n<1.5e-5 and so on. Consequently, there are 200,000 intervals altogether, but the histograms only show the first 300 intervals, which is more than enough to reveal the problems.
First, the Wichmann-Hill algorithm:

You’ll note that the bars are somewhat ragged as would be expected for randomly generated data.
.
.
Next, the Xojo built-in Rnd( ) function:

In this one there is clearly a cyclical pattern of spikes indicating that some intervals have a considerably higher frequency than the others.

If the intervals are decreased in size by a factor of 5, the problem becomes even more obvious.
Again starting with the Wichmann-Hill histogram:

It gets a bit more ragged due to the smaller intervals, but there’s still no apparent pattern.
.
.
Now look at the Xojo Rnd( ) function with the same interval size:

The spikes occur at regular intervals, and you’ll also notice that there are additionally regularly occurring dips which are spaced differently than the spikes. So, it’s definitely not a random distribution.
It appears that the Xojo Rnd( ) function needs to be fixed.

Note, I only tested the Rnd( ) function and not the Rnd( ) class, because I ran this as a Xojoscript for maximum speed and the random class couldn’t be accessed from a Xojoscript. I assumed that both random class and Rnd( ) use the same algorithm.

Eli_Ott · August 13, 2016, 9:04am

The Rnd() function and the Random class do not use the same algorithm as far as I know.
And the docs say that Random class supersedes the Rnd() function, so I’d use the Random class, which gives you better distribution.

Robert_Weaver · August 13, 2016, 10:50am

[quote=281991:@Eli Ott]The Rnd() function and the Random class do not use the same algorithm as far as I know.
[/quote]
If you can point to some documentation that supports this claim, I would very much appreciate it.

Adrian_Platts · August 13, 2016, 12:33pm

@Robert, based on the second set of tests I posted above, I think both the time to complete and the number of hits in the expected range perhaps support Eli’s assertion that the random class uses a different algorithm? However it is only appears to be somewhat more smoothly distributed in its output and W-H seems better.

Your test charts seem to show fairly dramatically how non-uniform the results of rnd() are - for my purposes this can generate pretty awful results. Kind of a shock and makes me wonder how many times in the past I may have placed too much faith in this function!

I think I found some of the code from the third edition of Sedgewick’s book (https://www.cs.princeton.edu/~rs/Algs3.cxx1-4/code.txt) and the rs directory above this page has links to further code snippets from other editions and similar books (not all are working including the 2nd edition). However I can’t see that he implements a custom random number generator, he seems to simply call the compiler’s rand() function in various contexts, and so this will be a compiler dependent result I guess … maybe in edition 2 he did use a custom function?

Robert_Weaver · August 13, 2016, 9:59pm

I was going to perform the same tests with the random class as I did from the Rnd() function, but ran out of time last night. I’ll do it today and post the results later. Then we’ll have a better idea whether they use the same algorithm or not.

I had a look through the code that you linked to, and I agree that there doesn’t appear to be any custom random function. Everything appears to call Rand() which I assume is the C++ built-in random function.

David_Andrews1 · August 13, 2016, 11:46pm

If you are using XOJO Rnd to generate a cryptographic nonce you are in trouble.

Perhaps it is an NSA backdoor that XOJO had to implement after a visit from Uncle Sam.

;)

Tim_Hare · August 14, 2016, 12:13am

For cryptographic purposes, one should call the AynRand() function. shrugs

Robert_Weaver · August 14, 2016, 12:20am

Actually AynRand() is considerably more biased.