Migrating from RealBasic

Did you try profiling?

Thanks for all of the comments.

First of all, any speedups to the text related code will be insignificant, because there is only a single line of text appended after every 1000 iterations of the outermost loop. That happens about once every two minutes. The rest of the time there is no text related code executing. I set up the text output this way deliberately to make sure that it wouldn’t have any significant effect on speed.

However, Will Shank’s comments about precomputing cos(phi) will likely help. I thought I’d precomputed everything that would save time, but I see that I did miss a few things.
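As a rough sketch of what precomputing cos(phi) can look like, assuming phi stays fixed while an inner loop runs: the loop count, the value of phi, and the expression being summed are all made up for illustration and are not the original integrand.

```xojo
' Hypothetical example: phi does not change inside the loop,
' so Cos(phi) only needs to be evaluated once.
Dim phi As Double = 0.3          ' assumed value, for illustration only
Dim steps As Integer = 1000000   ' assumed loop count
Dim total As Double = 0.0

' Slow form (shown as a comment): Cos(phi) is recomputed on every pass.
' For i As Integer = 1 To steps
'   total = total + Cos(phi) / i
' Next

' Faster form: compute the loop-invariant value once, outside the loop.
Dim cosPhi As Double = Cos(phi)
For i As Integer = 1 To steps
  total = total + cosPhi / i
Next
```

The same idea applies to any subexpression that depends only on variables of an outer loop: move it up to the outermost level where its inputs are still constant.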

I should probably also put the code from the function HlxIntgrndRad() inline in the calling routine. I imagine there’s quite a bit of overhead in the function call.

I’ll also experiment with the pragma settings. I had thought about the overhead involved with bounds checking and such things, but at the time I wasn’t sure how to turn it off. Then I forgot about it later.
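For reference, a minimal sketch of the pragmas usually tried for tight numeric loops in Xojo. As far as I know they only affect the method they are placed in, so they would go at the top of the method that holds the hot loops; whether they help this particular code is something to measure, not assume.

```xojo
' Place at the top of the method that contains the hot loops.

' Do not yield time to other threads or the event loop inside loops.
#Pragma BackgroundTasks False

' Skip array index range checks.
#Pragma BoundsChecking False

' Skip Nil checks on object references.
#Pragma NilObjectChecking False

' Skip stack overflow checks on method calls.
#Pragma StackOverflowChecking False
```

With bounds checking turned off, an out-of-range array index is no longer caught as an exception and can crash the app, so it is safer to add these only after the results have been verified with the checks enabled.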

If you want to test it out, try these inputs:
n=400
pitch=0.001
r=0.15
dw=0.0005
MaxErr=0.0000001
Nmonte=1000000

This should produce a result of about 0.0265535…

It takes 20 hours because Nmonte=1000000. You may want to begin with Nmonte=1000. As Nmonte is made larger, the result of the integration gets more accurate, but of course the run gets much slower.
As for my computer speed, I’m using a mid-2015 MacBook Pro, 2.2 GHz Intel Core i7.
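On the accuracy versus speed trade-off with Nmonte: assuming Nmonte is the number of random samples in a plain Monte Carlo estimate (an assumption, since the full code is not shown here), the statistical error typically falls with the square root of the sample count while the run time grows linearly with it:

```latex
\sigma_{\text{estimate}} \propto \frac{1}{\sqrt{N_{\text{monte}}}},
\qquad
t_{\text{run}} \propto N_{\text{monte}}
```

Under that assumption, going from Nmonte=1000 to Nmonte=1000000 costs about 1000x the run time for roughly a sqrt(1000) ≈ 32x reduction in random error.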

I didn’t include the code for my random number generator rnd2, but you can substitute the built-in random number generator that returns a real value between 0 and 1. It will be good enough for testing.

The time to append text grows the longer the text is. Kem pointed out there isn’t much difference between + and Append, and I get similar results adding “a” a million times. But change that to add “the quick brown fox jumped” and + takes 110x longer while Append stays the same.

In my testing, having it inline didn’t make a difference, or was maybe even slightly worse. Not sure why. Way back I’d measured the time to call a method and remember it being equivalent to 2 double multiplies. That’s not much overhead unless it’s called a lot (which HlxIntgrndRad is, so I’m sort of confused).

What I did find to make a big difference is XojoScript. Before you posted those input values I guessed at some, and the method in Xojo proper took 4.6 secs. Running it in XojoScript took 2.3 secs. Then some further optimizations in the math (precomputing as much as possible, Cos(x) = Cos(-x), replacing x^2 with x*x) brought it down to 0.53 secs.
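To illustrate the x^2 to x*x replacement mentioned above, here is a small sketch. The values and the surrounding expression are hypothetical; only the rewrite itself is the point.

```xojo
' Hypothetical values, for illustration only.
Dim x As Double = 1.2345
Dim a As Double = 0.5

' Original style: ^ is a general power operation.
Dim viaPower As Double = 1.0 / Sqrt(a + x ^ 2)

' Rewritten: a single multiply produces the same square.
Dim viaMultiply As Double = 1.0 / Sqrt(a + x * x)
```

Both forms compute the same quantity (possibly differing in the last bit or so of the Double); the second avoids a general-purpose power call inside the innermost loop.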

Using your example input takes 102secs with the original code but 9.4secs in XojoScript. I can clean it up and post later on.

To (partly) answer your original question, I just did a (very) quick test without any optimization in the main thread.

With these parameters:

    Dim n As Int64 = 400
    Dim pitch As Double = 0.001
    Dim r As Double = 0.15
    Dim dw As Double = 0.0005
    Dim MaxErr As Double = 0.0000001
    Dim Nmonte As Int64 = 1000

Running in the IDE: 96 seconds.
Compiled 32 bit App: 81 seconds
Compiled 64 bit App: 29 seconds

It’s really late here and I only ran 1 test each.

Here’s an app with optimizations.
http://trochoid.weebly.com/uploads/6/2/6/4/62644133/int2.zip
I don’t have 64bit compiles to test, maybe someone can try that out to see if XojoScript is still faster. From what others are reporting it might not be.

In the app there’s a field to enter Nmonte and 4 buttons that each run a different version, reporting the calculated value and time. Progress text has been removed from all of them and they run in the main thread so the app locks up while calculating.

buttons…

Original - Forum source with a few tweaks: pragmas, Cos(phi)

SomeOptimize - inlined HlxIntgrndRad and some factoring

XojoScript - runs in XojoScript and much more factoring of the innermost Msnow for loop

FullOptimize - the same XojoScript source but running in Xojo proper, to see how 64bit compares

An odd thing I found was one place where the order of the sum mattered. The line that sums 4 HlxIntgrndRad was inlined and factored some, bringing the time to 9.5sec. Then I factored it even more and the time went to 9.6sec!? It looked like this…

    pd = y-pa
    Sum2 =        pre1/sqrt(pre3+pd*pd)
    pd = y-pb
    Sum2 = Sum2 + pre2/sqrt(pre4+pd*pd)
    pd = y+pa
    Sum2 = Sum2 + pre1/sqrt(pre3+pd*pd)
    pd = y+pb
    Sum2 = Sum2 + pre2/sqrt(pre4+pd*pd)

Notice it’s pre1/pre3 then pre2/pre4 then pre1/pre3 again. I reordered the lines to group pres as below and it runs in 9.3sec!? It’s like the extra factoring doesn’t work unless the order is right. I don’t get it.

    pd = y-pa
    Sum2 =        pre1/sqrt(pre3+pd*pd)
    pd = y+pa
    Sum2 = Sum2 + pre1/sqrt(pre3+pd*pd)
    pd = y-pb
    Sum2 = Sum2 + pre2/sqrt(pre4+pd*pd)
    pd = y+pb
    Sum2 = Sum2 + pre2/sqrt(pre4+pd*pd)
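Since the timings above differ by only a few percent, one way to check whether the ordering effect is real rather than run-to-run noise is to time the block in isolation. The sketch below is an assumption-laden stand-in: the values of pre1 to pre4, pa, pb, the way y is generated, and the repetition count are all invented, and only the four-term sum is taken from the post.

```xojo
' Hypothetical stand-in values; the real pre1..pre4, pa, pb and y
' come from the surrounding loops in the posted code.
Dim pre1 As Double = 1.0
Dim pre2 As Double = 2.0
Dim pre3 As Double = 0.5
Dim pre4 As Double = 0.25
Dim pa As Double = 0.1
Dim pb As Double = 0.2
Dim reps As Integer = 50000000   ' assumed repetition count
Dim pd, Sum2, total, y As Double

Dim t0 As Double = Microseconds
For i As Integer = 1 To reps
  y = i * 0.000001
  pd = y - pa
  Sum2 = pre1 / Sqrt(pre3 + pd * pd)
  pd = y + pa
  Sum2 = Sum2 + pre1 / Sqrt(pre3 + pd * pd)
  pd = y - pb
  Sum2 = Sum2 + pre2 / Sqrt(pre4 + pd * pd)
  pd = y + pb
  Sum2 = Sum2 + pre2 / Sqrt(pre4 + pd * pd)
  ' Accumulate so the result is actually used.
  total = total + Sum2
Next
Dim seconds As Double = (Microseconds - t0) / 1000000.0
System.DebugLog("grouped order: " + Str(seconds) + " s, total = " + Str(total))
```

To compare, swap the middle two pairs of lines to get the interleaved order, then run each variant several times, alternating between them, and look at the spread before drawing conclusions.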

It’d be nice to get rid of some sqrts but I haven’t spotted any yet.

My times running built 32bit Nmonte 1000

Original     - 112 secs
SomeOptimize - 102 secs
XojoScript   - 9.3 secs
FullOptimize - 57 secs

That’s some nice work Will. I’ll have to look later.

Nice Will!
Here are my results.
For some reason, SomeOptimize seems to run slower than Original.
Ran 2 times each. Pretty much the same results:
IDE:

Original ============================
Nmonte: 1000
output value: 0.0265527
total time: 73.71240348 sec

SomeOptimize ============================
Nmonte: 1000
output value: 0.0265545
total time: 94.85303168 sec

XojoScript ============================
Nmonte: 1000
output value: 0.0265531
total time: 7.94062844 sec

FullOptimize ============================
Nmonte: 1000
output value: 0.0265532
total time: 66.81488397 sec

32bit Compiled:

Original ============================
Nmonte: 1000
output value: 0.0265529
total time: 60.63025901 sec

SomeOptimize ============================
Nmonte: 1000
output value: 0.0265532
total time: 83.7917771 sec

XojoScript ============================
Nmonte: 1000
output value: 0.0265526
total time: 7.8004196 sec

FullOptimize ============================
Nmonte: 1000
output value: 0.0265558
total time: 50.21863616 sec

No luck with compiling 64bit.

I knew Xojoscript was fast but I never imagined it was that much faster. Very impressive.

Yes but…
We’ve known XojoScript was faster for some time, but I don’t understand why, or more exactly why the compiled version is so much slower.
It’s like knowing that a Formula 1 car is available but only being able to buy a tractor.
Is it because there is no UI overhead? (And if so, why can’t console apps run at the same speed?)

XojoScript is not available in 64-bit. You could recompile the test if you comment out all references to XojoScript. It will be interesting to see how Xojo 64 handles this, since the LLVM compiler is used both for XojoScript and for 64-bit.
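As an alternative to commenting code out by hand, conditional compilation can keep the XojoScript path in 32-bit builds only. This is a sketch under assumptions: it uses the Target32Bit constant, the body of the 32-bit branch is left as a placeholder, and it has not been checked against the posted project (a XojoScript instance placed on a window, for example, may still need to be removed separately).

```xojo
#If Target32Bit Then
  ' 32-bit build: the code that sets up and runs the XojoScript
  ' version would go here (placeholder, not shown).
#Else
  ' 64-bit build: skip the XojoScript version entirely.
  MsgBox("The XojoScript version is not available in this build.")
#EndIf
```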

XojoScript is faster because it uses the LLVM compiler instead of the Xojo built-in compiler. It was the test-bed for LLVM. Since 64-bit also uses LLVM, you should see similar speed gains (depending on what optimizations are being used).

For comparison’s sake, I ran a 64-bit version (I had to remove the XojoScript stuff first). Results:

[code]64-bit
FullOptimize ============================
Nmonte: 1000
output value: 0.0265535
total time: 17.34514127 sec

Original ============================
Nmonte: 1000
output value: 0.026551
total time: 20.92177342 sec

32-bit
FullOptimize ============================
Nmonte: 1000
output value: 0.0265509
total time: 54.02202895 sec

Original ============================
Nmonte: 1000
output value: 0.0265527
total time: 66.24706289 sec

XojoScript ============================
Nmonte: 1000
output value: 0.0265546
total time: 9.37376832 sec[/code]

Full scores. Xojoscript taken out before compiling 64 bit.

IDE:
Original ============================
Nmonte: 1000
output value: 0.0265527
total time: 73.71240348 sec

SomeOptimize ============================
Nmonte: 1000
output value: 0.0265545
total time: 94.85303168 sec

XojoScript ============================
Nmonte: 1000
output value: 0.0265531
total time: 7.94062844 sec

FullOptimize ============================
Nmonte: 1000
output value: 0.0265532
total time: 66.81488397 sec

32 Bit compiled:
Original ============================
Nmonte: 1000
output value: 0.0265529
total time: 60.63025901 sec

SomeOptimize ============================
Nmonte: 1000
output value: 0.0265532
total time: 83.7917771 sec

XojoScript ============================
Nmonte: 1000
output value: 0.0265526
total time: 7.8004196 sec

FullOptimize ============================
Nmonte: 1000
output value: 0.0265558
total time: 50.21863616 sec

64 bit compiled:
Original ============================
Nmonte: 1000
output value: 0.0265542
total time: 18.64457201 sec

SomeOptimize ============================
Nmonte: 1000
output value: 0.0265539
total time: 26.75712824 sec

XojoScript ============================
N/A

FullOptimize ============================
Nmonte: 1000
output value: 0.0265516
total time: 15.36333577 sec

XojoScript is still about twice as fast as the compiled 64 bit build, though.

Again, thanks everyone for spending the time to look at this.

As I was reviewing the program and working at moving the HlxIntgrndRad() function code inline into the calling routine, I noticed something that I’d previously missed. By applying the trigonometric identity:
cos(x)=cos(-x)
the code can be simplified considerably. I’m still testing things, but it looks as if this should give a good improvement in performance.
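A tiny illustration of how that identity saves work, using made-up terms (the actual expressions in HlxIntgrndRad() are not reproduced here): wherever the same angle appears with both signs, one cosine call can serve both terms.

```xojo
' Hypothetical example; t and the two terms are illustrative only.
Dim t As Double = 0.7

' Before: two cosine evaluations.
Dim twoCalls As Double = Cos(t) + Cos(-t)

' After: cosine is an even function, so one evaluation is reused.
Dim c As Double = Cos(t)
Dim oneCall As Double = c + c
```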

Also, it’s good to see the performance improvement with the 64 bit builds. It looks like it will definitely be worth the effort to move this to Xojo.

I’m not familiar with Xojoscript. I guess I’ll have to check it out.

I doubt XojoScript would be faster. The reason it used to be faster is because XojoScript used the optimizations that are now available to compiled 64-bit Xojo binaries.

Can’t say for sure without testing. But my guess is 64-bit compiled will be as fast or faster.

Argh! I typed my last reply before seeing Paul and Marco’s replies. (Always start reading from the top.)

I’m shocked XojoScript is still faster.

There’s a trade-off between optimization and compile time. You don’t see it as much with a XojoScript, as they tend to be smaller. I think they had to back off on the optimizations for building the app because it took forever.

It would be great to have some Build Settings then.
As long as debug runs compile/build fast (not optimized), I don’t mind waiting 10x longer for my deployment builds to finish because they’re being highly optimized.

As in “we had compiles that ran for days” kind of forever

But as long as the developer is warned in advance, it may still be worthwhile in some cases. Of course, since I’m someone who regularly does 20 hour runs of an app, I may not be a typical user. :)

The problem is there is no way to say ahead of time “This compile is going to take days.”
It might, or it might not.