Has anybody benchmarked XOJO against other Basic Compilers ?

Eduardo_Gutierrez_de_Oliveira · May 6, 2014, 10:39pm

[quote=86192:@Kem Tekinay]Them’s fightin’ words!

Seriously, I’m not sure what that means. Can you rephrase or give me an example?[/quote]

I think it’s a natural language parser written entirely by parsing strings. As such, I think “expressive” means “humanly readable” and is not really related to “regular expressions”, which for all the power they have, are pretty strict and cryptic (the opposite of “expressive”, when considering readability).

I don’t think it was disparaging of RegEx, but rather pointing out how different the expression parser is from RegEx (which couldn’t be further from “natural”).

I’ve been thinking for a couple of months about doing a natural language parser for building complex searches (in HTML and PHP, though; not Xojo). Reading this is making me reconsider.

Daniel_Taylor · May 6, 2014, 11:40pm

[quote=86218:@Michel Bujardet]in the IDE or built execution takes 46 seconds, 44 when the ProgressBar is out.

Of course this test does not mean much, but it does show that instr() takes time[/quote]

It’s your string generator that’s taking most of the time. In the IDE, with no progress bar, I got 47 seconds for the whole function, 37 seconds of which were spent in the inner loop that builds a random string. I’m not sure how fast InRange is, but every buff = buff + chr(...) has to create a new string object in memory and copy the bytes from the previous one. Chr(n) also takes time.

I believe a much faster option would be to allocate a MemoryBlock outside the loops, write your random ASCII integers to the block, then get the string from the block. Simply reuse the block, don’t recreate it each time.

Then again, if all you’re trying to do is measure InStr, create one string and search it over and over. InStr shouldn’t vary that much based on the string, especially if you use the InStrB version, which should also be faster. It doesn’t look like you’re worried about >8 bit characters, so InStrB is fine.
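For illustration, here is a rough sketch of the buffer-reuse idea, written in Python rather than Xojo so it can be run anywhere. The names `fill_random_ascii` and `build_records` are made up for this example, and the printable-ASCII range (32 to 126) is an assumption, not the range used in the test above. A `bytearray` stands in for the MemoryBlock: it is allocated once, overwritten for every record, and only copied out when a record is finished.

```python
import random

RECORD_LENGTH = 1000  # characters per record, as in the test above


def fill_random_ascii(buffer, rng):
    """Overwrite the reusable buffer in place and return an immutable copy."""
    for index in range(len(buffer)):
        buffer[index] = rng.randint(32, 126)  # assumption: printable ASCII only
    return bytes(buffer)


def build_records(count, seed=0):
    """Build `count` random records while allocating the work buffer only once."""
    rng = random.Random(seed)
    buffer = bytearray(RECORD_LENGTH)  # allocated once, outside the loops
    return [fill_random_ascii(buffer, rng) for _ in range(count)]


records = build_records(5)
# Byte-wise search, similar in spirit to InStrB; find() returns -1 when absent.
hits = [record.find(b"Onceuponatime") for record in records]
```

The point is only that no intermediate string is created per appended character; how much this helps in Xojo would have to be measured there.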

Daniel_Taylor · May 6, 2014, 11:51pm

[quote=86188:@Gary Miller]I have an Intel i7 8GB with a 256GB SSD.

The code reads a main.nlp that has includes that contain 46,500 complex patterns (and growing) at 10 lines each on average.

They are parsed into an in-memory array of structures and a QuikSort is used to sort the patterns into search priority order.

That takes 26 seconds.[/quote]

Earlier you made it sound like the whole process was taking 26s. Once you’ve loaded/parsed main.nlp does the actual search execute quickly, or is that something which also needs to be optimized?

Mike_Cotrone · May 7, 2014, 12:08am

Here are my in depth test results:

10 Hours compile time

20s compile time

Michel_Bujardet · May 7, 2014, 12:44am

You are right.

I have separated the generation from the string search. Now I find 9.4 - 12.98 seconds.

  Sub Action()
    #pragma DisableBackgroundTasks
    dim things(46500) as string
    dim that as integer
    dim buff as string
    dim ii as integer
    ProgressBar1.Maximum = 46500
    ProgressBar1.Value = 0
    // Build strings
    for recordnumber as integer = 1 to 46500
      Progressbar1.Value = recordnumber
      buff = ""
      for stringlength as integer = 1 to 1000
        ii = App.Randomizer.InRange(32, 223)
        buff = buff+chr(32+ii)
      next stringlength
      things(recordnumber) = buff
    next recordnumber
    dim starter as double = microseconds/1000000
    ProgressBar1.Value = 0
    // Start search
    for recordnumber as integer = 1 to 46500
      Progressbar1.Value = recordnumber
      buff = things(recordnumber)
      that = instr(buff,"Onceuponatime")
      that = instr(buff,"Aprincesswhas")
      that = instr(buff,"asleepinacasttle")
      that = instr(buff,"shehadbeencursed")
      that = instr(buff,"byabadapple")
    next recordnumber
    TextField1.Text = str((Microseconds/1000000)-starter)
  End Sub

Michel_Bujardet · May 7, 2014, 12:48am

[quote=86257:@Mike Cotrone]Here are my in depth test results:[/quote]

Yeah! Turbo Basic forever!

Gary_Miller · May 7, 2014, 1:15pm

Daniel Taylor asked…

Earlier you made it sound like the whole process was taking 26s. Once you’ve loaded/parsed main.nlp does the actual search execute quickly, or is that something which also needs to be optimized?

Loading main.nlp and all of its include files, the QuikSort, and the loading of a large dictionary for user-input spelling correction take about 26 seconds.

The time it takes to match a pattern varies with the position of the pattern that eventually matches within the prioritized (sorted) list of pattern structures.

Obviously, if the input matches the first pattern in the array, it answers instantly. If it isn’t matched until the last pattern in the array, or goes the whole way through the array without finding a match, then it can take 20 seconds.

If the user types in “What time is it now I have to be going to work.”

The first sentence in the input gets resolved in one pass through the pattern matcher, but then a second pass is required to consume the remaining second sentence. So that input could easily take 2 x 20 seconds, because two patterns need to be matched to resolve a single input. Responding to a longer text stream such as a paragraph would take proportionally longer again.
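To make the cost model concrete, here is a minimal Python sketch of a priority-ordered, first-match scan that is re-run until the input is consumed. It is only an illustration: plain lowercase prefixes stand in for the real patterns, and `first_match` and `resolve` are hypothetical names, not part of the actual matcher.

```python
def first_match(patterns, text):
    """Scan patterns in priority order; return (index, remainder) for the first prefix match."""
    lowered = text.lower()
    for index, pattern in enumerate(patterns):
        if lowered.startswith(pattern):
            # Drop the matched part plus any separating spaces or periods.
            return index, text[len(pattern):].lstrip(" .")
    return None


def resolve(patterns, text):
    """Run one full scan per sentence until the input is consumed or nothing matches."""
    matched = []
    while text:
        result = first_match(patterns, text)
        if result is None:
            break
        index, text = result
        matched.append(index)
    return matched


# Hypothetical stand-ins for two entries of the sorted pattern list.
patterns = ["what time is it now", "i have to be going to work"]
passes = resolve(patterns, "What time is it now I have to be going to work.")
```

Each entry in `passes` is one scan of the list, so the two-sentence input costs two scans, and a scan costs more the further down the list the matching pattern sits.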

Kem Tekinay referred to my comment on regular expressions…

Didn’t mean to start a flame war about regular expressions, Kem. I do Unix too, so I have done my share of Awk, Sed and grep, but when I tried to map all the English ways to express a single thought into a single regular expression pattern, it quickly became very difficult for me to debug due to all the nesting and subexpressions.

My pattern matcher also allows me to pull variables from the user input that match a specific pattern variable in the pattern.

Pattern variables can also be nested to any depth, like [First_Name]={[Male_First_Name]|[Female_First_Name]}.

For example… << pattern and >> template (only executes if pattern is matched)

<< I {{conversed|had a conversation|spoke} with|talked to} [Male_First_Name] and [Female_First_Name] ( in [City])( today)(.)

>> What did they have to say.
[They]="[Last([Male_First_Name])] and [Last([Female_First_Name])]"
[He]="[Last([Male_First_Name])]"
[She]="[Last([Female_First_Name])]"
[CurrentTopic]="your conversation with [He] and [She]"

So the pattern variables [Male_First_Name] and [Female_First_Name] can expand into a couple of thousand first names.

[City] can expand into a couple of thousand cities, etc…

And the pattern variables [He], [She], [They] and [CurrentTopic] can be set so that in subsequent patterns, if the user types

“He told me he was getting a divorce.” we now know who he is and can refer to him directly until another male first name is used and ambiguity may be introduced.

or

“What are we talking about” can return [CurrentTopic] and say “We were talking about Dave and Sally.”

This pattern was still simplified a lot from my actual patterns where [They] can refer to any combination of multiple he and shes with optional last names in addition to first names and optional titles such as {Dr|Miss|Mrs|Mr}(.) so I think you can see how all this would make a regular expression difficult to manage.
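As a rough illustration of how much a single pattern line covers, the Python sketch below expands the `{a|b}` alternation and `(optional)` notation into every literal sentence it stands for. This is not how the actual matcher works (that code is not shown here); `expand` and its helpers are hypothetical, and pattern variables such as [City] are simply left as literal text.

```python
def expand(pattern):
    """Return every literal string a pattern with {a|b} and (optional) groups can stand for."""
    results, position = _sequence(pattern, 0)
    if position != len(pattern):
        raise ValueError("unbalanced pattern: " + pattern)
    return results


def _sequence(pattern, position):
    """Parse until '|', '}', ')' or the end; return (expansions, next position)."""
    results = [""]
    while position < len(pattern) and pattern[position] not in "|})":
        char = pattern[position]
        if char == "{":
            choices, position = _alternatives(pattern, position + 1)
        elif char == "(":
            inner, position = _sequence(pattern, position + 1)
            if position >= len(pattern) or pattern[position] != ")":
                raise ValueError("missing closing parenthesis")
            # An optional group matches either nothing or its contents.
            choices, position = [""] + inner, position + 1
        else:
            choices, position = [char], position + 1
        results = [prefix + choice for prefix in results for choice in choices]
    return results, position


def _alternatives(pattern, position):
    """Parse 'a|b|c}' after an opening brace; return (all branches, position after '}')."""
    choices = []
    while True:
        branch, position = _sequence(pattern, position)
        choices.extend(branch)
        if position < len(pattern) and pattern[position] == "}":
            return choices, position + 1
        if position >= len(pattern) or pattern[position] != "|":
            raise ValueError("missing closing brace")
        position += 1  # skip the '|' and parse the next branch


variants = expand("I {{conversed|spoke} with|talked to} Dave( today)")
```

Here `variants` holds six sentences. Because the counts multiply, a pattern whose variables each expand to a couple of thousand names would be impractical to enumerate this way, which is presumably why a matcher would test the input against the pattern structure instead.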

Kem_Tekinay · May 7, 2014, 2:12pm

Not at all, I truly didn’t understand what you meant. Thanks for the clarification.

Markus_Winter · May 7, 2014, 3:23pm

Are you guys sure the XojoScript is even running?

With XojoScript I find that

  for I as integer = 1 to 1000000000
  next
doesn’t run at all.

I need to change it to

  dim i as integer
  for I = 1 to 1000000000
  next
and then it is actually slower than the code.

Markus_Winter · May 7, 2014, 3:30pm

P.S. You can also use the XojoScript example called “XojoScript” in the Examples -> Advanced -> XojoScript folder.

When you type in the “wrong code”

  for I as integer = 1 to 1000000000
  next
you get

[1:4-1:5] Undefined identifier.
[1:-1-1:-1] This local variable is unused.

Kem_Tekinay · May 7, 2014, 4:26pm

You’re right about the declaration, but add the pragmas and it returns, literally, zero microseconds. Even without the pragmas, it’s faster than from within a Xojo method.

Note that using an IDE Script is not the same thing.

My code:

  #pragma BackgroundTasks False
  #pragma BoundsChecking False
  #pragma NilObjectChecking False
  
  dim startms as Double = Microseconds
  
  dim i as Integer
  for i = 1 to 1000000000
  next i
  
  dim totalms as Double = Microseconds - startms
  print Format( totalms, "#,0" )

The code that calls it:

  dim s as new MyScript
  if s.Precompile( XojoScript.OptimizationLevels.High ) then
    s.Run
  end if

Markus_Winter · May 7, 2014, 4:43pm

And that’s where you exit with an error

You never run the loop.

Markus_Winter · May 7, 2014, 4:44pm

P.S. The pragmas speed it up by a factor of about 10. Just remember that you need to have the pragmas in the XojoScript code, it is not enough to have them in the calling code.

Kem_Tekinay · May 7, 2014, 4:46pm

No, that syntax is legal in XojoScript, and I’m trapping for errors anyway. If it were exiting, I’d get nothing at all, and if I take out the pragmas, I get about 13 seconds.

But just to satisfy the argument, I changed my script code:

  #pragma BackgroundTasks False
  #pragma BoundsChecking False
  #pragma NilObjectChecking False
  
  dim startms, totalms as Double
  startms = Microseconds
  
  dim i as Integer
  for i = 1 to 1000000000
  next i
  
  totalms = Microseconds - startms
  print Format( totalms, "#,0" )

Happy to share my project if you’d like.

Markus_Winter · May 7, 2014, 4:47pm

Sorry, my bad. It doesn’t stop with an error but it is still taking about 3 seconds (MBP, 2.53 GHz i5, 8 GB)

Without the pragmas it is taking 34 seconds, code takes 28 seconds.

Kem_Tekinay · May 7, 2014, 4:51pm

I wonder what the difference is? I am getting, literally, zero. If I take out the pragmas, I get a little under 14 seconds.

Correction: I changed the format to use a decimal and now get a result. With the pragmas in place, it takes 0.058 microseconds. (That’s not a typo.)

Kem_Tekinay · May 7, 2014, 4:55pm

I just tested without calling Precompile, and the results are essentially the same as running it within the Xojo method. Could that be the difference?

Markus_Winter · May 7, 2014, 5:08pm

Would you mind sending me your project file? I’d like to have a tinker after dinner.

Kem_Tekinay · May 7, 2014, 5:14pm

Here you go:

https://dl.dropboxusercontent.com/u/26920684/XojoScript%20Test.xojo_binary_project

Markus_Winter · May 7, 2014, 10:05pm

Thanks, Kem. I ran a few tests on my MacBook Pro (Core i5 2.53 GHz, 8 GB RAM, SSD), both with a script and code:

Script:

  #pragma BackgroundTasks False
  #pragma BoundsChecking False
  #pragma NilObjectChecking False
  
  dim startms, totalms as Double
  startms = Microseconds
  
  dim i as integer
  for i = 0 to 1000000000
  next i
  
  totalms = Microseconds - startms
  print Str( totalms )

Code:

  #pragma BackgroundTasks False
  #pragma BoundsChecking False
  #pragma NilObjectChecking False
  
  dim startms, totalms as Double
  startms = Microseconds
  
  dim i as integer
  for i = 0 to 1000000000
  next i
  
  totalms = Microseconds - startms
  MsgBox Str( totalms )

IDE (times in microseconds):
Script: 0.0500488
Code: 32,759,507

build:
Script: 0.0570068
Code: 3,015,140

Makes XojoScript seem fast, BUT:

Doing some actual work in the loop (using fewer iterations):

Script:

  #pragma BackgroundTasks False
  #pragma BoundsChecking False
  #pragma NilObjectChecking False
  
  dim startms, totalms as Double
  startms = Microseconds
  
  dim s as string
  s = "K1K2K3K4K5K6K7K8K9K10"
  
  dim i as integer
  for i = 0 to 1000000
    s = ReplaceAll( s, "K", "K " )
    s = ReplaceAll( s, "K ", "K" )
  next i
  
  totalms = Microseconds - startms
  print Str( totalms )

Code:

  #pragma BackgroundTasks False
  #pragma BoundsChecking False
  #pragma NilObjectChecking False
  
  dim startms, totalms as Double
  startms = Microseconds
  
  dim s as string
  s = "K1K2K3K4K5K6K7K8K9K10"
  
  dim i as integer
  for i = 0 to 1000000
    s = ReplaceAll( s, "K", "K " )
    s = ReplaceAll( s, "K ", "K" )
  next i
  
  totalms = Microseconds - startms
  MsgBox Str( totalms )

I also tried it with ReplaceAllB, which I expected to be faster, but surprisingly ReplaceAllB is slower:

IDE:
Script: 6,166,837.796
Code: 6,502,094.354
ScriptB: 10,207,051.002
CodeB: 10,899,761.705

build:
Script: 6,292,978.685
Code: 6,351,852.240
ScriptB: 10,585,430.674
CodeB: 10,614,094.305

No difference between Script and Code.

It seems the reason the empty-loop script is so fast might be that the optimization deletes the empty loop completely.