One thing to take into consideration is that the characters iterator handles Unicode characters differently to Middle which will naturally make it slower.
I have added 3 more versions:
Test 1: for i = 0 to u ; source.middle(i,1)
Fields=2000
Took 0.781 seconds
Test 2: for each ch in source.characters
Fields=2000
Took 33.857 seconds
Test 3: for i = 0 to u ; ch = MiddleBytes(i,1)
Fields=2000
Took 0.019 seconds
Test 4: Pointer
Fields=2000
Took 0.017 seconds
Test 5: while IndexOf<>-1
Fields=2000
Took 0.183 seconds
Test 6: while IndexOfBytes<>-1
Fields=2000
Took 0.002 seconds
What really surprised me is that IndexOfBytes seems to be even faster than using a pointer.
But the much more interesting part of the story is how the timings change with using more characters (fields, which are 10 characters with a following space character). I have tested the different versions with 1000, 2000, 3000 and 4000 fields.
Test1(source.middle): 0.201 sec (1000), 0.778 sec (2000), 1.746 sec (3000), 3.080 sec (4000)
Test2(for each): 8.613 sec (1000), 33.760 sec (2000), 75.594 sec (3000), 134.049 sec (4000)
Test3(MiddleBytes): 0.016 sec (1000), 0.030 seco (2000), 0.027 sec (3000), 0.048 sec (4000)
Test4(Pointer): 0.006 sec (1000), 0.010 sec (2000), 0.015 seco (3000), 0.030 sec (4000)
Test5(IndexOf): 0.060 sec (1000), 0.182 sec (2000), 0.421 sec (3000), 0.724 sec (4000)
Test6(IndexOfBytes): 0.001 sec (1000), 0.002 sec (2000), 0.003 sec (3000), 0.004 sec (4000)
So, there is a huge difference if testing such functionality with a small test sample compared to a big one!
Ok, there are some gigantic OS differences here:
- on macOS Intel and M1 : āfor each ch in string.charactersā is about 50x faster than String.Middle
- on Windows (x64): āfor each ch in string.charactersā is 34x SLOWER than String.Middle
Reported as <https://xojo.com/issue/65723>
65723 - String.Characters Iterator is 1000x slower on Windows than macOS
I donāt have a Linux system to test with, but @Rainer_Hofmann please add your comments to the feedback case if you can. Thanks!
Ok, I will verify this at home on my m1 mac and wil comment on the feedback case afterwards.
Here are my results from the M1:
Test 1: for i = 0 to u ; source.middle(i,1)
Fields=4000
Took 1.236 seconds
Test 2: for each ch in source.characters
Fields=4000
Took 0.071 seconds
Test 3: for i = 0 to u ; ch = MiddleBytes(i,1)
Fields=4000
Took 0.031 seconds
Test 4: Pointer
Fields=4000
Took 0.020 seconds
Test 5: while IndexOf<>-1
Fields=4000
Took 0.188 seconds
Test 6: while IndexOfBytes<>-1
Fields=4000
Took 0.002 seconds
You are right, there is a behavior difference compared to the Intel Linux version.
I may make an updated blog post.
CountFields is faster for me than the loop with MiddleBytes, but slower than MemoryBlock stuff.
So Iāll change my CountOccurrences to use CountFields instead.
Christian, CountFields is also a very good and very fast function. But only if you only want to count the characters. The other functions give you the positions of these characters as well.
Maybe, you want look at the project file which I have attached to [https://xojo.com/issue/65723](https://xojo.com/issue/65723)
It depends on what the goal is.
For me itās just a bit of benchmarking.
Christian, I also have to say that in my example the pointer function is not as fast as the IndexOfBytes function, on both platforms, Linux and Mac M1. Maybe I did something wrong because I am new to Xojo, but I doubt it.
Well, that test shows exponential timing. Which means that the iterator is badly implemented.
Ideally, it should remember the last pointer it accessed and then continue from there on the next iteration. But instead, it always starts from the beginning of the string. That means that the implementation of the iterator remembers just the index, and then calls internally a function that needs to find the character by index - which takes more time the longer the string (and index) gets. Had the iterator remembered the pointer into the string it goes over, then the time for increasing the string size would remain linear.
Yep, badly implemented. Not that Iād have expected anything else.
Time for Xojo to tweak things in the framework, I guess.
Thomas, ok, you described the problem well but of course we donāt know the exact implementation. Such unfavorable behavior of fundamental functions of a programming language should of course be fixed with high priority because string functions in particular, which process large amounts of data in loops, often occur and can then lead to very disappointing performance. Fixing such problems would probably also give the Xojo IDE a noticeably better running behavior.
However, other programming languages āāand developer environments also have their problems. At least I donāt know of any complex ones that donāt have bugs and sometimes annoying runtime behavior.
As the Xojo Community, we can help to point these out and also work together to improve the language. Xojo offers us so many unbelievably positive qualities that we should strive to promote further development in our favor. I know itās a hassle sometimes, but this way we can contribute something positive.
I could originally have blamed Xojo for my method not running fast enough. But thanks to the support of experienced programmers, I first learned a lot about Xojo and, secondly, got the opportunity to find out a fundamental cause of poor performance of string manipulations and was able to point this out to the Xojo developers. So if the Xojo IDE and our own Xojo developments run a little faster in the future, we can all say that we have contributed to something very useful in Xojo.
Rainer, especially if you need speed you should always know hiow to handle your data. I agree with Thomas, that not all is implemented in the best way in the Xojo framework and thereās room for optimizations in many places. But, itās not impossible to create fast data processing wirh Xojo; we have memory blocks and pointer, very fast array processing etc. Again, you need to know your data and your tools. And Xojo has these.
Carsten, of course you are absolutely right. But if fundamental functions show unexpectedly exponential runtime behavior, then it can happen that, even as a good software developer, you test with samples that are too small and afterwards the problems arise for the customer, because perhaps only the functionality was tested, but with too little data. I think it is essential that a programming language should be efficient and fast at its core. Of course, you can always do the rest yourself. The overall efficiency of a programming language naturally also results from how quickly and easily can be developed. If you have to continually write workarounds for standard functionalities or components, I donāt take that for granted.
Iām not talking about workarounds but about functionality that Xojo already has. And hereās my point: As a āgood software developerā you have to know about your data and you have stress tests too. If you know that your software should work with ābig dataā, you have to test this too, of course. I see no reason to blame the tool you use for bad implementation of functionality thatās not intended for a solution you need.
I was talking about core functionality everybody is using of a programming language. Of couse, if you know all different kind of data your customers are using you will test against it. Often you donāt know exactly all these data informations and the amount of data might change with the time as well. Then you should be able to trust core functionalities of your programing language. For example, I expect that the sqrt function is working with smaller double type numbers comparable fast as with bigger double type numbers.
Often Xojo is like a car: one size fits everyone. If you want a race car you have to implement it mostly yourself.
In my app you can select a number of files. No big deal you would think. However, when a user selected a large number of files the beachball was visible. FileListMBS was quite a bit faster. But even there I got the beachball. Now there is a parameter to give other apps a bit of processing time.
Unfortunately, I have seen that adding integer values to arrays has that weired exponential timing issue as well:
Test: integer array add
10.000.000, took 1.361 seconds
20.000.000, took 3.363 seconds
30.000.000, took 9.975 seconds
40.000.000, took 22.796 seconds
ā¦and doing the same with double values is faster but still exponential:
Test 2: double array add
10.000.000, took 1.351 seconds
20.000.000, took 3.341 seconds
30.000.000, took 5.979 seconds
40.000.000, took 9.500 seconds
There is a lot of performance improvement potential in Xojo, I guess.
Resize the array at the beginning if you donāt like it.
And read here: