So I’m one of the lucky ones who’ve managed to get an M1 MacBook Pro. As most of the reviews say, it’s great. I recompiled a few of my apps in 2020R2 for Apple Silicon and see up to a 2x performance increase compared to my 2017 MacBook Pro.
My joy was short-lived though, as I think I’ve found an issue in the Xojo framework that makes performance worse on ARM than on Intel.
Put this code in the Open event of a new app:
Const iterations = 20000
Var a(iterations - 1) As String
Var start As Double = System.Microseconds
For i As Integer = 0 To iterations - 1
  a.Insert(i, "a")
Next i
Var total As Integer = (System.Microseconds - start) / 1000
MessageBox("Took " + total.ToString + " ms")
If I run (or build, it doesn’t matter) this app on my M1 Mac as x86 it takes 90 ms. If I run or build it for ARM on macOS it takes 450 ms. If I build it as a universal binary it takes 90 ms (the same as the Intel build).
This doesn’t seem right. Why would this super simple app run slower on an M1 Mac when compiled for ARM than when running through Rosetta as x86?
Memory interactions are up to 5x faster on ARM (so I’ve read), so this should be faster too.
If you pre-allocate the array to the size of iterations and simply set the value at each index, this would be a lot faster on both Intel and ARM. However, that might defeat the purpose of this test.
Const iterations = 20000
Var a(iterations - 1) As String
Var iMax As Integer = iterations - 1
Var start As Double = System.Microseconds
For i As Integer = 0 To iMax
  a(i) = "a"
Next i
Var total As Integer = (System.Microseconds - start) / 1000
MessageBox("Took " + total.ToString + " ms")
OK @Martin_T, that reverses things. I upped the iterations by a factor of 10 (because it’s so fast) but this code:
Const iterations = 200000
Var a(iterations - 1) As String
Var iMax As Integer = iterations - 1
Var start As Double = System.Microseconds
For i As Integer = 0 To iMax
  a(i) = "a"
Next i
Var total As Integer = (System.Microseconds - start) / 1000
MessageBox("Took " + total.ToString + " ms")
By the way: you are aware that by using Array.Insert on an array declared with 200000 entries you grow it to 400000 entries, right? Is this what you want?
I got these results with 10000000 entries on a 2017 MacBook Pro. That’s amazing.
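To make the point above concrete, here is a minimal sketch (using the 20000 iterations from the first post, and assuming the `Count` property available in 2020R2) that reports the size of the array after the loop. Because the array is declared with `iterations` elements before any `Insert` call, each call adds one more element and has to move the existing elements from index `i` upward.

```xojo
Const iterations = 20000

// Declared with an upper bound, so the array already holds 20000 empty strings.
Var a(iterations - 1) As String

For i As Integer = 0 To iterations - 1
  // Adds one element at index i; everything from i upward moves up by one.
  a.Insert(i, "a")
Next i

// 20000 declared + 20000 inserted, so this should report 40000.
MessageBox("Count: " + a.Count.ToString)
```

If the intent is only to fill the array, declaring it as `Var a() As String` or assigning by index avoids the doubling.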
I actually have no interest in using arrays in this way in the code I’m writing. I’ve just written a GapBuffer class and was going to benchmark it against using an array. I wrote the above code quickly to get an idea how long it would take using (what I assumed would be) the slower array method and then test my new class.
It was in writing this quick code that I discovered a difference in performance between ARM and Intel and thought, huh - that’s weird.
That doesn’t detract from the fact that code should always be faster running natively than running through Rosetta, unless there is some bug in the framework or compiler, I guess.
This bug gets weirder and weirder. There’s definitely something smelly going on with Array.Insert() on ARM.
If you run this code on both ARM and Intel:
Const iterations = 50000
Var iMax As Integer = iterations - 1
// Test inserting into an empty / small array.
Var a() As String
Var start1 As Double = System.Microseconds
For i As Integer = 0 To iMax
  a.Insert(i, "a")
Next i
Var total1 As Integer = (System.Microseconds - start1) / 1000
Var message As String = "Inserting into small array: " + total1.ToString + " ms" + EndOfLine
// Test inserting into an array that has already been allocated space.
Var b(iterations - 1) As String
Var start2 As Double = System.Microseconds
For i As Integer = 0 To iMax
  b.Insert(i, "a")
Next i
Var total2 As Integer = (System.Microseconds - start2) / 1000
message = message + "Inserting into large array: " + total2.ToString + " ms"
MessageBox(message)
For array b it takes 550 ms on x86 and 2625 ms on ARM. For array a, ARM is slightly faster than x86 (7 ms vs 10 ms).
I forgot, you have the OAK. There’s code in there (check the System Information window in the demo app) that will tell you what architecture your app is running as (as well as what architecture your computer is).
Good thought. Using the code below makes no difference:
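This is not the OAK code, just a minimal sketch using Xojo's built-in conditional compilation constants `TargetARM` and `TargetX86`. It only reports which architecture the executing code was compiled for, which should be enough to check whether a universal build is actually running its ARM slice. It does not identify the hardware or detect Rosetta; that would need a system call that is not shown here.

```xojo
Var arch As String

// These constants are resolved at compile time, once per architecture,
// so each slice of a universal binary reports its own value.
#If TargetARM Then
  arch = "ARM"
#ElseIf TargetX86 Then
  arch = "x86"
#Else
  arch = "unknown"
#EndIf

MessageBox("This code was compiled for: " + arch)
```

Putting this in the Open event next to the timing code would show which slice produced a given measurement.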
// Test inserting into an array that has already been allocated space using `AddAt`.
Var c(iterations - 1) As String
Var start3 As Double = System.Microseconds
For i As Integer = 0 To iMax
  c.AddAt(i, "a")
Next i
Var total3 As Integer = (System.Microseconds - start3) / 1000
This may also be related to slower FolderItem calls (see other thread) on ARM.
Please be aware that for Intel there are a lot more compiler optimisations available (SSE, SIMD, MMX,…).
ARM only has NEON (if I am not mistaken). Maybe Xojo is not compiling to make use of those ARM optimisations?
Interesting. I hadn’t considered that option. If that is the case, then there’s little incentive to build a universal binary at the moment, if Rosetta can run the translated, more efficient x86 code faster than the M1 can run Xojo’s un-optimised ARM code.