Code optimization question

William_James · September 15, 2015, 4:53am

I am in the process of optimizing an arbitrary precision integer class in Xojo and have determined that vast vast majority of the time was being spent creating and destroying class objects. I am refactoring to code to maximize the reuse of class objects and memory blocks and have already cut the processing time by more than 50%. But in the process of doing this I have run into something I can’t explain and maybe someone with a better understanding of Xojo internals can provide some insight.

An arbitrary precision object in my class has three properties:

Name Type Meaning

sign Int64 sign of integer (-1 for negative, 0 for zero, 1 for positive)
digits mutablememoryblock holds the bits of the integer, concatenation of unsigned 64 bit words
used Int64 how many 64 bits words in memoryblock is currently in use

The default constructor sets the sign value and used value to zero and does not create the memoryblock (i.e. a zero value).

Now in my pursuit of maximizing the reuse of objects I am trying to create a fast copy method when two objects already exist. In other words copy objA to objB. So I thought that the best way to do this would be to do the following which overwrites the second object:

objB.sign = objA.sign
objB.used = objA.used
if objA.sign <> 0 then
objB.digits = New xojo.core.mutablememoryblock(objA.digits)
end if

But using a copy like the above slowed the application to an absolute crawl. The normal copy method does the following which creates a new object:

objB = new object ’ call default constructor that sets sign=0 and used=0
objB.sign = objA.sign
objB.used = objA.used
if objA.sign <> 0 then
objB.digits = New xojo.core.mutablememoryblock(objA.digits)
end if

What I am scratching my head about is why is it much faster to create a new object and copy the properties to it than it is to replace the properties of an already existing object. I have not done specific timing, but it looks like the difference is a couple of orders of magnitude or more! My only explanation is that it must be much faster to create a new object and new memoryblock that it is to replace a memoryblock of an existing object. But why would that be the case??

Sam_Rowlands · September 15, 2015, 5:01am

I’m not sure that I fully understand everything you said, but this last section makes some sense.

If there is no memoryblock, Xojo only needs to create one. However if one already exists, it must be destroyed first (memory deallocated), before the new one can be set as a value of your object (which then requires re-allocation of memory).

You might gain some speed, but actually copying the contents of the memoryblock instead of the memoryblock if one already exists (providing the length is the same of course).

William_James · September 15, 2015, 5:09am

Sam, in almost all of the method calls in the test case both memory blocks existed. So in the second case where a new object is created, the old memory block has to be destroyed as well (part of the whole object). So in one case an object including its memory block is destroyed and replaced and the other case the memoryblock is destroyed and replaced but not the object itself. It just seems counterintuitive that it takes more time to replace the parts versus the whole.

Sam_Rowlands · September 15, 2015, 5:13am

Oh I see, and obviously misunderstood… In this case then, it does seem odd. Hopefully someone from Xojo can explain why this is.

The other thing I can think of is to file a feedback report with a clear demonstration of the difference. So that Xojo’s engineers can clearly see what’s going on underneath and help to provide an improvement.

Eli_Ott · September 15, 2015, 5:58am

Possibly faster approaches:

Use a MemoryBlock instead of MutableMemoryblock and make the class immutable. You would need to make default constructor private and create a public constructor with sign, etc. as arguments. In my experience MemoryBlock is much faster than MutableMemoryblock.
Create a MemoryBlock subclass and use the first two IntegerValue places for sign and used, the rest for digits. You would need to override the default constructor and set it private and create a public constructor with sign, etc. as arguments.
Use a structure. In my experience structures are much faster than classes.

Joe_Ranieri · September 15, 2015, 6:24am

Could you post the code somewhere that I could take a quick peek?

Beatrix_Willius · September 15, 2015, 7:53am

The less copying you do the better. I recently sped up my main parsing algorithm by using a global module instead of doing nice OO with copying large amounts of data.

Kem_Tekinay · September 15, 2015, 2:22pm

I like Eli’s idea of making it immutable, but I guess it depends on what you want to do with it. If the class is immutable, it would act like an integer and you would only have to worry about creating a new MemoryBlock when you’re doing some operation to create another object. To create a “copy”, you would only need to assign the object stored in one variable to another.

William_James · September 15, 2015, 2:48pm

Well, I need to apologize again. When creating a condensed test case for Joe, the problem did not remain and after a lot of digging I found a subtle bug in the method I was using to test the copy method in the original app where I was stepping on one of the method inputs. After fixing the bug I determined that the copies that just replace the memoryblock versus creating a new object are indeed 2x faster.

A lesson I have learned is that reusing objects can create an enormous improvement in performance but the code can get to be much more complex, leading to hard to find bugs.

But I feel that the changes were worth it for me. The benchmark code that I am testing with is the factorization of 1524157875352385430318221 using the elliptic code factorization method. When I started optimization, the factorization took 87 seconds, now I am at 20 seconds. For the longest, I was trying to improve the math algorithms and making absolutely no progress. After I realized that the lions share of the time was being spent creating and destroying objects I was able to improve things considerably.

There were tradeoffs that had to be made however. The original code was designed to allow the creation of math expressions like you would make in Xojo for integer and doubles, such as:

y = m * x + b

using operator overloading.

The issue with this design approach is that I created an object for m * x, and then one for m * x + b and if this was in a loop it would result in many many objects being created. In my test case, I was performing over 4 million divides/remainders, additions and subtractions. That’s creating and destroying over 16 million objects. The object creation/destruction was swamping the time actually spent in calculations.

I added additional methods that would reuse objects so that the code would look more like:

multiply(m, x, t1)
add(t1, b, y)

where t1 and y objects are reused if they already exist. This is the approach used by the open source GMP library and I am sure they went that way for performance reasons. But if you look at the GMP code the code is very difficult to follow and that is something that I strived to avoid.

Hopes this helps others in their optimization quest. And thanks to everyone that posted help. It is always appreciated!

Charles_Yeomans · September 15, 2015, 5:12pm

[quote=213630:@William James]I am in the process of optimizing an arbitrary precision integer class in Xojo and have determined that vast vast majority of the time was being spent creating and destroying class objects. I am refactoring to code to maximize the reuse of class objects and memory blocks and have already cut the processing time by more than 50%. But in the process of doing this I have run into something I can’t explain and maybe someone with a better understanding of Xojo internals can provide some insight.

An arbitrary precision object in my class has three properties:

Name Type Meaning

sign Int64 sign of integer (-1 for negative, 0 for zero, 1 for positive)
digits mutablememoryblock holds the bits of the integer, concatenation of unsigned 64 bit words
used Int64 how many 64 bits words in memoryblock is currently in use

The default constructor sets the sign value and used value to zero and does not create the memoryblock (i.e. a zero value).

Now in my pursuit of maximizing the reuse of objects I am trying to create a fast copy method when two objects already exist. In other words copy objA to objB. So I thought that the best way to do this would be to do the following which overwrites the second object:
[/quote]

A much faster approach would be to declare and use the C function memcpy to copy the contents of objA to objB.

William_James · September 15, 2015, 5:28pm

For larger memory blocks, I’m sure that using memcpy would be faster. I am currently copying the used portion of the memoryblock using pointers in a loop in Xojo. For the particular test case that I am benchmarking the memory blocks are typically very small (16 bytes) so the overhead of the declare would not be worth it in this case.

Charles_Yeomans · September 15, 2015, 5:40pm

I’ve done a lot of this sort of coding in xojo. You might actually test and see.