ByRef and Function overloading - what is driving the Xojo linker.

Mars_Saxman · October 11, 2016, 10:01pm

Hello. I wrote the compiler and designed the behavior you’re discussing. While the documentation of ByRef semantics has always left something to be desired, I assure you that the mechanism works the way it does on purpose, and is not likely ever to change, because changing it would cause significant problems.

The compiler rejects all but the simplest variable expressions as ByRef arguments in order to enforce a strong guarantee about object lifetimes. The error message could perhaps be improved, and the documentation of the guarantee could certainly be improved, but the guarantee itself is a critical piece of the language’s memory safety system.

The Xojo memory model manages allocation lifetime in terms of objects. The unit of allocation is the object instance; an object value is an abstraction which identifies a unique object instance. An object value is also called a reference, but do not confuse this sort of “reference” with the one created as a ByRef parameter; they are two semantically distinct applications of the same underlying mechanism.

Consider an ordinary ByVal parameter, like so:

    Sub Foo(ByVal a As Integer)
      MsgBox "Hello, world. My number is " + Str(a)
    End Sub

    Foo(42)

In this deliberately trivial example, we invoke the method “Foo”, providing the value 42 as an argument. When Foo begins to execute, it will accept this argument value, assigning it to the parameter variable named “a”. The variable “a” is local to the execution of Foo; it did not exist before Foo began to execute, and it will cease to exist when Foo returns. Foo can do whatever it likes with this variable, and may place any type-compatible value into it.

Something very different happens when we use ByRef instead:

    Sub Bar(ByRef a As Integer)
      MsgBox "Hello, world. My number is " + Str(a)
      a = 42
    End Sub

When Bar begins to execute, it accepts an argument as Foo did: this argument is not a value, however, but a reference to some variable which already exists. Instead of creating a new local variable for the duration of its execution, as our previous example did, this function latches on to some variable which already exists, giving it a new (temporary) name. Just as a normal variable is a placeholder for some value which won’t be known until run time, a reference parameter is a placeholder for the identity of some variable which won’t be known until run time.

The argument you provide for a ByVal parameter is a value: some value which can initialize the parameter variable. The argument you provide for a ByRef parameter is entirely different: you must provide the identity of the variable itself, which you thereby grant the function you are calling permission to manipulate.
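
To make the contrast concrete, here is a minimal sketch. The method names SetByVal and SetByRef are made up for illustration, and the last lines are assumed to sit inside some calling method.

```xojo
// Hypothetical helpers, for illustration only.
Sub SetByVal(ByVal a As Integer)
  a = 42 // changes only the callee's own parameter variable
End Sub

Sub SetByRef(ByRef a As Integer)
  a = 42 // writes through to whichever variable the caller supplied
End Sub

// Inside some calling method:
Dim n As Integer = 7
SetByVal(n) // n is still 7: only the value was handed over
SetByRef(n) // n is now 42: the variable itself was handed over
```

Note that the call sites look identical; the difference lies entirely in what the callee is allowed to do with the caller's variable.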

In order to verify that this call can be performed safely, you must be able to guarantee that the variable you are providing will continue to exist - no matter what happens - until the called function returns. Otherwise, the contract is broken. This is a high bar, since the function you are calling is empowered to do anything it likes to any object anywhere, indirectly.

As always, the compiler is trying to help you out, by checking your work and letting you know when you’ve made an assumption that cannot be supported by the evidence present in your code. When you provide the identity of a variable as a ByRef parameter, the compiler must be able to verify that there is no conceivable sequence of events the called function might experience such that the object to which that variable belongs would ever be released. That is, the variable absolutely must continue to exist until after the call returns: if the compiler can’t prove that’s going to be the case, it’ll take the safe course and refuse to let you make the call.

The compiler knows that any local variable must absolutely remain live while the called function executes, because of the nature of the stack. Any local variable which is in scope for the caller is therefore fair game. The compiler also knows that any global, static, or module-property variable will also remain live while the called function executes, because such variables have a lifetime equal to that of the program - those variables cannot be freed, so they are always safe.

Furthermore, the compiler knows that any instance variable associated with the current Self must remain live while the callee executes, because the current function has a reference to the Self object. No matter what the callee does, there will remain at least one reference to the Self object until after it returns.
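
The three accepted cases, and the one that is refused, might look like this in practice. Everything named here is an assumption made for the sake of the sketch: a method Bump taking a ByRef Integer, a global module property GrandTotal As Integer, and a class Counter with a property Count As Integer.

```xojo
// Hypothetical setup, for illustration only:
//   a module with a global property   GrandTotal As Integer
//   a class Counter with a property   Count As Integer

Sub Bump(ByRef a As Integer)
  a = a + 1
End Sub

// A method of class Counter:
Sub Tick(other As Counter)
  Dim localCount As Integer

  Bump(localCount) // accepted: a local, kept alive by the caller's stack frame
  Bump(GrandTotal) // accepted: a module property lives as long as the program
  Bump(Count)      // accepted: instance variable of the current Self

  // Bump(other.Count) // refused: nothing proves the object "other" refers to
  //                   // will survive until Bump returns
End Sub
```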

This is the extent of the guarantee the language’s type system allows the compiler to provide. The aliasing problem prevents it from going any further. Since there may be more than one reference to any object at a time, and there are an arbitrary number of means by which the callee may be able to discover and manipulate the other objects in the system, the compiler must assume that absolutely anything could potentially happen while the callee is executing, and it therefore cannot make any safe assumptions about the lifetime of any object in the system beyond the ones I have described above.
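
Here is a sketch of the kind of callee the compiler has to assume might exist. The names ActiveCounter (a global module property of the hypothetical class Counter, with a property Count As Integer) and ZeroOut are invented for this example, and the scenario assumes ActiveCounter holds the only reference to its object.

```xojo
// Hypothetical: a module has a global property  ActiveCounter As Counter,
// and that property holds the only reference to the object.

Sub ZeroOut(ByRef a As Integer)
  ActiveCounter = Nil // drops the last reference, so the Counter is destroyed
  a = 0               // if "a" were that Counter's Count property, this would
                      // refer to storage that no longer exists
End Sub

// The dangerous call would be:
//   ZeroOut(ActiveCounter.Count)
// The compiler refuses it, so the situation above can never arise.
```

Nothing in the signature of ZeroOut reveals that it touches ActiveCounter, which is why the caller's side of the call cannot be checked by looking at the callee's declaration alone.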

“Improving” the compiler such that it would allow you to provide references to arbitrary object instance variables as reference parameters would be trivial, but then the compiler would no longer be able to assert that the code it produces definitely follows the language constraints which yield the memory safety guarantee. This is not a limitation in the capability of the compiler, it is a constraint arising from the nature of the language’s type system.

It is true that the compiler is being more conservative than might be strictly necessary. If we developed a more sophisticated graph analysis process, the compiler could discover additional circumstances where it could guarantee that the storage backing up some instance variable would remain unchanged until the call returned, and one could hypothesize that the language’s memory-safety guarantee would not require the compiler to prohibit such uses of ByRef. But how would you, the human programmer, make use of this capability? The compiler is precise but stupid; it can explore the graph and say yes or no, but sometimes it’s a lot of work trying to puzzle out how it got there. (I’ve had to spend a lot of time following its tracks and understanding how it came to the answers it did!) So, this would not be a good feature. We don’t want to have a language where the rule is “perform this extremely complex escape analysis in your head every time in order to determine whether or not you can safely use this feature”, especially not when the analysis is subject to change on practically every compile due to the complex web of interdependencies inevitably arising in software based on mutable object graphs.

Instead, we looked for the simplest line we could identify that would definitely exclude all the harmful outcomes and made that the rule.
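
One pattern that follows directly from the rule, offered as a sketch rather than official guidance: since a local variable is always acceptable, you can copy the property into a local, pass the local, and copy the result back. Bump, Counter and Count are the same kind of hypothetical names as before.

```xojo
// Hypothetical: "other" is a Counter with a property Count As Integer,
// and Bump takes a ByRef Integer.
Dim tmp As Integer = other.Count // copy the value out of the object
Bump(tmp)                        // always accepted: tmp is a local
other.Count = tmp                // copy the result back afterwards
```

The callee only ever sees the local, so it does not matter what happens to the object during the call. The trade-off is that the property is updated after the call returns rather than while the callee runs.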

Norman_P · October 11, 2016, 10:25pm

Thanks Mars

Massimo_Valle · October 12, 2016, 6:14am

Awesome explanation Mars!
As usual.

It’s nice to have a former Xojo engineer here, especially when he is Mars.
Thanks.