object arrays and memory leaks

Aaron_Hunt · May 26, 2014, 1:11am

If I have an array of objects called A(), and then I create another array B() cast as A’s object type, consisting only of references to A(), and then I set A() = B(), have I created a memory leak?

The reason I’m doing this is: I have an array of objects that I need to sort by more than one parameter. I can use Xojo’s Array.SortWith for an initial sort, but then I need to re-sort the results in smaller chunks, wherever the initial SortWith left duplicate values, according to other parameters. I can imagine doing it with Array.SortWith alone by building an elaborate integer sort array derived from all the parameters I need to sort with, but the above method is easier, and I’m wondering if I can use the easier approach without causing a problem. Anyone know?

Thanks!

Tim_Hare · May 26, 2014, 1:16am

Memory leak? No. What you have created is 2 references to the same array. If you manipulate A(), you change B() as well.

Did you intend to get a copy of B()? Or are you really after a second reference to B()?

Tim_Hare · May 26, 2014, 1:19am

What I would do is create a string array based on the sort and sub-sort criteria and use SortWith on that. It’s usually easier to create a string representation of a multilevel sort than try and create a number that represents the sort criteria.

Aaron_Hunt · May 26, 2014, 1:26am

[quote=91645:@Tim Hare]Memory leak? No. What you have created is 2 references to the same array. If you manipulate A(), you change B() as well.

Did you intend to get a copy of B()? Or are you really after a second reference to B()?[/quote]

It’s the 2 references to the same array that I’m worried about, because I’ve set the initial array equal to references of itself, but I’m unclear on whether this is a circular reference or not in terms of reference counting. As I understand memory leaks, they happen when references can’t be destroyed due to circular referencing, so that’s why I’m concerned.

I put the sort in a method, so I can call:

A() = SuperSort( A() )

The function SuperSort creates and returns the B() array, which is just a reordering of A() built using .append, based on some sorting criteria.

Aaron_Hunt · May 26, 2014, 1:32am

The parameters are all numerical. Sorting as strings would require some kind of encoding. I can think of a lot of ways to do it, but is there some standard way of doing it?
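One common encoding, sketched here in Python rather than Xojo, is to zero-pad every numeric parameter to a fixed width and join the pieces, so that string order matches numeric order. The helper name `sort_key`, the width of 10 and the sample records are all hypothetical, and the sketch assumes non-negative integers.

```python
def sort_key(values, width=10):
    """Build a string key whose lexicographic order matches numeric order.

    Assumes every value is a non-negative integer with at most `width` digits.
    """
    return "|".join(str(v).zfill(width) for v in values)

# Hypothetical records: (primary, secondary) numeric parameters.
records = [(3, 20), (1, 7), (3, 5), (1, 12)]

keys = [sort_key(r) for r in records]
# Reorder the records by their string keys, much as SortWith reorders a parallel array.
ordered = [r for _, r in sorted(zip(keys, records))]
```

Negative or fractional values would need a different encoding (for example an added offset or a fixed number of decimals), which this sketch does not attempt.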

Aaron_Hunt · May 26, 2014, 1:43am

What I’d really like is a SortWith that allowed using arrays of larger sizes, including a param for where to begin the sort, so that

aSmallerArray.SortWith( aLargerArray, 8 )

would mean: sort only elements 8 through 8 + aSmallerArray.Ubound of aLargerArray, and if the sort goes past the bounds of aLargerArray, that part is just ignored.
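A rough sketch of such a method, written in Python as an illustration rather than as Xojo code. The name `sort_slice_with` is hypothetical, `start` is assumed to be non-negative, and unlike the real SortWith this version leaves the key array itself untouched.

```python
def sort_slice_with(keys, target, start):
    """Reorder target[start : start + len(keys)] by ascending keys, in place.

    Keys that would fall past the end of `target` are ignored.
    """
    count = max(0, min(len(keys), len(target) - start))
    # Indices of the usable keys, arranged in ascending key order.
    order = sorted(range(count), key=lambda i: keys[i])
    chunk = target[start:start + count]
    target[start:start + count] = [chunk[i] for i in order]
    return target

items = ["a", "b", "c", "d", "e", "f"]
sort_slice_with([3, 1, 2], items, 2)   # reorders only items[2:5]
```

Because the slice assignment replaces exactly `count` elements, the rest of the larger array keeps its order and length.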

I think I could write such a method, but I’d like to know about this reference counting issue.

Wayne_Golding · May 26, 2014, 1:46am

This may be totally off, but have you considered using SQL? An in-memory SQLite database is easy to create: write the data to a table and use the ORDER BY clause to return a recordset sorted on multiple columns.
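To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of Xojo's database classes. The table name `notes` and the columns `pitch` and `duration` are made up for illustration; the `id` column stands in for each object's index in the original array.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, nothing touches disk
conn.execute("CREATE TABLE notes (id INTEGER, pitch INTEGER, duration INTEGER)")

# Hypothetical data: (index into the object array, primary key, secondary key).
rows = [(0, 60, 4), (1, 55, 2), (2, 60, 1), (3, 55, 8)]
conn.executemany("INSERT INTO notes VALUES (?, ?, ?)", rows)

# ORDER BY sorts on pitch first, then breaks ties on duration.
sorted_ids = [r[0] for r in conn.execute(
    "SELECT id FROM notes ORDER BY pitch, duration")]
conn.close()
```

The resulting list of ids can then be used to rebuild the object array in sorted order.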

Joe_Ranieri · May 26, 2014, 1:47am

No. The original array that A referred to is destroyed and both A and B are referencing the same array.

Aaron_Hunt · May 26, 2014, 1:48am

Thank you kindly, Joe Ranieri.

Aaron_Hunt · May 26, 2014, 1:52am

Not off at all - I had considered it, but it seemed kind of overkill. If I were writing a PHP app that would naturally be how the objects existed anyway, so it does make sense.

Aaron_Hunt · May 26, 2014, 2:00am

What’s like magic here, is that any references to A() existing previous to setting A() = B() will still work.

That is, A() may be destroyed, but since B() is created from references of A(), anything referencing A() previously still references the correct object, just reordered… which is exactly what I’m after.

So, say R() is an array holding 3 elements from A(), which has some larger size, say 10. I do A() = SuperSort( A() ). A() still has 10 elements, just reordered, and R() still references the correct elements of A().

I figured I must be paying a price (via memory leakage) for this magic, but I guess not.

Joe_Ranieri · May 26, 2014, 2:14am

[quote=91661:@Aaron Hunt]What’s like magic here, is that any references to A() existing previous to setting A() = B() will still work.

That is, A() may be destroyed, but since B() is created from references of[/quote]

Arrays are reference counted, which means the runtime keeps track of how many things are referencing any given array. When an array’s reference count hits 0, the array is destroyed. So when I said that ‘the original array that A referred to is destroyed’, I was assuming that A was the only thing referencing that original array.
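The same behaviour can be observed in CPython, which also uses reference counting. This is only an analogue of what the Xojo runtime does, not Xojo code; the `ObjectArray` class is a hypothetical helper that exists solely because plain Python lists cannot be weakly referenced.

```python
import weakref

class ObjectArray(list):
    """A list subclass, used only so the array can be observed via weakref."""

a = ObjectArray(["x", "y", "z"])
probe = weakref.ref(a)          # observes the array without keeping it alive
b = ObjectArray(reversed(a))    # a second, distinct array
alive_before = probe() is not None

a = b                           # the last reference to the original array is gone
alive_after = probe() is not None

# With another reference around, the original array survives the assignment.
a2 = ObjectArray([1, 2])
other = a2
probe2 = weakref.ref(a2)
a2 = b
still_alive = probe2() is not None
```

Here `alive_after` reflects the case where A was the only reference, and `still_alive` the case where something else still refers to the original array.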

Aaron_Hunt · May 26, 2014, 2:22am

Aha. So if I have other references like R() to elements of A(), previous to setting A() = B(), ceteris paribus, then I DO have a memory leak?

P.S. Sorry to leave out such a crucial detail.

Aaron_Hunt · May 26, 2014, 2:39am

What confuses me about this is that B() is created entirely as references to A(). So, it is just like R() except that it references all items in A() whereas R() references only some. Other arrays reference A() as well, pretend we have P() and Q() each referring to some elements in A().

If I then set A() = B(), I still have all the elements of A(), and R() P() and Q() all still work, so I guess A() isn’t being destroyed. But B() is just made up of references to A()…

It seems like doing this many times would just result in more and more references to A(), with nothing ever getting destroyed. If it’s not exactly a memory leak technically, it sure seems like proliferation of references that can’t be got rid of because setting A() = B() never destroys A. Is that right?

Tim_Hare · May 26, 2014, 2:46am

Firstly, R() doesn’t reference A(). It references the same objects that A() references. It’s not R -> A -> Object, it’s R -> Object and A -> Object.

Secondly, when you set A() = B(), A() IS destroyed and is replaced with a new array B(). That new array just happens to reference all the same objects that A() originally referenced, so it looks like A() is still intact, even though you have completely recreated it.

And none of this constitutes a memory leak.
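A small Python sketch of the two points above, offered as an analogy rather than Xojo code. The `Item` class, its `rank` property and `super_sort` are hypothetical stand-ins for Aaron's objects and his SuperSort method.

```python
class Item:
    def __init__(self, rank):
        self.rank = rank

def super_sort(items):
    """Return a NEW list holding the same objects, reordered by rank."""
    result = []
    for item in sorted(items, key=lambda it: it.rank):
        result.append(item)
    return result

a = [Item(3), Item(1), Item(2)]
r = [a[0], a[2]]        # R -> Object and A -> Object, never R -> A -> Object

a = super_sort(a)       # the old array is replaced by a brand new one

# R still points at the very same objects, which now sit elsewhere in A.
same_objects = r[0] is a[2] and r[1] is a[1]
ranks = [item.rank for item in a]
```

Nothing is copied except references, so R keeps working no matter how often A is replaced.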

Aaron_Hunt · May 26, 2014, 2:55am

Right.

[quote=91668:@Tim Hare]Secondly, when you set A() = B(), A() IS destroyed and is replaced with a new array B(). That new array just happens to reference all the same objects that A() originally referenced, so it looks like A() is still intact, even though you have completely recreated it.

And none of this constitutes a memory leak.[/quote]

Okay, sorry, you answered the question to begin with and I’ve just been confusing myself. So to clarify …

I have no other arrays that have been created by someArray() = A(). I only have other arrays that reference objects that exist in A().

So A() = B() destroys object A and replaces it with object B. B references objects that A referenced, so any other references to those objects will persist.

Key here is that the array A() is itself an object. And it gets replaced with the array object B().

I think I got it, finally. Sorry to be dense

Garth_Hjelte · May 26, 2014, 1:41pm

That’s true. However, I think it’s confusing (to me at least) to think of it that way. The C++ experience I have helps with this.

When in BASIC we do ‘Dim i As Integer’, we are doing two things: we establish a token for an integer called ‘i’, and internally there’s memory allocated to it. We are pretty careless with this memory, but courtesy of Ranieri the token goes out of scope and the memory is deallocated perfectly.

When we establish an array ‘Dim MyObj() as CDevil’ and ‘ReDim MyObj(665)’ you are just creating a container to hold 666 devils. If you immediately call MyObj(7).GoToHell, you get an exception because no object has been created for that index. All you’ve done is set up the container, not the objects. An array is just a container. All individual objects in that array have their own lives and own memory.

What you are doing, like Joe said, is making two references to the same array. Given my evil example above, and pretending you’ve created 666 Devils, you have 1332 objects/tokens, but only 666 discrete ones, and only 1 discrete array, not 2. So setting A(77).HasPitchFork = True also makes B’s 77th indexed object have a pitchfork too. And… A.Remove(144) also removes a devil from B().
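The aliasing described here looks like this in Python, used as a stand-in for Xojo. The `Devil` class and its `has_pitchfork` attribute are hypothetical translations of the CDevil example.

```python
class Devil:
    def __init__(self):
        self.has_pitchfork = False

a = [Devil() for _ in range(666)]
b = a                        # a second reference to the SAME list, not a copy

a[77].has_pitchfork = True   # visible through b as well
del a[144]                   # analogous to A.Remove(144); b shrinks too
```

After these two statements, `b[77]` has a pitchfork and `b` holds 665 devils, because `a` and `b` name one and the same array.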

BASIC is nice because it cleans up memory for you when the variable (the token) goes out of scope. That’s the “magic” of reference counting. And ultimately when your app quits, all memory is cleaned up because everything goes out of scope. (Yes, I’m oversimplifying.) So when you say “memory leak”, you are mostly talking about during app existence, not losing memory ‘forever’.

In your example, if A() goes out of scope and B() doesn’t, the 666 pieces of memory still exist. When B() goes out of scope, all reference counts will go to 0 and memory will be cleaned up. And obviously the minuscule memory the array container itself takes up will be deallocated.

So there’s really no ‘magic’. To diagnose your situation, simply try to figure out if there are any references to the original object still in scope. This is a good thing because it makes your programming much more intentional, instead of willy-nilly flying objects around just because you can, because BASIC is counting references for you. As you know, it’s easy to start making copies and you can create a “leak” - it’s not really a leak - by having some globally-scoped array still containing valid objects and you aren’t using them.

BTW, C++ trains you well in this because it is not reference counted. If you make an array, ‘new’ all the objects, start making copies of the pointers to those objects, and then ‘delete’ through those copies, you’ve deallocated the memory of the originals and of every other copy, and you crash and burn when you access objects that are still in scope but have had their memory deallocated. Before you think C++ is lousy, consider that it forces you to be disciplined and be aware of where your objects exist and where they don’t, and WHO is the legal entity allowed to deallocate those objects. BASIC does the work for you, but don’t let that make you irresponsible to the point where your own app becomes internally chaotic. You start using objects that are actually some other object, and you go - “what was that???”

I’m also skipping the concept of circular references, but again it’s the same solution. Know where your objects are, and know intentionally when they go (or should go) out of scope and be deallocated.

Sorry to “go basic” here, but just having a good grasp of memory allocation, even though BASIC is handling it for you, clears the matter up and makes the ‘magic’ go away.

Aaron_Hunt · May 26, 2014, 8:31pm

Thanks for the very clear and helpful tutorial / refresher course, Garth. I’m familiar with C++, though I have admittedly not done much with it. I’ve followed tutorials, read and done simple things in it (mostly using the book C++ In a Nutshell - which is really more of a reference text - and often when I need to transfer some code into RS/Xojo). Clearly, I get confused sometimes, as above - notice how I kept saying “references to A()” when I should have said “references to elements in A()” as Tim Hare pointed out - I was making myself confused by the language I was using … not being precise in my language = not being precise in my thoughts (philosophy teaches us the same thing). It’s a forehead-slapper, really.

Automatic scope-related memory releasing makes me nervous occasionally, mostly when I am not thinking clearly (my own fault, lack of sleep, bad diet, or what have you as an excuse, it’s just not thinking clearly!) Only in the past couple of years have I really understood how and when to use WeakRefs to avoid circular references (and real memory leaks) in Xojo. What I was referring to above as a memory leak was, as you said, memory allocated and not used in my app, due to objects persisting when I expected them to be destroyed. Of course, we want to be as efficient as possible with memory usage. Nice that C++ gives you the control, but sometimes that would be rather a burden. Nice that Xojo does it for us, but also gives us ways to take care of cleaning things up when we want to : )

Thank you again!

Tim_Hare · May 26, 2014, 8:55pm

If I may be nit-picky:

[quote=91824:@Garth Hjelte]When in BASIC we do ‘Dim i As Integer’, we are doing two things: we establish a token for an integer called ‘i’, and internally there’s memory allocated to it. We are pretty careless with this memory, but courtesy of Ranieri the token goes out of scope and the memory is deallocated perfectly.
[/quote]
When you Dim a local variable, it acts exactly the same as in C++. We are pretty careless with this memory in both languages for the same reason: the memory is allocated in the stack frame and is freed at the end of the subroutine call when that frame is popped off the stack to be reused in the next subroutine call. This is not unique to Xojo and has nothing to do with reference counting.

[quote]When we establish an array ‘Dim MyObj() as CDevil’ and ‘ReDim MyObj(665)’ you are just creating a container to hold 666 devils.
[/quote]
How is this any different than what Aaron said:

[quote]the array A() is itself an object.
[/quote]
You’re just using “container” instead of “object”.

When we establish an array with ‘Dim MyObj() as CDevil’, we’re doing 2 things. First, memory is allocated on the stack frame for a local variable that contains an Array Reference. Second, an array object/container/block is created. In this case it is initially empty. It exists (MyObj is not Nil - it contains a valid reference), but does not contain any CDevil references (its UBound is -1).

When we ‘ReDim MyObj(665)’ memory is allocated in the array for 666 CDevil references. As you correctly point out, all of those references are Nil. It is now valid to refer to MyObj(7) (previously you would have had an OutOfBounds exception), but you cannot yet refer to any properties or methods of MyObj(7) without getting a NilObject exception.
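A loose Python analogy of these two stages, for readers who want to poke at the behaviour. Python's `IndexError` and `AttributeError` merely stand in for Xojo's OutOfBounds and NilObject exceptions, and `go_to_hell` is a hypothetical method name.

```python
my_obj = []                      # like Dim MyObj() As CDevil: exists, but empty
try:
    my_obj[7]
    out_of_bounds = False
except IndexError:               # stands in for the OutOfBounds exception
    out_of_bounds = True

my_obj = [None] * 666            # like ReDim MyObj(665): 666 slots, all nil
slot_is_nil = my_obj[7] is None  # the slot itself is now valid to read
try:
    my_obj[7].go_to_hell()
    nil_error = False
except AttributeError:           # stands in for the NilObject exception
    nil_error = True
```

In both languages the container exists before any of the objects it is meant to hold.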

[quote]What you are doing, like Joe said, is making two references to the same array. Given my evil example above, and pretending you’ve created 666 Devils, you have 1332 objects/tokens, but only 666 discrete ones, and only 1 discrete array, not 2.
[/quote]
In your example, you don’t have 1332 “objects/tokens”, you only have 666. Given A = B, you have 2 array references, 1 array object, containing 666 CDevil references, and 666 CDevil objects.

In Aaron’s case, he did have 2 distinct array objects. The original array A() and a brand new array B() into which he copied the references of A() and then sorted. So he did have 1332 CDevil references and 666 CDevil objects, each with a reference count of 2. When he sets A = B, the array referred to by A is destroyed along with 666 CDevil references and the reference count of each CDevil object falls to 1.
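The bookkeeping can be checked in CPython with `sys.getrefcount`, shown here for a single hypothetical `Devil` object instead of 666. This is an analogue of the Xojo runtime, not a description of its internals. The absolute numbers include temporary references, so the sketch only looks at differences from a baseline.

```python
import sys

class Devil:
    pass

devil = Devil()
base = sys.getrefcount(devil)     # baseline; only differences are meaningful

a = [devil]
in_a = sys.getrefcount(devil) - base          # held by array A

b = [devil]
in_both = sys.getrefcount(devil) - base       # held by arrays A and B

a = b                                         # old array A is destroyed
after_assign = sys.getrefcount(devil) - base  # held only by the shared array
```

The count rises by one for each array that holds the object and falls by one when the original array is destroyed, matching the 2-then-1 sequence described above.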

[quote]As you know, it’s easy to start making copies and you can create a “leak” - it’s not really a leak - by having some globally-scoped array still containing valid objects and you aren’t using them.
[/quote]
That’s the important thing to note: “it’s not really a leak”.

[quote]I’m also skipping the concept of circular references,
[/quote]
This is where a memory leak occurs: when you have references that won’t normally go away. F refers to G, which refers to F. They may both go out of scope, but neither will be destroyed. That results in memory that is in use, but is not accessible from anywhere in the program, so your app cannot reuse it. Instead, it must continually allocate new memory. That is the “leak”.
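The F and G cycle can be reproduced in CPython, again only as an analogue of Xojo. CPython has an extra cycle collector that would normally hide the problem, so this sketch switches it off to mimic pure reference counting. The `Node` class is hypothetical, and `weakref.ref` plays the role that WeakRef plays in Xojo.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                      # mimic a runtime with reference counting only
try:
    f, g = Node(), Node()
    f.other, g.other = g, f       # F refers to G, which refers to F
    probe = weakref.ref(f)
    del f, g                      # both names go out of scope
    leaked = probe() is not None  # the cycle keeps both objects alive

    f, g = Node(), Node()
    f.other = g
    g.other = weakref.ref(f)      # weak back-reference breaks the cycle
    probe2 = weakref.ref(f)
    del f, g
    freed = probe2() is None      # both objects are destroyed immediately
finally:
    gc.enable()
    gc.collect()                  # let CPython reclaim the leaked pair
```

The first pair stays in memory with nothing able to reach it, which is the leak; making one direction of the link weak lets ordinary reference counting clean up the second pair.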