Sorting long ListBoxes

(Windows 7, Xojo 2014r1.1)
Have you ever tried to sort a long ListBox using headers?
This (image bellow) is a 13 columns and almost 12000 rows listbox. It is big but not extraordinary.
As you can see the contents of all cells are integers or singles (no strings or other objects).
If I click on a header of a integers column it takes some seconds (about 10) to sort the table.
In December 2011 I filled up a feature request (#19552) asking for MouseCursor “Wait” while we are waiting for the list to be sorted. Because if you think that you have not pressed correctly the header and click it again, then the table is sorted twice and the order is not correct.
But it seams that no one else is affected by this delay and nothing has been done by Xojo about it.

Now the problem is much worse. If I click this table on a singles column header, it takes about 1 minute to sort. But if you do this after having sorted the table before, the most probable is that the program enters a loop and never ends. I have to use the task manager to stop the program.

Do you sort long listboxes usually? Do you use header clicking or do you do it by code?
Has any one else experimented this? Is this a bug?

Thanks for any feedback.
I have found no solution or workaround and I need to sort this table more than once.

Ramon Sastre

If I am going to use a ListBox that has a lot of data, I use my Data-On-Demand ListBox instead. It is much faster at displaying loads of data in all cases.

If your data is coming from a database, you can intercept the header click, rerun your query with Order By, then repopulate the ListBox. This is probably faster than using its sort.

Thanks Kem,

Unfortunately my data is not coming from a database.
I’ll have a look at Data-On-Demand ListBox. I’ve just seen the description and is looks promising.

It’s a bit different conceptually from the native ListBox. With the native, you keep your data within the ListBox and manipulate it there. With Data-On-Demand ListBox, it becomes a mere portal to your data, and all manipulation should be done by you. You do not “store” anything within the ListBox.

Thanks Kem. I understand it.
So it means that I will have to use Sort and SortWith or create my one functions to deal with my data, but I don think I’m better than Xojo engineers, so probably I will not create my own functions.

But anyway, my questions are:
Is this a bug?
Can Xojo sort a 12000 row ListBox several times by clicking headers without crashing?

No, unless you are doing something custom that requires it, SortWith is faster than anything you could come up with in native Xojo code.

I’ve seen the same ListBox behavior, btw.

Just to totally concur with Kem, truly optimizing a ListBox with some kind of OnDemand workings serves you best in the long run.

Just the fact that the way it is now, that you have to manage ListBox rows that are not seen on thousands/millions of rows, is hugely inefficient that it should make something scream. Managing the data behind the scenes yourself (whether its sorting or whatever), and then just showing the rows within view is so-so-so much faster and efficient.

Use Kem’s box if you can. I couldn’t because I was already committed to my own heavily-subclassed ListBox but I created my own implementation. (It’s not easy - devils in the details - but it’s very doable.)

Here’s a tip for sorting long lists that I stumbled on while working on my dataGrid… sorting a mostly sorted list can become horribly inefficient using certain methods (ie, what Xojo seems to use under the hood). To circumvent the slow down, use array.shuffle before sort or sortwith. You could even set a flag when you sort so that you know to shuffle before the next sort… I was seeing re-sort of a 200,000 element list of doubles (with one new random element at the end) taking 1 minute (60 seconds)… using shuffle first, the sort time for a 1 million element list is more like 1 or 2 seconds at the most (on a core 2 duo macbook!) and about 15 seconds for 1 million strings.

Have you tested that recently? Your results would suggest that they are using something like a QuickSort but without the middle-of-three optimization, and I didn’t think that was the case.

Well, I just tested it myself and Jim is absolutely right. For a million doubles, these were my results:

Already Sorted: 26,393 microsecs
Append 1 element: 283,184,522 microsecs
Append 1 element, shuffle: 188,363 microsecs

I guess a Feedback request is in order especially since Shuffle is not an option is you plan to use SortWith.

A minute is a lonnnng time. I’ve never had a listbox take more than a second or two to sort, even with 10,000+ rows. I’m wondering if there’s not something specific to your data that’s causing Xojo’s sort algorithm to bog down. I don’t know the implementation of Xojo’s listbox sort, but if, say, it uses quicksort, well, there are certain data sets that are pathological for quicksort and can make it really slow. Not sure there’s anything you can do about it, if you want to use the standard listbox.

I will note that you can prevent users from clicking the headers twice by either intercepting the headerpressed event or placing a transparent canvas over the headers (that’s the way we used to handle it back in the day – maybe the headerpressed event has obviated that?), combined with setting and clearing a boolean that tells you whether you’re still executing the first sort.

Also, the standard listbox sort is string-based — if you want to sort numerically you’ll have to implement the CompareRows (?) event.

See

<https://xojo.com/issue/33905>

Walter,
Thanks for your advice.
Yes I use CompareRow events, since all my data are numeric. And perhaps that’s the reason of the delay, since comparing doubles in this event maybe takes a lot of time. It’s just a guess.
I assume I will have to move all the table to a set of arrays (integers or doubles, one per column) and the use SortWith to rearrange the table once sorted. And I will have to deal with ascending or descending order myself.

Unfortunately there is no Array.ShuffleWith method.

You can emulate ShuffleWith.

// arr1 and arr2 need to be shuffled together and both have the same number of elements
dim indexArr() as integer
redim indexArr( arr1.Ubound )
for i as integer = 0 to indexArr.Ubound
  indexArr( i ) = i
next i
indexArr.Shuffle
indexArr.SortWith arr1, arr2

Yes. You’re right. In fact I only need to shuffle one array!
Thanks.

Curiously in the SortWidth Language Reference page, there is an example how to sort a table using SortWidth.
Does it mean that it would take the same time to sort as if I click on a header?

I have a filtered listbox that used to display similar behavior - very sluggish redraws on sorting and filtering. I ended up wrapping the listbox population routines in the following code, and it significantly sped up the drawing.

  myListbox.ScrollBarVertical = false

  #pragma BackgroundTasks false
  #pragma BoundsChecking false
  #pragma NilObjectChecking false
  #pragma StackOverflowChecking false

<insert your loading/refresh/sort routine here>  

  #pragma BackgroundTasks true
  #pragma BoundsChecking true
  #pragma NilObjectChecking true
  #pragma StackOverflowChecking true
  
  myListbox.ScrollBarVertical = true

[quote=97833:@Ramon SASTRE]Walter,
Thanks for your advice.
Yes I use CompareRow events, since all my data are numeric. And perhaps that’s the reason of the delay, since comparing doubles in this event maybe takes a lot of time. It’s just a guess.
I assume I will have to move all the table to a set of arrays (integers or doubles, one per column) and the use SortWith to rearrange the table once sorted. And I will have to deal with ascending or descending order myself.
Unfortunately there is no Array.ShuffleWith method.[/quote]

I’m afraid SortWith probably won’t help you since it only works if all the values in the base array are unique. If you can guarantee that the column in the listbox you want to sort has no duplicate values, then SortWIth will work, otherwise you’re out of luck.

One possible approach might be to store each row of data in an object, or since all your values are numeric you could probably use a Structure, then implement your own sorting algorithm to sort those objects/structures on whatever column (i.e., object property or structure field) the user wants to sort, DeleteAllRows, reinsert all rows in their new sorted order.

I don’t know how fast that would be. You could probably use quicksort, but even so it won’t be nearly as fast as Xojo’s built-in sorting. I implemented a shellsort a few weeks ago and it was more than order of magnitude slower than Xojo’s array sort on random data (but 50% faster on data that was already 99% sorted, which is the use case I cared about).

But I can’t imagine it would take more than a second in total, for 12,000 rows. A small fraction of a second to sort the objects/structures (even my dog-slow shellsort would take less than a 10th of a second, I’m sure), and a larger fraction of a second to repopulate the listbox.

If you’d like to look at it, I just put together a shuffleSortWith method as a workaround. My tests show it to be 10x slower than Xojo sortWith on unsorted(random) data and up to 100x faster on mostly sorted data…

It is called like :

shuffleSortWith ( myarray(), another1(), another2(), another3()... )

[code] Sub ShuffleSortWith(a() as double, paramarray b() As variant)

#pragma DisableBoundsChecking

dim indexes() as integer
dim indexes2() As integer

redim indexes(a.Ubound)
ReDim indexes2(a.Ubound)

for i as integer=0 to a.Ubound
indexes(i)=i
next

indexes.Shuffle //shuffle the indexes

for bIndex as integer=0 to b.Ubound //each array to sort
dim tmp() As Double
tmp()=b(bIndex) //cast it to double
for i as integer=0 to a.Ubound
indexes2(i)=indexes(i) //copy the shuffled indexes
next
indexes2.SortWith tmp //shuffle the array same order as a()
b(bIndex)=tmp //put it back for now
next

indexes.SortWith a //shuffle a()
indexes.sort //reset the indexes
a.SortWith indexes //create a cross reference for sorting the others

for bIndex as integer=0 to b.Ubound
dim tmp() As Double
tmp()=b(bIndex) //cast it to double
for i as integer=0 to indexes.Ubound
indexes2(indexes(i))=i //copy the cross reference array
next
indexes2.SortWith tmp //sort the temp array by the cross refrence copy
b(bIndex)=tmp //put the sorted array back in variant array

next

End Sub
[/code]

I tested with 4 arrays of 200,000 element BTW…

Ramon,

I put together a demo project to test the “keep your row data in objects/structures, sort it yourself, then repopulate the listbox” approach. It seems to work fairly well, i.e., < half a second for 12,000 rows, on my machine. It’s kind of a pain in that you have to implement sorting for every column, but half a second is better than a minute, amirite.

You can download the project here: Project file on dropbox.
Or, zipped file if that doesn’t work.

Should be self-explanatory, but if not, ask me anything.