Sort an indeterminate number of arrays together

I’m looking for a way of sorting an indeterminate number of arrays, by one or possibly 2 of them.

So I have a data structure that is an array of objects. Each of these objects has a Data property, which is also a single dimensional array of values. I don’t know at compile time how many columns there will be in the primary array. As code it would look like this:

Columns() as ColumnDefinition

Class ColumnDefinition
Property Data() as Value

Now I want to sort the whole Columns() array Data() members by one or more of the Columns elements.

If I knew how many columns there would be to start with I could use, for example with 5 columns:

Columns(1).Data.SortWith( Columns(2).Data, Columns(3).Data, Columns(4).Data, Columns(5).Data )

I can’t think how I could make that generic. I’m pretty sure it doesn’t take an array of arrays within SortWith.

I certainly don’t want to copy the data into other structures. Although I would be open to copying two columns together to enable the two level sort feature, I eventually would like.

I’ve rolled my own ShellSort but I’m not impressed with the performance. It could perhaps be helped by a faster data swap mechanism. Currently, I have:

Var Temp as Variant
Temp = Data(1)
Data(1) = Data(2)
Data(2) = Temp

You could make a loop over the Columns and do a SortWith only for Column(1) and the current column:

for currentColumn = 2 to Columns.count
     Columns(1).Data,SortWith(Columns(currentColumn).data)
next

That doesn’t work because on the second loop the Columns(1) array is already sorted. You would have to restore it’s order between loops. Which is perhaps an option.

For currentColumn = 2 to Columns.count
     Copy Columns(1).Data to Temp.
     Temp.Data.SortWith( Columns(currentColumn).data )
Next
Column(1).Data.SortWith( Columns(currentColumn).data )

Extending to two column sort:

For currentColumn = 3 to Columns.count
     Copy Columns(1).Data + Columns(2).Data to Temp.
     Temp.Data.SortWith( Columns(currentColumn).data )
Next
Copy Columns(1).Data + Columns(2).Data to Temp.
temp.Data.SortWith( Columns(1).data )
Copy Columns(1).Data + Columns(2).Data to Temp.
temp.Data.SortWith( Columns(2).data )

Yes, Copy Columns(1).Data + Columns(2).Data to Temp, would have to ensure that all Column(1).Data would not overlap with Column(2).Data (such as .ToString, with a fixed width).

Certainly worth investigating.

Hi @Ian_Kennedy , I would switch everything from arrays to an in-memory SQLite database. That solves all sorting problems and more.

Not really practical as there is no limit on the size of the array. I have datasets with 200,000 rows. I’m not sure an in memory SQL database is going to be fast to create, load, sort and unload.

Sorting isn’t the primary purpose of the application so switching the entire data configuration of the whole application at this point would be a little overkill.

Keep an unsorted copy of your data.

@Beatrix_Willius So I’ve come up with a solution based on your suggestion, as follows. Thanks for the suggestion.

Var Col As Integer
Var SortCache() As String, SortData() As String
SortData.ResizeTo ColDefs( 0 ).Data.LastIndex
SortCache.ResizeTo ColDefs( 0 ).Data.LastIndex
SortArrayBuild ColKey1, ColKey2, SortCache

For Col = Col1 To Col2
    SortArrayCopy SortCache, SortData
    SortData.SortWith( ColDefs( Col ).Data )
Next

SortArrayBuild creates a string of my data in an unusual way. It is required because my data is variants and SortWith doesn’t work with them. It also handles the ascending / descending capabilities and missing values that could be present.

SortArrayCopy simply duplicates the SortCache array into the SortData array.

1 Like

Actually chunking the sorting speeds things up even more. Sure the code is pretty ugly, but 16 seconds down to 6 is 2.6x faster:

Var Col As Integer
Var SortCache() As String, SortData() As String
SortData.ResizeTo ColDefs( 0 ).Data.LastIndex
SortCache.ResizeTo ColDefs( 0 ).Data.LastIndex
SortArrayBuild ColKey1, Assending1, ColKey2, Assending2, SortCache

For Col = Col1 To Col2 Step 10
  SortArrayCopy SortCache, SortData
  
  If Col + 10 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data, _
    ColDefs( Col + 4 ).Data, _
    ColDefs( Col + 5 ).Data, _
    ColDefs( Col + 6 ).Data, _
    ColDefs( Col + 7 ).Data, _
    ColDefs( Col + 8 ).Data, _
    ColDefs( Col + 9 ).Data )
    SGP.RunsCurrent = SGP.RunsCurrent + 10
    
  ElseIf Col + 9 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data, _
    ColDefs( Col + 4 ).Data, _
    ColDefs( Col + 5 ).Data, _
    ColDefs( Col + 6 ).Data, _
    ColDefs( Col + 7 ).Data, _
    ColDefs( Col + 8 ).Data )
    SGP.RunsCurrent = SGP.RunsCurrent + 9
    
  ElseIf Col + 8 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data, _
    ColDefs( Col + 4 ).Data, _
    ColDefs( Col + 5 ).Data, _
    ColDefs( Col + 6 ).Data, _
    ColDefs( Col + 7 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 8
    
  ElseIf Col + 7 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data, _
    ColDefs( Col + 4 ).Data, _
    ColDefs( Col + 5 ).Data, _
    ColDefs( Col + 6 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 7
    
  ElseIf Col + 5 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data, _
    ColDefs( Col + 4 ).Data, _
    ColDefs( Col + 5 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 6
    
  ElseIf Col + 4 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data, _
    ColDefs( Col + 4 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 5
    
  ElseIf Col + 3 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data, _
    ColDefs( Col + 3 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 4
    
  ElseIf Col + 2 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data, _
    ColDefs( Col + 2 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 3
    
  ElseIf Col + 1 <= Col2 Then
    SortData.SortWith( _
    ColDefs( Col ).Data, _
    ColDefs( Col + 1 ).Data)
    SGP.RunsCurrent = SGP.RunsCurrent + 2
    
  Else
    SortData.SortWith( ColDefs( Col ).Data )
    SGP.RunsCurrent = SGP.RunsCurrent + 1
    
  End If
Next

OK, this seems to be the answer to the general problem:

Var aKey() as String
Var aOrdinal() as Integer
'Copy Your original data to another place:
For nRow = 0 to LastIndex
   For Col = 0 to LastColumn
      aBackupCols( Col ).Data( aOrdinal( nRow ) ) = Cols( Col ).Data( nRow )
   Next
   ' Build your key and the original row numbers
   aKey.Add GenerateKey( Key1, Order1, Key2, Order2, Row )
   aOrdinal.Add nRow
Next

' Sort by the Key keeping the row numbers in the new order
aKey.SortWith( aOrdinal )

For nRow = 0 to LastIndex
   For Col = 0 to LastColumn
      ' Copy the data back into your primary location but using the new row order.
      Cols( Col ).Data( nRow ) = aBackupCols( Col ).Data( aOrdinal( nRow ) )
   Next
Next

In my case I was already copying the data into another location because I am building an undo buffer.

Timings, on a 184000 row dataset with 60 columns. My original shell sort was taking 70 seconds. This method is only taking 5.5 seconds. 12.7x faster.