Removing duplicates from a DesktopListBox

Perry_Paolantonio · August 27, 2022, 3:44pm

I have a DesktopListBox that has names (column 0) and email addresses (column 1). I’m trying to remove the duplicate entries by stepping through the list in a for/next loop and using .RemoveRowAt to get rid of duplicates. In cases where the name field is empty in one duplicate entry but not in the other, I want to delete the empty one, so that we keep the name.

When I put breaks into my code and run it in the debugger, it’s evaluating correctly (for example, when the name on the previous email is empty but the current name is not, it’s finding that and breaking in that block). However, I don’t seem to be able to remove the row. The resulting listbox is showing all the duplicates. I haven’t done too much with listboxes, so maybe I’m not getting how they work, but this all seems to be working except for the part where I delete the rows! Any ideas?

// Sort all of the entries based on the email address. 
ParsedAddressList.ColumnSortDirectionAt(1) = DesktopListBox.SortDirections.Ascending
ParsedAddressList.SortingColumn = 1 
ParsedAddressList.Sort


// Walk through the listBox to remove duplicates
For row As Integer = 0 to ParsedAddressList.RowCount-1
  if row > 0 then //avoid OutOfBoundsException
    
    //previous and current rows' names and email addresses
    var prevName as string = ParsedAddressList.CellTextAt(row-1,0)
    var currName as string = ParsedAddressList.CellTextAt(row,0)
    var prevEmail as string = ParsedAddressList.CellTextAt(row-1,1)
    var currEmail as string = ParsedAddressList.CellTextAt(row,1)
    
    if prevEmail = currEmail then  
      //We would prefer both name and email if possible. In some cases, we may have duplicate email
      //addresses, but no name. So, we want to delete the empty name row and keep the one with a name
      
      if prevName <> currName AND prevName = "" then
        //previous name is empty, so delete that row
        ParsedAddressList.RemoveRowAt(row -1)
        break  //this is breaking when the current name is there but previous is not

      elseif prevName <> currName AND currName = "" then
        //previous has a name, but current does not, delete current row
        ParsedAddressList.RemoveRowAt(row)
        break  //this is breaking when the previous name is there but current is not
        
      elseif prevName = currName then
        //these are a match, so delete the current row
        ParsedAddressList.RemoveRowAt(row)
        break //this is breaking when the names are equal
      end if
      
    end if
  end if
Next

//re-sort the listbox (maybe not necessary?)
//and refresh it

ParsedAddressList.ColumnSortDirectionAt(1) = DesktopListBox.SortDirections.Ascending
ParsedAddressList.SortingColumn = 1 
ParsedAddressList.Sort

ParsedAddressList.Refresh

What I get is a sorted list, with all the duplicates still there.

Ian_Kennedy · August 27, 2022, 4:27pm

Rows start at 0, which is fine. However, when you remove a row, every row after it is renumbered down by 1, so a forward loop skips the row that moves into the removed slot. I would loop backwards to avoid that issue.
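
A minimal sketch of the backwards loop, assuming the same ParsedAddressList control and column layout as in the question (names in column 0, emails in column 1) and that the list is already sorted by email. The branch logic is a condensed guess at what the original code intends, not a tested drop-in replacement:

```xojo
// Walk from the last row up to row 1, so a removal only
// renumbers rows that have already been visited.
For row As Integer = ParsedAddressList.LastRowIndex DownTo 1
  Var prevName As String = ParsedAddressList.CellTextAt(row - 1, 0)
  Var currName As String = ParsedAddressList.CellTextAt(row, 0)
  Var prevEmail As String = ParsedAddressList.CellTextAt(row - 1, 1)
  Var currEmail As String = ParsedAddressList.CellTextAt(row, 1)

  If prevEmail = currEmail Then
    If prevName = "" And currName <> "" Then
      // Previous row has no name: drop it and keep the named row
      ParsedAddressList.RemoveRowAt(row - 1)
    ElseIf currName = "" Or prevName = currName Then
      // Current row has no name, or is an exact duplicate: drop it
      ParsedAddressList.RemoveRowAt(row)
    End If
    // Same email but two different non-empty names: both rows are kept,
    // as in the original code
  End If
Next
```

Stopping at row 1 also removes the need for the `row > 0` check, since `row - 1` is never negative.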

Rick_Araujo · August 27, 2022, 4:27pm

I see you also want to sort that data. Wouldn’t it be better to dump the list into a SQLite database (even an in-memory one), discarding duplicate keys (emails or whatever), then clear the listbox and reload it with the data ordered however you wish using a SELECT with ORDER BY?
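
A rough sketch of that idea, assuming the data is already in ParsedAddressList as in the question. The table name `addresses` and its columns are made up for illustration. Instead of discarding duplicates on insert, this version collapses them with GROUP BY, where MAX(name) picks a non-empty name over an empty one:

```xojo
Var db As New SQLiteDatabase // no DatabaseFile set, so this is an in-memory database
Try
  db.Connect
  db.ExecuteSQL("CREATE TABLE addresses (name TEXT, email TEXT)")

  // Copy every listbox row into the (hypothetical) addresses table
  For i As Integer = 0 To ParsedAddressList.LastRowIndex
    db.ExecuteSQL("INSERT INTO addresses (name, email) VALUES (?, ?)", _
      ParsedAddressList.CellTextAt(i, 0), ParsedAddressList.CellTextAt(i, 1))
  Next

  // Reload the listbox with one row per email, sorted by email
  ParsedAddressList.RemoveAllRows
  Var rs As RowSet = db.SelectSQL( _
    "SELECT MAX(name) AS name, email FROM addresses GROUP BY email ORDER BY email")
  For Each r As DatabaseRow In rs
    ParsedAddressList.AddRow(r.Column("name").StringValue, r.Column("email").StringValue)
  Next
Catch e As DatabaseException
  MessageBox("Database error: " + e.Message)
End Try
```

Note that when one email has two different non-empty names, this keeps only one of them, which differs from the listbox loop in the question.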

Perry_Paolantonio · August 27, 2022, 4:30pm

Thank you! Reversing the direction worked!

Ian_Kennedy · August 27, 2022, 4:34pm

As Rick has said, if you can avoid loading them into the list in the first place, it will be much faster. If you are loading them from a database, use an ORDER BY on your SELECT statement, then compare each row you are about to add with the previous one and skip it if it is a duplicate.

If the data is in a pair of arrays to start with, use:

aEmails.SortWith( aNames )

and then test each entry against the previous one. In either case, using a non-visual holder for your data will be a lot faster. Each time you update the listbox it may repaint itself, causing excessive CPU usage.
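
A sketch of the array approach, assuming aEmails and aNames are parallel String arrays and that the listbox is filled only after de-duplication. The variables lastEmail, bestName and haveEntry are illustrative helpers, not part of any API:

```xojo
// Sort the emails and keep each name aligned with its email
aEmails.SortWith(aNames)

ParsedAddressList.RemoveAllRows

Var lastEmail As String
Var bestName As String
Var haveEntry As Boolean = False

For i As Integer = 0 To aEmails.LastIndex
  If haveEntry And aEmails(i) = lastEmail Then
    // Duplicate email: only take its name if we do not have one yet
    If bestName = "" Then bestName = aNames(i)
  Else
    // New email: write out the finished entry, then start the next one
    If haveEntry Then ParsedAddressList.AddRow(bestName, lastEmail)
    lastEmail = aEmails(i)
    bestName = aNames(i)
    haveEntry = True
  End If
Next

// Write out the final entry
If haveEntry Then ParsedAddressList.AddRow(bestName, lastEmail)
```

The listbox is touched once per unique email rather than once per removal. Unlike the loop in the question, this keeps only the first non-empty name when one email appears with several different names.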

Rick_Araujo · August 27, 2022, 4:35pm

^ This

Perry_Paolantonio · August 27, 2022, 4:48pm

Sure - but this is a quick and dirty app and it processes the list basically instantly. The app is just for internal use. If it were doing heavier lifting, I would use a database, but this is something we’ll fire up once every few months at most, and it’s not processing all that much. I just need something that works here, and don’t care about speed in this case!

Emile_Schwarz · August 28, 2022, 6:53am

Or…

Redesign the way you do this…

If you store the data in a SQLite database, you can set the fields (name, email address) as UNIQUE, and so avoid the duplicate removal altogether…