Delete Duplicate Rows in a ListBox

I want to implement a “Delete duplicate Rows in a ListBox”.

I am unsure of what to do.

Here’s the pseudo code I have in mind:

Do a global scan in a main loop
a. Store one Row Contents in RowSource --> LB.Cell(LoopIdx,-1)
That loop uses DownTo 0

b. Add an inside loop (sub loop) to scan the whole ListBox
Compare RowSource contents to LB.Cell(LoopIdx - 1,-1)
If I found a duplicate, delete it
and remove 1 to LoopIdx

What is your feeling ?

<some minutes ellapsed and…>

Here’s the code I actually wrote:

[code] Dim RowCnt As Integer // Number of Rows in the ListBox
Dim LoopIdx As Integer // Main Loop Indice
Dim DbleIdx As Integer // Sub Loop Indice
Dim RawSource As String // Row contents to compare to the rest of the ListBox contents

// Get the number of rows in the ListBox
RowCnt = LB_Bookmarks.ListCount // 1-based value

// Loop thru the ListBox contents
For LoopIdx = RowCnt DownTo 1
// Get One Row Contents (to compare to the other Rows)
RawSource = LB.Cell(LoopIdx - 1,-1)

// Scan the ListBox for a duplicate
For DbleIdx = 1 to LoopIdx - 1
  If RawSource = LB.Cell(DbleIdx,-1) Then
    // I found a duplicate, so delete it
    LB.RemoveRow LoopIdx - 1
    
    // Remove 1 to LoopIdx
    LoopIdx = LoopIdx - 1 // What can be the impact on the main loop ?
  End If
Next

// To avoid 1, Infinite Loop
If UserCancelled Then Exit

Next[/code]

The code above seems to work. Not placed in the correct project, but it works.

BTW: after a simple check, it seems that TWO instances of the same data stays on the end of the process: two Rows with the same contents… and I do not make a real compare between the original contents and the resulting contents. I only checked the results.

Worst, I forgot that the file I used as source to delete duplicates is a bit bad: Column 2 data does not always correspond to Column 1 data on a Row basis. *

So, I have something wrong, but how do I correct that ?

  • That source file was created at an early stage of the project when I had bad storage for unknown reasons (Column 1 have an URL, and Column 2 have that url Title). Sometimes I had URL with another Title (a previous one it seems).

An alternative way to do it might be to put the listbox rows into a Dictionary as keys and then back to the Listbox.
A Dictionary can’t hold duplicate keys so any duplicate will be removed.
Have not tried this though :stuck_out_tongue:

Albin,

thank you for the idea. I love it ! I will try it.

  Dim source As Listbox = lb // your ListBox Name
  Dim i, j As Integer
  For i = 0 To source.ListCount-1
    For j = i + 1 To source.ListCount-1
      If source.Cell(i, -1) = source.Cell(j, -1) Then
        source.RemoveRow(j)
      End If
    Next
  Next

Axel,
That routine for removing duplicates worked very well for a combo box also. I used it to remove duplicates where a user may have typed the same thing as was in the List.

Thanks,
Bob - N4RFC

[quote=140083:@Axel Schneider] Dim source As Listbox = lb // your ListBox Name Dim i, j As Integer For i = 0 To source.ListCount-1 For j = i + 1 To source.ListCount-1 If source.Cell(i, -1) = source.Cell(j, -1) Then source.RemoveRow(j) End If Next Next [/quote]

Maybe it is late and I am tired, but if you remove a row, are your i and j counters not going to then go past the end of the rows as the listcount reduces?

No, because source.ListCount-1 is evaluated each time through the loop. But it will skip rows. It will work for a single duplicate, but may miss multiples.

when removing items for any type of list… always go in REVERSE

 Dim source As Listbox = lb // your ListBox Name
  Dim i, j As Integer
  For i = source.ListCount-1 downto 0
    For j = source.ListCount-1 downto i+1
      If source.Cell(i, -1) = source.Cell(j, -1) Then
        source.RemoveRow(j)
      End If
    Next
  Next

will NOT skip rows, will NOT miss multiple duplicates

@Dave S
I believe that the j-for-loop isn’t going to work.
I think the code should be

For j = source.ListCount-2 downto 0

Paul, I was looking at it, but I think that j will not start until it has a value higher than i+1. For example:
source.ListCount = 10
For i = 9 downto 0
If i = 9, then i+1 = 10, so
For j = 9 downto 10, I think this for will not execute.
Then i = 8, i+1 = 9
For j = 9 downto 9, this for will be executed 1 time, comparing source.Cell(8, -1) = souce.Cell(9, -1)
and if they are equal, remove source.RemoveRow(9)

So I think the code is right (I’m remembering/learning how to code).

Or if you want to avoid the first For j to not run, then:

Dim source As Listbox = lb // your ListBox Name Dim i, j As Integer For i = source.ListCount-2 downto 0 For j = source.ListCount-1 downto i+1 If source.Cell(i, -1) = source.Cell(j, -1) Then source.RemoveRow(j) End If Next Next

I hope I’m right.

If I = 9 downto 0
The i loop starts with 9 so you want to check out j from 8 to 0 for duplicates.
That’s why I think the j for loop has to be For j = source.ListCount-2 downto 0.

Just my 2 cents.
[edit]
In fact I even believe that the i loop should be For i = source.ListCount-1 downto 1
Because the last one can never be a duplicate.
[/edit]

Paul, I did this test:

Dim i,j As Integer Dim sourceListcount As Integer = 10 For i = sourceListcount-1 DownTo 0 For j = sourceListcount-1 DownTo i+1 MsgBox(i.ToText + " " + j.ToText) // Then changed to: Listbox1.AddRow(i.ToText,j.ToText) Next Next

The code is executed and the first ‘i’ reported is 8 and the first ‘j’ reported is 9.
So when i = 9 then j = 9 DownTo 10, the code inside the For j … Next is not executed.
Once i =8 and j= 9 DownTo 9, the code is executed once (i=8, j=9)

Of course, this example is not deleting anything, so it is run as it were no duplicates.

I know this is way off topic but please excuse me.

I have a method I use to add/update rows of the listbox. I use a RowTag using a unique ID to determine which row is which. Then whenever I want to add (or update) data, i call that method. I know that I won’t have duplicate data as I would have rows with duplicate RowTags with duplicate Unique IDs. And since I walk the list of RowTags looking for one that matches before doing a true add row. If I find it in the RowTags, I update that row with the new data over adding a new row. That way I can not have the deduplicates.

I also have a method to remove a row based on the RowTag Unique ID. And if the Unique ID doesn’t exist on any rows’ RowTags then I don’t do anything. Its already gone.

This way you don’t have to run through the #s rows over and over again to determine if there is duplicates.

But now I will return you back to your normal thread/discussion.
—sb

I did not realise that, thanks.

That is not the way I read it. The i loop iterates the listbox from bottom to top. The j loop removes any equivalents from the bottom to i + 1 thus keeping the first entry for each potential duplicate.