Best Method To Determine Duplicate Items in a Large List?

“Best Method To Determine Duplicate Items in a Large List?”
is the title of a thread on Apple’s AppleScript list.

The approach they use there is an iterative comparison (brute force?).

This reminds me of the SQLite DISTINCT keyword. What about saving the list in a SQLite database, then loading it back using DISTINCT to exclude the duplicates?
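A minimal sketch of that idea, written in Python with the standard sqlite3 module only because it is easy to run (the same SQL would work from Xojo). The table name `items`, the column `value`, and the helper `distinct_items` are made up for the illustration.

```python
import sqlite3

def distinct_items(items):
    """Return the items with duplicates removed, via SELECT DISTINCT."""
    con = sqlite3.connect(":memory:")  # throwaway in-memory database
    con.execute("CREATE TABLE items (value TEXT)")  # hypothetical table/column names
    con.executemany("INSERT INTO items (value) VALUES (?)", [(v,) for v in items])
    # DISTINCT removes the duplicates; ORDER BY only makes the result order predictable
    rows = con.execute("SELECT DISTINCT value FROM items ORDER BY value").fetchall()
    con.close()
    return [row[0] for row in rows]

countries = ["France", "Japan", "France", "Brazil", "Japan"]
print(distinct_items(countries))
```

Note that with SQLite's default collation the comparison is case-sensitive, so "France" and "france" would both survive; that is one reason typos in the input matter.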

Of course, it all depends on the way DISTINCT is used *.

The link to the relevant page at sqlite.com is here.

What do you think about this idea?

  * I use DISTINCT in one of my projects to get a list of the users’ countries (each country appears only once in the result list). I had to change the way the country name is entered (it is now chosen from a PopupMenu, to avoid typos).

Please give a specific example of what you are trying to do.

I have a list and I want to exclude duplicate entries, based on some criteria.

The duplicated value can be a date (SQLDate), a title, or… (it can be both).
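One possible way to express "duplicate by date, by title, or by both" in SQL is to group on the chosen columns and keep the first row of each group. The sketch below is only an illustration: the table `entries`, its columns `entry_date` and `title`, and the helper `unique_rows` are assumptions, not names from an actual project.

```python
import sqlite3

def unique_rows(rows, key_columns):
    """Keep the first row seen for each combination of key_columns."""
    allowed = {"entry_date", "title"}  # hypothetical column names
    if not key_columns or not set(key_columns) <= allowed:
        raise ValueError("key_columns must be chosen from entry_date and title")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entries (entry_date TEXT, title TEXT)")
    con.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    keys = ", ".join(key_columns)  # safe to interpolate: checked against 'allowed'
    sql = (
        "SELECT entry_date, title FROM entries "
        # MIN(rowid) picks the earliest inserted row of each group
        f"WHERE rowid IN (SELECT MIN(rowid) FROM entries GROUP BY {keys}) "
        "ORDER BY rowid"
    )
    result = con.execute(sql).fetchall()
    con.close()
    return result

entries = [
    ("2024-01-05", "Report"),
    ("2024-01-05", "Invoice"),
    ("2024-02-10", "Report"),
    ("2024-01-05", "Report"),
]
print(unique_rows(entries, ["entry_date"]))           # one row per date
print(unique_rows(entries, ["entry_date", "title"]))  # one row per date/title pair
```

Grouping on a single column treats two rows as duplicates as soon as that column matches; grouping on both columns only drops rows where the date and the title are both repeated.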

Edit:
I can do that at “Open” time: populate a Listbox with the “unique” rows, then store the result in the usual target file (csv, xml, json or in the real .sqlite file).

Well, depending on the number of columns of data and rows you’re dealing with, two methods immediately pop into my head.

  1. Use individual arrays for each column, use SortWith to sort them all based on a particular one, then iterate through them and remove the duplicate indexes.
  2. Use an in-memory sqlite database.
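The first method can be sketched roughly as follows. This is Python standing in for the Xojo arrays, so the index sort below is only an analogue of SortWith, and `dedupe_parallel` is a hypothetical helper name. The point is that once the key column is sorted, duplicates sit next to each other and a single pass is enough.

```python
def dedupe_parallel(keys, *others):
    """Sort parallel lists by keys, then drop rows whose key repeats."""
    # Sort the row indexes by key; sorted() is stable, so ties keep their original order
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    kept = []
    for i in order:
        # After sorting, duplicates are adjacent: compare with the last kept key only
        if not kept or keys[i] != keys[kept[-1]]:
            kept.append(i)
    # Rebuild every column from the surviving row indexes
    return [[col[i] for i in kept] for col in (keys, *others)]

titles = ["Report", "Invoice", "Report", "Memo"]
dates = ["2024-01-05", "2024-01-06", "2024-02-10", "2024-03-01"]
new_titles, new_dates = dedupe_parallel(titles, dates)
print(new_titles)
print(new_dates)
```

The cost is dominated by the sort, which is much cheaper on a large list than comparing every item with every other item.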

Thanks.

When using a large set of data, this immediately makes me think of a database. When using a database, use the power of the database. Don’t iterate.

// Hopefully you have an index on AllKnownPeopleOfTheWorld.
DELETE FROM UnknownPeople WHERE UnknownPeople.ID IN (SELECT ID FROM AllKnownPeopleOfTheWorld)
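To try that statement out, here is a small sketch using Python's sqlite3 module and an in-memory database. The table names come from the statement above; the `Name` column and the sample rows are invented for the example. Declaring `ID` as `INTEGER PRIMARY KEY` is one way to make sure the lookup is indexed.

```python
import sqlite3

people_db = sqlite3.connect(":memory:")
people_db.executescript("""
    CREATE TABLE AllKnownPeopleOfTheWorld (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE UnknownPeople (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO AllKnownPeopleOfTheWorld VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO UnknownPeople VALUES (2, 'Grace'), (3, 'Linus'), (4, 'Margaret');
""")

# One set-based statement: no loop over the rows in application code
cursor = people_db.execute(
    "DELETE FROM UnknownPeople "
    "WHERE UnknownPeople.ID IN (SELECT ID FROM AllKnownPeopleOfTheWorld)"
)
people_db.commit()

deleted_count = cursor.rowcount  # number of rows the DELETE removed
remaining = people_db.execute(
    "SELECT ID, Name FROM UnknownPeople ORDER BY ID"
).fetchall()
print(deleted_count, remaining)
```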

If all of this data is just in Xojo, then another solution might be needed.

Hello.

Use the INTERSECT operator.
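For illustration, INTERSECT returns the distinct rows that appear in both SELECT results, so it can find the items two lists have in common. The sketch below uses Python's sqlite3 module; the tables `list_a` and `list_b` and their contents are made up.

```python
import sqlite3

lists_db = sqlite3.connect(":memory:")
lists_db.executescript("""
    CREATE TABLE list_a (value TEXT);
    CREATE TABLE list_b (value TEXT);
    INSERT INTO list_a VALUES ('apple'), ('pear'), ('apple'), ('plum');
    INSERT INTO list_b VALUES ('plum'), ('apple'), ('fig');
""")

# INTERSECT keeps only values present in both tables, each reported once
common = [row[0] for row in lists_db.execute(
    "SELECT value FROM list_a INTERSECT SELECT value FROM list_b ORDER BY value"
)]
print(common)
```

Here 'apple' occurs twice in `list_a` but is reported once, because INTERSECT also removes duplicates from its result.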