Best Method To Determine Duplicate Items in a Large List?

Emile_Schwarz · August 16, 2017, 10:16am

"Best Method To Determine Duplicate Items in a Large List?" is the name of an actual thread on Apple's AppleScript list.

The approach they take there is an iterative comparison (brute force?).

This reminds me of the SQLite DISTINCT keyword. What about saving the list in a SQLite database, then loading it back with DISTINCT so that the duplicates are excluded?
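As a rough illustration of that idea, here is a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The table name `items`, the column name `value`, and the helper `distinct_items` are hypothetical names chosen for the example; they are not taken from the AppleScript thread or from any project mentioned here.

```python
import sqlite3

def distinct_items(items):
    """Store a list in a temporary SQLite table and read it back without duplicates."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    try:
        # Hypothetical one-column table holding the list items
        conn.execute("CREATE TABLE items (value TEXT)")
        conn.executemany("INSERT INTO items (value) VALUES (?)", [(v,) for v in items])
        # DISTINCT drops repeated values; ORDER BY makes the result order predictable
        rows = conn.execute("SELECT DISTINCT value FROM items ORDER BY value").fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()

countries = ["France", "Canada", "France", "Japan", "Canada"]
unique_countries = distinct_items(countries)
print(unique_countries)
```

With SQLite's default text comparison, DISTINCT treats "France" and "france" as two different values, so consistent data entry matters.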

Of course, it all depends on how DISTINCT is used *.

* The link to the relevant page at sqlite.com is here.

What do you think about this idea?

I use DISTINCT in one of my projects to get a list of the users' countries (each country appears only once in the result list). I had to change the way the country name is entered (it now comes from a PopupMenu, to avoid typos).

Greg_O_Lone · August 16, 2017, 10:20am

Please give a specific example of what you are trying to do.

Emile_Schwarz · August 16, 2017, 10:30am

I have a list and I want to exclude duplicate entries (following some criteria).

A duplicate entry can be a date (SQLDate), a title, or both.
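To make the "date, title, or both" criteria concrete, here is a hedged sketch in Python with sqlite3. It keeps the first row seen for each duplicate key by selecting the smallest rowid per group. The table `entries`, the columns `sql_date` and `title`, and the helper `unique_rows` are assumptions made for the example, not the actual schema.

```python
import sqlite3

# Maps a criterion name to the column list used for grouping (all names are hypothetical)
KEY_COLUMNS = {"sql_date": "sql_date", "title": "title", "both": "sql_date, title"}

def unique_rows(rows, key="both"):
    """Return rows with duplicates removed, keeping the first row for each key."""
    columns = KEY_COLUMNS[key]  # only whitelisted column lists reach the SQL text
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE entries (sql_date TEXT, title TEXT)")
        conn.executemany("INSERT INTO entries (sql_date, title) VALUES (?, ?)", rows)
        # MIN(rowid) per group identifies the first inserted row for each key
        query = (
            "SELECT sql_date, title FROM entries "
            "WHERE rowid IN (SELECT MIN(rowid) FROM entries GROUP BY " + columns + ") "
            "ORDER BY rowid"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()

entries = [
    ("2017-08-16", "Duplicates"),
    ("2017-08-16", "Sorting"),
    ("2017-08-17", "Duplicates"),
    ("2017-08-16", "Duplicates"),
]
print(unique_rows(entries, "both"))
print(unique_rows(entries, "sql_date"))
print(unique_rows(entries, "title"))
```

Unlike a plain SELECT DISTINCT, this form lets the duplicate test use only some columns while still returning whole rows.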

Edit:
I can do that at Open time: populate a Listbox with unique rows, then store the result in the usual target file (csv, xml, json, or a real .sqlite file).

Greg_O_Lone · August 16, 2017, 10:49am

Well, depending on the number of columns of data and rows you’re dealing with, two methods immediately pop into my head.

1. Use individual arrays for each column, use sortwith to sort them all based on a particular one, and then iterate through them and remove duplicate indexes.
2. Use an in-memory sqlite database.
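The first method could look roughly like the sketch below. It is written in Python rather than Xojo, so it does not call sortwith; instead it sorts the row indexes by the key column, which has the same effect of keeping all columns aligned. The helper name `dedupe_parallel` and the sample data are made up for the illustration.

```python
def dedupe_parallel(keys, *others):
    """Sort parallel columns by `keys`, then drop rows whose key repeats the previous one."""
    # Sort row indexes by the key column; the sort is stable, so for equal keys
    # the row that came first in the original data stays first
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    kept = []
    for i in order:
        # After sorting, duplicates are adjacent, so one comparison per row is enough
        if not kept or keys[i] != keys[kept[-1]]:
            kept.append(i)
    new_keys = [keys[i] for i in kept]
    new_others = [[column[i] for i in kept] for column in others]
    return new_keys, new_others

titles = ["Sorting", "Arrays", "Sorting", "Listbox", "Arrays"]
dates = ["2017-08-01", "2017-08-02", "2017-08-03", "2017-08-04", "2017-08-05"]
unique_titles, (unique_dates,) = dedupe_parallel(titles, dates)
print(unique_titles)
print(unique_dates)
```

Sorting first means each row is compared only with its neighbour, instead of with every other row as in the brute-force comparison.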

Emile_Schwarz · August 16, 2017, 10:58am

Thanks.

Kevin_Cully1 · August 16, 2017, 3:12pm

When using a large set of data, this immediately makes me think of a database. When using a database, use the power of the database. Don’t iterate.

-- Hopefully you have an index on AllKnownPeopleOfTheWorld.
DELETE FROM UnknownPeople WHERE UnknownPeople.ID IN (SELECT ID FROM AllKnownPeopleOfTheWorld);
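Here is a small, hedged way to try that statement out, using Python's sqlite3 module and an in-memory database. The table names come from the statement above; the `Name` column and the sample rows are invented for the test, and ID is declared as INTEGER PRIMARY KEY so that the lookup in the subquery is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample schema and data; the Name column and all rows are made up for this example
conn.executescript("""
    CREATE TABLE AllKnownPeopleOfTheWorld (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE UnknownPeople (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO AllKnownPeopleOfTheWorld (ID, Name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO UnknownPeople (ID, Name) VALUES (2, 'Bob'), (3, 'Cho'), (4, 'Dee');
""")

# One set-based statement removes every row whose ID is already known; no loop needed
cursor = conn.execute(
    "DELETE FROM UnknownPeople "
    "WHERE UnknownPeople.ID IN (SELECT ID FROM AllKnownPeopleOfTheWorld)"
)
deleted_count = cursor.rowcount  # number of rows removed by the DELETE

remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM UnknownPeople ORDER BY ID")]
conn.close()
print(deleted_count, remaining_ids)
```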

If all of this data is just in Xojo, then another solution might be needed.

Mauricio_Pulla · August 16, 2017, 6:04pm

Hello.

Use the INTERSECT operator.
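One possible reading of this suggestion, sketched in Python with sqlite3: INTERSECT returns the distinct rows that appear in both of two queries, so it finds the items that two lists have in common. The table names `first_list` and `second_list`, the column `value`, and the helper `common_items` are hypothetical.

```python
import sqlite3

def common_items(first, second):
    """Return the distinct values that occur in both lists, using INTERSECT."""
    conn = sqlite3.connect(":memory:")
    try:
        # Two hypothetical one-column tables, one per list
        conn.execute("CREATE TABLE first_list (value TEXT)")
        conn.execute("CREATE TABLE second_list (value TEXT)")
        conn.executemany("INSERT INTO first_list (value) VALUES (?)", [(v,) for v in first])
        conn.executemany("INSERT INTO second_list (value) VALUES (?)", [(v,) for v in second])
        # INTERSECT keeps only rows present in both SELECTs and removes repeats
        rows = conn.execute(
            "SELECT value FROM first_list "
            "INTERSECT "
            "SELECT value FROM second_list "
            "ORDER BY value"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()

shared = common_items(["a", "b", "b", "c"], ["b", "c", "d"])
print(shared)
```

Note that this compares two lists with each other; for duplicates inside a single list, DISTINCT or GROUP BY is the more direct tool.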