Is there a way to interrupt a SQLSelect?

Hi,

I have a big file with 45k records. I run some calculations on it to search for duplicates and it takes too long.
Xojo is stuck at the line rs = mydb.SQLSelect(“my sql select”)
The only way to stop it is by force-quitting the application.
Even in debug mode, if you stop the debugger, you never get back into Xojo.

Is there a better way, to ask the user whether they want to stop the query if it takes too long?

thanks.

Is limiting the query and doing multiple queries in a thread out of the question? You may not be able to stop it right away but you’d be able to stop it before the next query begins.

Run the SQLSelect(…) in a thread, then kill the thread if necessary.

SQL queries are blocking so there is no way to escape the SQLSelect call in order to kill the thread.

[quote=287531:@Jean-Yves Pochez]Hi,

I have a big file with 45k records. I run some calculations on it to search for duplicates and it takes too long.
Xojo is stuck at the line rs = mydb.SQLSelect(“my sql select”)
The only way to stop it is by force-quitting the application.
Even in debug mode, if you stop the debugger, you never get back into Xojo.

Is there a better way, to ask the user whether they want to stop the query if it takes too long?

Thanks.[/quote]

You might look into the OFFSET/FETCH commands that many SQL engines support. You ultimately need to limit how many results you return at a time.
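In SQLite the equivalent is the LIMIT/OFFSET clauses. A minimal sketch of paging, assuming the articles table that comes up later in the thread:

-- fetch the rows in pages of 1000; the calling code loops,
-- bumping the offset by 1000 each pass, until no rows come back
SELECT id_article, designation
FROM articles
ORDER BY id_article
LIMIT 1000 OFFSET 0

The ORDER BY matters: without a stable ordering, successive pages are not guaranteed to cover every row exactly once.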

Since there is a GROUP BY in the SELECT command, I doubt the OFFSET or LIMIT options will return anything before the whole query has been evaluated?

Which database engine are you using? You want to pull down a much smaller set of records and then loop through the results. Most SQL engines support offsets so you can iteratively pull the needed records.

Are the tables properly indexed? 45K seems like a very small number to be having speed problems.

Post an example of the query (or a description). As Greg stated, 45K records should be processable by even the most poorly formed SQL in a matter of seconds. I have had queries with multiple joins across tables with millions of records that ran in under a minute in the worst conditions.

And a key word in what Greg stated is “properly”. Just having indexes for the sake of having them is rarely a good thing.

SELECT L.id_article, L.designation
FROM articles L
INNER JOIN (
    SELECT designation, count(*)
    FROM articles
    GROUP BY upper(designation)
    HAVING count(*) > 1
) R ON upper(L.designation) = upper(R.designation)

id_article is the primary key, and the field “designation” is indexed (as is the primary key).
The database is SQLite. It’s a simple SQL request to find duplicates.
It works fine on hundreds of records, but stays stuck on 45k records…
Note: there are a lot of duplicates in that 45k-record file, maybe 30k, which could explain the problem?

Maybe this works better:

SELECT L1.id_article, L1.designation FROM articles L1 JOIN articles L2 ON L1.designation = L2.designation AND L1.rowid != L2.rowid

You may have to replace rowid with the primary key field for that table. If the designation field is indexed this query should be quite fast.
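For reference, the same self-join written against the primary key instead of rowid (a sketch only, using the id_article column named earlier in the thread):

SELECT L1.id_article, L1.designation
FROM articles L1
JOIN articles L2
  ON L1.designation = L2.designation
 AND L1.id_article != L2.id_article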

Could it be the upper() that slows down the process?
Is it possible to make an index on the upper() values?

If you use the MBS SQL Plugin you can have the SQL select run threaded and call its cancel method if necessary.

It should: See Indexes On Expressions

I have never tried this with SQLite, but this might work:

CREATE INDEX idx_article_designationUpper ON articles(Upper(designation));

You might also try using count(rowid) instead of count(*).
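Applied to the subquery from the posted query, that suggestion would look something like this (a sketch only; whether count(rowid) actually beats count(*) in SQLite is worth measuring):

SELECT designation, count(rowid)
FROM articles
GROUP BY upper(designation)
HAVING count(rowid) > 1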

[quote=287596:@Maximilian Tyrtania]It should: See https://www.sqlite.org/expridx.html

I have never tried this with SQLite, but this might work:

CREATE INDEX idx_article_designationUpper ON articles(Upper(designation));[/quote]
I got a syntax error with this one… “near (”?

This does not change the problem; it’s still stuck in the SQLSelect command.

[quote=287585:@Maximilian Tyrtania]Maybe this works better:

SELECT L1.id_article, L1.designation FROM articles L1 JOIN articles L2 ON L1.designation = L2.designation AND L1.rowid != L2.rowid

You may have to replace rowid with the primary key field for that table. If the designation field is indexed this query should be quite fast.[/quote]
This one works!
PS: I had 43k duplicates in the file… now I have some cleaning to do!

Edit: in fact it doesn’t… if I add upper(L1.designation)=upper(L2.designation), the SQLSelect gets stuck again.
So it’s definitely the upper command that slows down the process.
Going back to the idea of making an index on the upper values.

Maybe

… L1.designation=L2.designation COLLATE NOCASE

does the trick?
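Spelled out against the earlier self-join, and paired with a case-insensitive index so the comparison can be served from an index (a sketch only; the index name is illustrative, and stock SQLite’s NOCASE collation folds ASCII letters only):

SELECT L1.id_article, L1.designation
FROM articles L1
JOIN articles L2
  ON L1.designation = L2.designation COLLATE NOCASE
 AND L1.rowid != L2.rowid;

-- matching case-insensitive index for the designation column
CREATE INDEX idx_articles_designation_nocase ON articles(designation COLLATE NOCASE);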

If you want only the duplicates, no need to join anything


SELECT id_article, designation
FROM articles
GROUP BY designation COLLATE NOCASE
HAVING count(8) > 1

And yes, that is an “8”, not a “*”.

And as mentioned, calling functions like UPPER on a column anywhere in a SELECT statement is EXPENSIVE, because it prevents an ordinary index on that column from being used.

if you only want ONE instance of each duplicate, add the word “DISTINCT” after the word “SELECT”
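Spelled out, that variant of the query above would read (just a sketch of the suggestion):

SELECT DISTINCT id_article, designation
FROM articles
GROUP BY designation COLLATE NOCASE
HAVING count(8) > 1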