Is there a way to interrupt a SQLSelect?

Hi,

I have a big file with 45k records. I run some calculations on it to search for duplicates and it takes too long.
Xojo is stuck at the line rs = mydb.SQLSelect(“my sql select”)
The only way to stop it is by force-quitting the application.
Even in debug mode, if you stop the debugger, you never get back into Xojo.

Is there a better way, to ask the user whether they want to stop the query if it takes too long?

thanks.

Is limiting the query and doing multiple queries in a thread out of the question? You may not be able to stop it right away but you’d be able to stop it before the next query begins.

Run the SQLSelect(…) in a thread, then kill the thread if necessary.

SQL queries are blocking so there is no way to escape the SQLSelect call in order to kill the thread.

[quote=287531:@Jean-Yves Pochez]Hi,

I have a big file with 45k records. I run some calculations on it to search for duplicates and it takes too long.
Xojo is stuck at the line rs = mydb.SQLSelect(“my sql select”)
The only way to stop it is by force-quitting the application.
Even in debug mode, if you stop the debugger, you never get back into Xojo.

Is there a better way, to ask the user whether they want to stop the query if it takes too long?

Thanks.[/quote]

You might look into the OFFSET/FETCH commands that many SQL engines support. You ultimately need to limit how many results you return at a time.
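In SQLite the equivalent is the LIMIT/OFFSET clauses. A minimal sketch of paging, assuming the articles table that comes up later in the thread:

-- fetch the rows in pages of 1000; the calling code loops,
-- bumping the offset by 1000 each pass, until no rows come back
SELECT id_article, designation
FROM articles
ORDER BY id_article
LIMIT 1000 OFFSET 0

The ORDER BY matters: without a stable ordering, successive pages are not guaranteed to cover every row exactly once.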

Since there is a GROUP BY in the SELECT command, I doubt the OFFSET or LIMIT options will return anything before the whole query has been evaluated?

Which database engine are you using? You want to pull down a much smaller set of records and then loop through the results. Most SQL engines support offsets so you can iteratively pull the needed records.

Are the tables properly indexed? 45K seems like a very small number to be having speed problems.

Post an example of the query (or a description). As Greg stated, 45K records should be processable by even the most poorly formed SQL in a matter of seconds. I have had queries with multiple joins across tables with millions of records that ran in under a minute in the worst conditions.

And a key word in what Greg stated is “properly”. Just having indexes for the sake of having them is rarely a good thing.

SELECT L.id_article, L.designation
FROM articles L
INNER JOIN (
    SELECT designation, count(*)
    FROM articles
    GROUP BY upper(designation)
    HAVING count(*) > 1
) R ON upper(L.designation) = upper(R.designation)

id_article is the primary key, and the field “designation” is indexed (as is the primary key).
The database is SQLite. It’s a simple SQL request to find duplicates.
It works fine on hundreds of records, but stays stuck on 45k records…
Note: there are a lot of duplicates in that 45k-record file, maybe 30k, which could explain the problem?

Maybe this works better:

SELECT L1.id_article, L1.designation FROM articles L1 JOIN articles L2 ON L1.designation = L2.designation AND L1.rowid != L2.rowid

You may have to replace rowid with the primary key field for that table. If the designation field is indexed this query should be quite fast.
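For reference, the same self-join written against the primary key instead of rowid (a sketch only, using the id_article column named earlier in the thread):

SELECT L1.id_article, L1.designation
FROM articles L1
JOIN articles L2
  ON L1.designation = L2.designation
 AND L1.id_article != L2.id_article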

Could it be the upper() that slows down the process?
Is it possible to make an index on the upper() values?

If you use the MBS SQL Plugin you can have the SQL select run threaded and call its cancel method if necessary.

It should: See Indexes On Expressions

I have never tried this with SQLite, but this might work:

CREATE INDEX idx_article_designationUpper ON articles(Upper(designation));

You might also try using count(rowid) instead of count(*).
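Applied to the subquery from the posted query, that suggestion would look something like this (a sketch only; whether count(rowid) actually beats count(*) in SQLite is worth measuring):

SELECT designation, count(rowid)
FROM articles
GROUP BY upper(designation)
HAVING count(rowid) > 1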

[quote=287596:@Maximilian Tyrtania]It should: See https://www.sqlite.org/expridx.html

I have never tried this with SQLite, but this might work:

CREATE INDEX idx_article_designationUpper ON articles(Upper(designation));[/quote]
I got a syntax error with this one… “near (”?

This does not change the problem; it’s still stuck in the SQLSelect command.

[quote=287585:@Maximilian Tyrtania]Maybe this works better:

SELECT L1.id_article, L1.designation FROM articles L1 JOIN articles L2 ON L1.designation = L2.designation AND L1.rowid != L2.rowid

You may have to replace rowid with the primary key field for that table. If the designation field is indexed this query should be quite fast.[/quote]
This one works!
PS: I had 43k duplicates in the file… now I have some cleaning to do!

Edit: in fact it doesn’t… if I add upper(L1.designation)=upper(L2.designation), the SQLSelect gets stuck again.
So it’s definitely the upper command that slows down the process.
Going back to the idea of making an index on the upper values.

Maybe

… L1.designation=L2.designation COLLATE NOCASE

does the trick?
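Spelled out against the earlier self-join, and paired with a case-insensitive index so the comparison can be served from an index (a sketch only; the index name is illustrative, and stock SQLite’s NOCASE collation folds ASCII letters only):

SELECT L1.id_article, L1.designation
FROM articles L1
JOIN articles L2
  ON L1.designation = L2.designation COLLATE NOCASE
 AND L1.rowid != L2.rowid;

-- matching case-insensitive index for the designation column
CREATE INDEX idx_articles_designation_nocase ON articles(designation COLLATE NOCASE);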

If you want only the duplicates, no need to join anything


SELECT id_article, designation
FROM articles
GROUP BY designation COLLATE NOCASE
HAVING count(8) > 1

And yes, that is an “8”, not a “*”.

And as mentioned, calling functions like UPPER on a column anywhere in a SELECT statement is EXPENSIVE, because it prevents an ordinary index on that column from being used.

if you only want ONE instance of each duplicate, add the word “DISTINCT” after the word “SELECT”
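Spelled out, that variant of the query above would read (just a sketch of the suggestion):

SELECT DISTINCT id_article, designation
FROM articles
GROUP BY designation COLLATE NOCASE
HAVING count(8) > 1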