is there a way to interrupt a sqlselect ?

Jean-Yves_Pochez · September 19, 2016, 7:01pm

[quote=287631:@Dave S]If you want only the duplicates, no need to join anything

SELECT  id_article,designation 
FROM articles 
 GROUP BY designation
HAVING count(8)>1
COLLATE NOCASE

and yes the is an “8” not a “*”
and as mentioned having aggregate functions like UPPER anywhere in a SELECT statement is EXPENSIVE
if you only want ONE instance of each duplicate, add the word “DISTINCT” after the word “SELECT”[/quote]

sorry, this query does not return the duplicates of the table. it returns 40k records, it is fast, but it is not what I want
(and I dont know how to explain what it returns, it has non sense ?)

Jean-Yves_Pochez · September 19, 2016, 7:07pm

[quote=287615:@Maximilian Tyrtania]Maybe

… L1.designation=L2.designation COLLATE NOCASE

does the trick?[/quote]
this seems to be the good one !
thanks Maximilian !

DaveS · September 19, 2016, 7:09pm

sorry… I did make a mistake

SELECT  id_article,designation 
FROM articles 
 GROUP BY  id_article,designation 
HAVING count(8)>1
COLLATE NOCASE

Jean-Yves_Pochez · September 19, 2016, 7:45pm

[quote=287683:@Dave S]sorry… I did make a mistake

SELECT id_article,designation FROM articles GROUP BY id_article,designation HAVING count(8)>1 COLLATE NOCASE[/quote]

this one works - and is fast - thanks Dave.
but it always returns one instance of the duplicates, if I add “distinct” after “select” or not.

it there a way to have all the duplicates records together in a selection ?

oh ! and it returns the same results if I write count(8) or count(1) or count(*) !

DaveS · September 19, 2016, 9:29pm

try this… a bit more complex … .and off the top of my head

SELECT id_article,designation 
   FROM articles  a, 
          (SELECT  id_article,designation 
            FROM articles 
          GROUP BY  id_article,designation 
         HAVING count(8)>1
       COLLATE NOCASE) b
WHERE a.id_article = b.id_article
     AND a.designation = b.designation

that should provide all the duplicates, you could if necessary collect other fields for those duplicates without affecting the “duplicate” criteria by adding them to the list ON THE FIRST SELECT LINE ONLY

and yes it does

count(1) or count(8) are identical as far as the SQL compiler is concerned,
but count(*) is not… it will force the compiler to build a data vector, populate, and destroy it, when for this type of count that is just wasted overhead. count(8) creates no such overhead, and while you might not notice a discernable difference on 45k records, you would on 45 million

and the reason most developer use “8” as opposed to any other number is that is is right below “*” on most keyboards.

Tobias_Bussmann · September 21, 2016, 10:32am

[quote=287697:@Dave S]count(1) or count(8) are identical as far as the SQL compiler is concerned,
but count(*) is not… it will force the compiler to build a data vector, populate, and destroy it, when for this type of count that is just wasted overhead. count(8) creates no such overhead, and while you might not notice a discernable difference on 45k records, you would on 45 million
[/quote]

sorry Dave - I kindly disagree here.

The SQL standard (at least since the ancient SQL92) explicitly states that COUNT(*) to be the cardinality of the relation. So this is standard way of asking for it. If a certain DBMS does not implement it efficiently that’s a problem of the DBMS. Sure, count(1) is a common way of ‘optimising’ such queries for SQL engines that does not do this properly - but at the same time it prevents you from making use of any optimisation that may be available in a future version of the DBMS to get the cardinality in a very efficient way (e.g. from some meta data instead of scan).

While I agree, semantically count(*) is the same as count(constant-expression) (but not as count(non-not-null-column-or-expression)) I’d usually stick to the standard unless I really want to force a specific behaviour for a specific query of a specific DBMS on a specific version and know what I do. In general I’ve learnt not to believe to be more clever then the query planner/optimizer of my DBMS

Here we see the gap of concepts. SQL is a declarative language, most programming languages like Xojo are imperative ones. A SQL programmer stats what the result shall be, a Xojo programmer states how the result shall be calculated. The connector between the two concepts is the DMBS. It has the task to understand the declarative SQL and efficiently transform that into imperative statements accessing the data on the storage.

DaveS · September 21, 2016, 2:33pm

I won’t belabor the point, but you are most incorrect, I have been working with SQL of this type for near 30 years…
and the factor is not the aggregate function of COUNT but the definition of “*” which is “ALL FIELDS”, and if there is no need to determine what ALL fields are, then you are wasting time.

I will not further engage this argument. At this point we will agree to disagree, and I will continue to educate SQL noobies in the proper way to deal with it.