Just because a table has an index doesn’t mean queries get “better”.
Sometimes an index makes things worse, and sometimes the optimizer decides not to use it at all.
An explain plan might tell you what is going on, but explain plans are not easy to read.
While subqueries have their place, in this instance one would have added no value, since the same work happens either way.
FYI: while a “GROUP BY” might use the index to select items, it still does an internal ORDER BY to “group” the records to be counted, so the more scattered the physical arrangement is, the longer it can take (no index will “fix” that).
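As an illustration, SQLite’s EXPLAIN QUERY PLAN shows when the engine has to build a temporary B-tree just to do the grouping. A minimal Python sketch; the mydata table layout matches the thread’s example, but the rows are invented:

```python
import sqlite3

# Invented sample data in the thread's table layout.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-17", "alice"), ("2018-10-18", "bob"),
     ("2018-10-18", "alice"), ("2018-10-16", "carol")],
)

query = ("SELECT fromname, count(*) FROM mydata "
         "WHERE sqldatetime >= '2018-10-17' GROUP BY fromname")

# With no index on fromname, the plan typically includes a line like
# "USE TEMP B-TREE FOR GROUP BY" (exact wording varies by SQLite version).
for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print(row)

counts = dict(con.execute(query))
print(counts)
```

With an index on fromname the temp B-tree line may disappear, but as noted above that does not help when the matching rows themselves are physically scattered.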
Just for fun, try
select fromname,count(*)
from (
select sqldatetime,fromname
from mydata
where sqldatetime >= '2018-10-17'
ORDER BY fromName
) group by fromname
and why select rowid and email if you never use them?
but the fact that “rowid” is “required” to get a timing is what I find odd…
I wrote “Tadpole”, an SQLite manager; it requires nothing special to determine how long a query takes.
If you want (and are able), email me the table; I’d be interested in seeing what’s going on.
I’m surprised this works at all. I guess SQLite is more lax than other SQL databases. Others require every non-aggregated field to be included in the GROUP BY list, so in this case sqldatetime and rowid should both be included with fromname in the GROUP BY list.
For example, I assume you have multiple differing sqldatetime values for each fromname, so how can it list the sqldatetime value in the results? How does it decide which of the different values to show? And what use is a single value out of a set of differing values to you? The same applies to rowid.
Can you post an example of the output of the above query?
Me too. What is it supposed to return?
If you include rowid without an aggregation you might expect every row to be returned, with a count of 1.
But as Jay says, in Oracle and SQL Server (which I work with most), the query is invalid.
You can return a field if you group by it, but any field which is NOT part of the group by has to be aggregated using
MIN(), MAX(), count() or similar.
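For what it’s worth, SQLite documents this as “bare column” behaviour: non-aggregated columns are allowed in an aggregate query, and their value is taken from some row of each group. A quick sketch with invented data, in the thread’s table layout:

```python
import sqlite3

# Invented rows; rowid is SQLite's implicit integer key on a normal table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-17", "alice"), ("2018-10-18", "alice"), ("2018-10-18", "bob")],
)

# Oracle/SQL Server would reject this; SQLite runs it and fills the bare
# sqldatetime and rowid columns from an arbitrary row of each group.
rows = con.execute(
    "SELECT sqldatetime, rowid, fromname, count(*) FROM mydata "
    "WHERE sqldatetime >= '2018-10-17' GROUP BY fromname").fetchall()
for row in rows:
    print(row)
```

Only the fromname and count values are well-defined here; which sqldatetime/rowid pair shows up per group is not something you should rely on.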
How does this query work instead?
select sqldatetime,fromname,count(rowid) from mydata where sqldatetime >= '2018-10-17' group by fromname,sqldatetime
or
select fromname,count(1) from mydata where sqldatetime >= '2018-10-17' group by fromname
Supposedly rowid is “required” in order to get a timing on the query… something I do not understand, since SQLite (like every other DB engine I have ever used) provides that as a DB function independent of the data content…
But I am ashamed to admit… that I missed the major point that both Jay and Jeff mentioned…
where GROUP BY is concerned, every field in the SELECT should either be part of the GROUP BY or used in an aggregate function
but it seems SQLite at least is “lax” and, for those fields not in the GROUP BY, returns the LAST value it encounters
note : last may or may not mean most “recent” or greatest…
Another thought:
You suggest it is grouping first, then applying the date filter.
That would be unlikely (but I’m not ruling it out).
That would mean it is doing the equivalent of a HAVING clause
eg: This should filter to keep the row count down, then group:
select sqldatetime,fromname,count(rowid) from mydata where sqldatetime >= '2018-10-17' group by fromname,sqldatetime
This arguably would group then filter, unless the engine optimises
select sqldatetime,fromname,count(rowid) from mydata group by fromname,sqldatetime
having sqldatetime >= '2018-10-17'
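Since sqldatetime is one of the grouping columns in both forms above, the WHERE and HAVING versions must select exactly the same groups, which is easy to check. A sketch with invented data:

```python
import sqlite3

# Invented data; both fromname and sqldatetime are grouping columns, so a
# HAVING filter on sqldatetime keeps the same groups a WHERE filter would.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-17", "alice"), ("2018-10-17", "alice"),
     ("2018-10-18", "bob"), ("2018-10-16", "carol")],
)

where_rows = con.execute(
    "SELECT sqldatetime, fromname, count(*) FROM mydata "
    "WHERE sqldatetime >= '2018-10-17' "
    "GROUP BY fromname, sqldatetime ORDER BY 1, 2").fetchall()

having_rows = con.execute(
    "SELECT sqldatetime, fromname, count(*) FROM mydata "
    "GROUP BY fromname, sqldatetime "
    "HAVING sqldatetime >= '2018-10-17' ORDER BY 1, 2").fetchall()

# The carol group (2018-10-16) is filtered out either way.
print(where_rows == having_rows)  # True
```

The difference is only potential cost: conceptually the HAVING form groups everything first and filters afterwards, unless the engine optimises the filter down into the scan.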
We still need an answer to Neil’s questions. Also, what platform/hardware is this running on? Regardless of the grouping issues in this instance, I find it hard to believe a 19-second time for 160k records.
SELECT fromname,count(8) as cnt
FROM (SELECT fromname FROM mydata where sqldatetime>'2018-10-17')
GROUP BY fromname
There is no reason to carry around all that extra overhead if all you need is the name and a count,
and if SQLBrowser required ROWID in order to tell you how long the query takes, I’d find another SQL tool, as you should never have to alter your code to fit something like that.
and “technically” the above query should be identical to
SELECT fromname,count(8) as cnt
FROM mydata where sqldatetime>'2018-10-17'
GROUP BY fromname
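On the timing point: any client can time a query itself, without touching the SQL at all. A hedged sketch (data invented, and the measured time is of course machine-dependent):

```python
import sqlite3
import time

# Invented sample data in the thread's table layout.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-18", "alice"), ("2018-10-18", "bob"), ("2018-10-17", "alice")],
)

start = time.perf_counter()
rows = con.execute(
    "SELECT fromname, count(*) AS cnt FROM mydata "
    "WHERE sqldatetime > '2018-10-17' GROUP BY fromname").fetchall()
elapsed = time.perf_counter() - start

# No rowid anywhere in the select list, and timing still works fine.
print(rows, f"{elapsed:.6f}s")
```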
Much like Dave is saying above, I did a simple test:
[quote]CREATE TABLE mydata (
rowid INTEGER PRIMARY KEY AUTOINCREMENT,
sqldatetime DATE,
fromname TEXT
);[/quote]
Added 10 records with 3 different names (I know that is nothing compared with 160k).
Using SQLiteStudio with this query:
select sqldatetime,rowid,fromname,count(*) from mydata where sqldatetime >= '2018-10-17' group by fromname
I usually get 0.018 or 0.019 second(s)
Because you don’t need sqldatetime or rowid when you group by fromname and only want a count, I did this:
select fromname,count(*) from mydata where sqldatetime >= '2018-10-17' group by fromname
Depending on how SQLite works, the 17GB may be a contributing factor in why the query is so slow. Have you tried deleting the main bulk of the data and then executing the query again?
Valentina Studio can analyse the SQL timing. Have you tried something similar?
If you have any pictures, binaries, or PDFs inside the data, the size can rise very fast.
A 200MB database jumped to 2.5GB after 6 months, just from starting to store a PDF inside it…
it is only 60k records (and not all of them have a stored pdf)
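One way to check whether BLOBs are what is inflating the file is to compare the total BLOB bytes against the database size (page_count times page_size). A sketch; the docs table and pdf column are hypothetical names, not from the thread:

```python
import sqlite3

# Hypothetical table: 'docs' and its 'pdf' column are invented for the sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (name TEXT, pdf BLOB)")
con.execute("INSERT INTO docs VALUES ('a', ?)", (b"\x00" * 100_000,))
con.execute("INSERT INTO docs VALUES ('b', NULL)")

# length() on a BLOB counts bytes; sum() skips the NULL rows.
blob_bytes = con.execute(
    "SELECT coalesce(sum(length(pdf)), 0) FROM docs").fetchone()[0]
page_count = con.execute("PRAGMA page_count").fetchone()[0]
page_size = con.execute("PRAGMA page_size").fetchone()[0]
print(blob_bytes, "blob bytes of", page_count * page_size, "total bytes")
```

If the BLOB total accounts for most of the file, moving those documents out of the database (or into a separate attached file) is the obvious first experiment.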