Just because a table has an index doesn’t mean queries get “better”.
Sometimes an index makes things worse, and sometimes the optimizer decides not to use it at all.
An explain plan might tell you what is going on, but explain plans are not easy to read.
While subqueries have their place, in this instance one would have added no value, since the same work happens either way.
FYI: while a “GROUP BY” might use the index to select items, it still does an internal ORDER BY to “group” the records to be counted, so the more scattered the physical arrangement is, the longer it can take (no index will “fix” that).
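As an illustration, SQLite’s EXPLAIN QUERY PLAN shows when the engine has to build a temporary B-tree just to do the grouping. A minimal Python sketch; the mydata table layout matches the thread’s example, but the rows are invented:

```python
import sqlite3

# Invented sample data in the thread's table layout.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-17", "alice"), ("2018-10-18", "bob"),
     ("2018-10-18", "alice"), ("2018-10-16", "carol")],
)

query = ("SELECT fromname, count(*) FROM mydata "
         "WHERE sqldatetime >= '2018-10-17' GROUP BY fromname")

# With no index on fromname, the plan typically includes a line like
# "USE TEMP B-TREE FOR GROUP BY" (exact wording varies by SQLite version).
for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print(row)

counts = dict(con.execute(query))
print(counts)
```

With an index on fromname the temp B-tree line may disappear, but as noted above that does not help when the matching rows themselves are physically scattered.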
Just for fun, try
select fromname,count(*)
from (
select sqldatetime,fromname
from mydata
where sqldatetime >= '2018-10-17'
ORDER BY fromName
) group by fromname
and why select rowid and email if you never use them?
but the fact that “rowid” is “required” to get a timing is what I find odd…
I wrote “Tadpole”, an SQLite manager; it requires nothing special to determine how long a query takes.
If you want (and are able), email me the table; I’d be interested in seeing what’s going on.
I’m surprised this works at all. I guess SQLite is more lax than other SQL databases. Others require every non-aggregated field to be included in the GROUP BY list, so in this case sqldatetime and rowid should both be included with fromname in the GROUP BY list.
For example, I assume you have multiple differing sqldatetime values for each fromname, so how can it list the sqldatetime value in the results? How does it decide which of the different values to show? And what use is a single value out of a set of differing values to you? The same applies to rowid.
Can you post an example of the output of the above query?
Me too. What is it supposed to return?
If you include rowid without an aggregation you might expect every row to be returned, with a count of 1.
But as Jay says, in Oracle and SQL Server (which I work with most), the query is invalid.
You can return a field if you group by it, but any field which is NOT part of the group by has to be aggregated using
MIN(), MAX(), count() or similar.
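For what it’s worth, SQLite documents this as “bare column” behaviour: non-aggregated columns are allowed in an aggregate query, and their value is taken from some row of each group. A quick sketch with invented data, in the thread’s table layout:

```python
import sqlite3

# Invented rows; rowid is SQLite's implicit integer key on a normal table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-17", "alice"), ("2018-10-18", "alice"), ("2018-10-18", "bob")],
)

# Oracle/SQL Server would reject this; SQLite runs it and fills the bare
# sqldatetime and rowid columns from an arbitrary row of each group.
rows = con.execute(
    "SELECT sqldatetime, rowid, fromname, count(*) FROM mydata "
    "WHERE sqldatetime >= '2018-10-17' GROUP BY fromname").fetchall()
for row in rows:
    print(row)
```

Only the fromname and count values are well-defined here; which sqldatetime/rowid pair shows up per group is not something you should rely on.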
How does this query work instead?
select sqldatetime,fromname,count(rowid) from mydata where sqldatetime >= '2018-10-17' group by fromname,sqldatetime
or
select fromname,count(1) from mydata where sqldatetime >= '2018-10-17' group by fromname
Supposedly rowid is “required” in order to get a timing on the query… something I do not understand, since SQLite (like every other DB engine I have ever used) provides that as a DB function independent of the data content…
But I am ashamed to admit… that I missed the major point that both Jay and Jeff mentioned…
where GROUP BY is concerned, every field in the SELECT should either be part of the GROUP BY or used in an aggregate function
but it seems SQLite at least is “lax” and, for those fields not in the GROUP BY, returns the LAST value it encounters
note : last may or may not mean most “recent” or greatest…
Another thought:
You suggest it is grouping first, then applying the date filter.
That would be unlikely (but I’m not ruling it out).
That would mean it is doing the equivalent of a HAVING clause
eg: This should filter to keep the row count down, then group:
select sqldatetime,fromname,count(rowid) from mydata where sqldatetime >= '2018-10-17' group by fromname,sqldatetime
This arguably would group then filter, unless the engine optimises
select sqldatetime,fromname,count(rowid) from mydata group by fromname,sqldatetime
having sqldatetime >= '2018-10-17'
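Since sqldatetime is one of the grouping columns in both forms above, the WHERE and HAVING versions must select exactly the same groups, which is easy to check. A sketch with invented data:

```python
import sqlite3

# Invented data; both fromname and sqldatetime are grouping columns, so a
# HAVING filter on sqldatetime keeps the same groups a WHERE filter would.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-17", "alice"), ("2018-10-17", "alice"),
     ("2018-10-18", "bob"), ("2018-10-16", "carol")],
)

where_rows = con.execute(
    "SELECT sqldatetime, fromname, count(*) FROM mydata "
    "WHERE sqldatetime >= '2018-10-17' "
    "GROUP BY fromname, sqldatetime ORDER BY 1, 2").fetchall()

having_rows = con.execute(
    "SELECT sqldatetime, fromname, count(*) FROM mydata "
    "GROUP BY fromname, sqldatetime "
    "HAVING sqldatetime >= '2018-10-17' ORDER BY 1, 2").fetchall()

# The carol group (2018-10-16) is filtered out either way.
print(where_rows == having_rows)  # True
```

The difference is only potential cost: conceptually the HAVING form groups everything first and filters afterwards, unless the engine optimises the filter down into the scan.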
We still need an answer to Neil’s questions. Also, what platform/hardware is this running on? Regardless of the grouping issues in this instance, I find it hard to believe a 19-second time for 160k records.
SELECT fromname,count(8) as cnt
FROM (SELECT fromname FROM mydata where sqldatetime>'2018-10-17')
GROUP BY fromname
There is no reason to carry around all that extra overhead if all you need is the name and a count,
and if SQLBrowser required ROWID in order to tell you how long the query takes, I’d find another SQL tool, as you should never have to alter your code to fit something like that.
and “technically” the above query should be identical to
SELECT fromname,count(8) as cnt
FROM mydata where sqldatetime>'2018-10-17'
GROUP BY fromname
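On the timing point: any client can time a query itself, without touching the SQL at all. A hedged sketch (data invented, and the measured time is of course machine-dependent):

```python
import sqlite3
import time

# Invented sample data in the thread's table layout.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mydata (sqldatetime TEXT, fromname TEXT)")
con.executemany(
    "INSERT INTO mydata VALUES (?, ?)",
    [("2018-10-18", "alice"), ("2018-10-18", "bob"), ("2018-10-17", "alice")],
)

start = time.perf_counter()
rows = con.execute(
    "SELECT fromname, count(*) AS cnt FROM mydata "
    "WHERE sqldatetime > '2018-10-17' GROUP BY fromname").fetchall()
elapsed = time.perf_counter() - start

# No rowid anywhere in the select list, and timing still works fine.
print(rows, f"{elapsed:.6f}s")
```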
Much like Dave is saying above, I did a simple test:
[quote]CREATE TABLE mydata (
rowid INTEGER PRIMARY KEY AUTOINCREMENT,
sqldatetime DATE,
fromname TEXT
);[/quote]
Added 10 records with 3 different names (I know that is nothing compared with 160k).
Using SQLiteStudio with this query:
select sqldatetime,rowid,fromname,count(*) from mydata where sqldatetime >= '2018-10-17' group by fromname
I usually get 0.018 or 0.019 second(s)
Because you don’t need sqldatetime or rowid when you group by fromname and only want a count, I did this:
select fromname,count(*) from mydata where sqldatetime >= '2018-10-17' group by fromname
Depending on how SQLite works, the 17GB may be a contributing factor in why the query is so slow. Have you tried deleting the main bulk of the data and then executing the query again?
Valentina Studio can analyse the SQL timing. Have you tried something similar?
If you have any pictures, binaries, or PDFs inside the data, the size can rise very fast.
A 200MB database jumped to 2.5GB after 6 months, just from starting to store a PDF inside it…
it is only 60k records (and not all of them have a stored pdf)
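One way to check whether BLOBs are what is inflating the file is to compare the total BLOB bytes against the database size (page_count times page_size). A sketch; the docs table and pdf column are hypothetical names, not from the thread:

```python
import sqlite3

# Hypothetical table: 'docs' and its 'pdf' column are invented for the sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (name TEXT, pdf BLOB)")
con.execute("INSERT INTO docs VALUES ('a', ?)", (b"\x00" * 100_000,))
con.execute("INSERT INTO docs VALUES ('b', NULL)")

# length() on a BLOB counts bytes; sum() skips the NULL rows.
blob_bytes = con.execute(
    "SELECT coalesce(sum(length(pdf)), 0) FROM docs").fetchone()[0]
page_count = con.execute("PRAGMA page_count").fetchone()[0]
page_size = con.execute("PRAGMA page_size").fetchone()[0]
print(blob_bytes, "blob bytes of", page_count * page_size, "total bytes")
```

If the BLOB total accounts for most of the file, moving those documents out of the database (or into a separate attached file) is the obvious first experiment.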