Fastest way to get COUNT(*)

If the table has a rowid you can use that and it’s much faster than offset.

SELECT * FROM Test WHERE _rowid_ > 900000 LIMIT 1000
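A minimal sketch of the difference, using Python's `sqlite3` on a throwaway table (table name and sizes are my own, not the thread's 8M-row test): OFFSET walks and discards rows, while a rowid predicate seeks straight to the page.

```python
import sqlite3

# Sketch with assumed names/sizes (not the thread's test table):
# OFFSET walks and discards rows; a rowid predicate seeks to the page.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Test (value TEXT)")
con.executemany("INSERT INTO Test (value) VALUES (?)",
                [(f"row{i}",) for i in range(5000)])

# OFFSET has to step over the first 3000 rows before returning any...
page_offset = con.execute(
    "SELECT _rowid_, value FROM Test LIMIT 10 OFFSET 3000").fetchall()

# ...while the rowid predicate jumps straight there via the table's b-tree.
page_rowid = con.execute(
    "SELECT _rowid_, value FROM Test WHERE _rowid_ > 3000 LIMIT 10").fetchall()

print(page_offset == page_rowid)  # True -- same page (no deleted rows yet)
```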

I would expect that

select count(ID) from table

should be faster than

select * from table

if the ID field is indexed.

I would have expected that myself. According to my tests count(*) is about 30% faster.
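One caveat before treating the two forms as interchangeable: COUNT(column) skips NULLs, so COUNT(ID) only matches COUNT(*) when ID is never NULL (which a primary key guarantees). A quick check of my own, not the poster's benchmark:

```python
import sqlite3

# Quick check (mine, not the poster's benchmark): COUNT(column) skips
# NULLs, so COUNT(ID) only matches COUNT(*) when ID is never NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ID INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

count_star = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
count_id = con.execute("SELECT COUNT(ID) FROM t").fetchone()[0]
print(count_star, count_id)  # 3 2
```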

The end user is responsible for his own cr*ppy database design. Most databases will be designed reasonably well, or at least be indexed. Besides, 18 GB is a bit extreme.

(I know this isn’t what you want to hear but… Sorry in advance if I irritate you in any way.)

I believe most use LIMIT / OFFSET. The other option is to take a user-specified limit and load that many items into memory as a page.

I tested just now in both PiDog and Valentina Studio. Both scroll on demand using LIMIT / OFFSET. I can tell because, on a table with 8 million records, scrolling is very fast and smooth at the top of the list but very laggy at the bottom.

As you know, using OFFSET is still much faster than returning the entire record set.

Here are a few tests.

Using a limit of 1,000

'6 ms
SELECT * FROM test LIMIT 1000

'9 ms
SELECT * FROM test LIMIT 1000 OFFSET 100000

'64 ms
SELECT * FROM test LIMIT 1000 OFFSET 1000000

'410 ms
SELECT * FROM test LIMIT 1000 OFFSET 7000000

'39.798 seconds - took about 15 minutes to load because VS loaded the entire table into memory.
'Fast smooth scrolling after an unresponsive app for 15 minutes. 8 Million records returned.
SELECT * FROM test 

Using a limit of 10,000

'46 ms
SELECT * FROM test LIMIT 10000

'51 ms
SELECT * FROM test LIMIT 10000 OFFSET 100000

'103 ms
SELECT * FROM test LIMIT 10000 OFFSET 1000000

'464 ms
SELECT * FROM test LIMIT 10000 OFFSET 7000000

My take from all this is that as the offset increases it is beneficial to also increase the limit because the cost of the higher limit becomes smaller in comparison to the overall query. I will be doing some more testing.
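To see the shape of that cost on a small scale, here is a rough timing sketch (my own setup, far smaller than the 8-million-row test above): both queries return a 1,000-row page, but the deep OFFSET must scan past 190,000 rows first.

```python
import sqlite3
import time

# Rough timing sketch with an assumed small table (200k rows, nothing
# like the thread's 8M-row test): both queries return 1,000 rows, but
# the deep OFFSET must scan past 190,000 rows before the page starts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (f1 TEXT)")
con.executemany("INSERT INTO test VALUES (?)",
                [("some filler text",) for _ in range(200_000)])

def timed(sql):
    start = time.perf_counter()
    rows = con.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)

t_shallow, n_shallow = timed("SELECT * FROM test LIMIT 1000 OFFSET 100")
t_deep, n_deep = timed("SELECT * FROM test LIMIT 1000 OFFSET 190000")

print(n_shallow, n_deep)   # both pages hold 1000 rows
print(t_deep > t_shallow)  # the deep offset is (almost always) the slower one
```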

Try this on a table WITHOUT an Integer AutoInc PK

here is what I got (table has 1,000,000 records)

' 1.42 seconds
SELECT * FROM test LIMIT 1000

' 13.90 seconds
SELECT * FROM test LIMIT 1000 OFFSET 100000

' 142.2 seconds (only had 1 million rows to start with)
SELECT * FROM test LIMIT 1000 OFFSET 999999

now using ROWID

' 1.53 seconds
SELECT * FROM test WHERE rowid < 1000

' 1.58 seconds
SELECT * FROM test  where rowid>10000 and rowid<=11000

' 0.05 seconds (only had 1 million rows to start with)
SELECT * FROM test where rowid>999999 and rowid<=1000000

but the rowid method won’t work if the source is a VIEW and not a TABLE

RowID also doesn’t work as expected if rows have been deleted, unless you first run a very time-consuming VACUUM. I started to get excited about rowid until I found that out. RowID won’t work either if you are letting the user specify the filter and ORDER BY info.
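The deleted-rows problem is easy to demonstrate on a toy table (my own sketch, not from the thread): a fixed rowid window comes back short once rows are deleted, while "keyset" paging (remember the last rowid returned, then ask for `rowid > last LIMIT page`) still fills the page, with no VACUUM needed.

```python
import sqlite3

# Sketch (assumed toy table): after deletes, a fixed rowid window comes
# back short, while keyset paging (rowid > last_seen LIMIT page) still
# fills the page -- no VACUUM needed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (v TEXT)")
con.executemany("INSERT INTO test VALUES (?)", [(str(i),) for i in range(100)])
con.execute("DELETE FROM test WHERE rowid BETWEEN 11 AND 15")  # punch a hole

window = con.execute(
    "SELECT rowid FROM test WHERE rowid > 10 AND rowid <= 20").fetchall()
keyset = con.execute(
    "SELECT rowid FROM test WHERE rowid > 10 LIMIT 10").fetchall()

print(len(window))  # 5  -- the fixed window lost the deleted rows
print(len(keyset))  # 10 -- a full page, starting right after the gap
```

It doesn't solve user-specified filter/ORDER BY, but it does survive deletes.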

My results on a table with only 4 text fields (8 million records). I do see that lack of a PK does hurt performance. However, adding an ORDER BY RowID takes care of that issue.

'5 ms
SELECT * FROM test LIMIT 1000

'74 ms
SELECT * FROM test LIMIT 1000 OFFSET 100000

'573 ms
SELECT * FROM test LIMIT 1000 OFFSET 1000000

'3.148 seconds
SELECT * FROM test LIMIT 1000 OFFSET 7000000

'4 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000

'10 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000 OFFSET 100000

'64 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000 OFFSET 1000000

'406 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000 OFFSET 7000000

If I could get my WORST case below 5 seconds I’d be a happy camper!

Adding ORDER BY dropped it from 142.2 to 123.0 seconds. Better, but still TWO minutes!

Wow! How many columns?

Create Table million (
    id int,
    firstname text,
    lastname text,
    address text,
    city text,
    state text,
    zip text,
    phone text,
    email text,
    photo,
    taxable int)

Sorry, I see now that you posted that farther up. I wonder if it’s your photo column that is slowing things down. What results would you get if you dropped that column?

good point… let me drop it and see

after dropping PHOTO… it did make a HUGE difference… and one I can compensate for, actually… since PHOTO was a blob… and I don’t actually display BLOB contents…

but these results are all SUB 5 seconds :slight_smile:

' 1.40 seconds - about the same
SELECT * FROM test LIMIT 1000

' 1.45 seconds ..   WAS 13.90
SELECT * FROM test LIMIT 1000 OFFSET 100000

' 1.18 seconds -- WAS 142.2 (only had 1 million rows to start with)
SELECT * FROM test LIMIT 1000 OFFSET 999999

now using ROWID … faster but not significantly so

' 0.03 seconds
SELECT * FROM test WHERE rowid < 1000

' 1.35 seconds
SELECT * FROM test  where rowid>10000 and rowid<=11000

' 0.03 seconds (only had 1 million rows to start with)
SELECT * FROM test where rowid>999999 and rowid<=1000000

:slight_smile:

I’m curious now, what the results would be if you explicitly don’t include the photo column in the query?

Such as:

SELECT id, firstname, lastname,  address, city, state, zip, phone, email, taxable FROM Million ...

Probably still slow, since the blob data is still stored in the table’s pages and the OFFSET scan has to step past it?

yup… still waiting for that to finish

234.5 seconds!!!

SELECT id, firstname, lastname,  address, city, state, zip, phone, email, taxable FROM Million LIMIT 1000 OFFSET 999999

but only 0.03 seconds for

SELECT id, firstname, lastname,  address, city, state, zip, phone, email, taxable FROM Million where rowid>= 999999

In the meantime I was doing my own tests (for my own future reference).

I’ve been using in-memory databases a lot and was surprised to find them only 2 to 3 times faster. I thought they were 10 times faster or so.

For my table a SELECT * takes 13.43 seconds. Attaching the file and inserting all the records into an in-memory db takes 5 seconds.

So if my assumption is correct, for long selects with user-defined WHERE / ORDER BY clauses you cut the query time to roughly a third, and then serve up the ‘view’ from the newly created table using the rowid.
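A sketch of that attach-and-copy idea (file and table names are my own, not from the thread): run the slow query once into an in-memory table, then page the result by its fresh, gap-free rowids.

```python
import os
import sqlite3
import tempfile

# Sketch of the attach-and-copy idea (file/table names assumed): run the
# slow query once into an in-memory table, then page it by rowid.
path = os.path.join(tempfile.mkdtemp(), "big.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE test (f1 TEXT)")
disk.executemany("INSERT INTO test VALUES (?)",
                 [("t%d" % i,) for i in range(10_000)])
disk.commit()
disk.close()

mem = sqlite3.connect(":memory:")
mem.execute("ATTACH DATABASE ? AS src", (path,))
# One pass over the expensive query; the copy gets fresh, gap-free rowids.
mem.execute("CREATE TABLE cache AS SELECT * FROM src.test WHERE f1 LIKE '%1%'")

page = mem.execute(
    "SELECT rowid, f1 FROM cache WHERE rowid > 100 LIMIT 50").fetchall()
print(len(page))  # 50 -- served from memory via the rowid method
```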

I have 3 scenarios that might happen… I HAD been hoping to handle all of them the same way, but it seems that isn’t possible

  • The user types in a query… here I have NO control over what they say… so it gets executed AS-IS, and if it takes forever… so be it
  • The user just selects a table (mouse click, no SQL entered)… this is to be handled as we have been discussing above, and I will probably go for the ROWID method
  • The user just selects a VIEW… for this I have no choice but to use limit/offset as views have no rowid

For 1.

You should still be able to use the user-defined SELECT statement to create an in-memory temp table like this:

"CREATE TABLE temp AS " + UserSelectSQL

For that, though, you would need to be attached to the disk db and alter the SELECT SQL so its table references point at the on-disk tables.

For 2.

That would definitely be the fastest but would need to account for tables without rowid.

For 3.

I’ve never fooled with views, but should probably check it out.

Ok just tested scenario 3.

I created this view:

CREATE VIEW "view1" AS SELECT * from test where f1 like '%t%';

The view contains over 2M records and takes over 5 seconds to run.

SELECT * FROM view1

Running this to copy to an in memory temp table takes only 3.21667 seconds.

CREATE TABLE temp AS SELECT * FROM view1

Not only did the query run in about half the time, but now it can be loaded from the temp table using the rowid method.
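Scaled way down, the same view experiment looks like this (names and data are my own, not the 2M-record test): the view itself has no rowid to page on, but the materialized copy does, dense and starting at 1.

```python
import sqlite3

# Scaled-down sketch of the view test (names/data assumed): a view has
# no rowid, but CREATE TABLE ... AS materializes it into a plain table
# whose fresh rowids support the rowid paging method.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (f1 TEXT)")
con.executemany("INSERT INTO test VALUES (?)",
                [("the %d" % i,) if i % 2 else ("x%d" % i,)
                 for i in range(1000)])
con.execute("CREATE VIEW view1 AS SELECT * FROM test WHERE f1 LIKE '%t%'")

con.execute("CREATE TABLE temp_copy AS SELECT * FROM view1")
page = con.execute(
    "SELECT rowid, f1 FROM temp_copy WHERE rowid > 10 LIMIT 5").fetchall()
print(page[0][0])  # 11 -- the copy's rowids are dense, starting at 1
```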

For a table to NOT have a rowid, it has to be purposely created with the WITHOUT ROWID clause…
and IF someone did that, then I’d default to method #3
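For what it’s worth, the opt-out is spelled WITHOUT ROWID in SQLite, and it’s easy to confirm such a table really has no rowid to page on (my own quick check, not from the thread):

```python
import sqlite3

# Quick check (my own, not from the thread): a WITHOUT ROWID table has
# no rowid column at all, so rowid paging needs a fallback there.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT) WITHOUT ROWID")

try:
    con.execute("SELECT rowid FROM t")
    has_rowid = True
except sqlite3.OperationalError:
    has_rowid = False

print(has_rowid)  # False -- fall back to LIMIT/OFFSET for these tables
```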

The problem I see with #1 is that they may have up to 10 databases already attached together, and I’d have to figure out which one (or combination) the data was in, to be able to attach it back to a memory DB (which my app already has for “housekeeping” stuff)