Fastest way to get COUNT(*)

If the table has a rowid you can use that and it’s much faster than offset.

SELECT * FROM Test WHERE _rowid_ > 900000 LIMIT 1000
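A minimal sketch of the difference, using Python's `sqlite3` on a throwaway table (table name and sizes are my own, not the thread's 8M-row test): OFFSET walks and discards rows, while a rowid predicate seeks straight to the page.

```python
import sqlite3

# Sketch with assumed names/sizes (not the thread's test table):
# OFFSET walks and discards rows; a rowid predicate seeks to the page.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Test (value TEXT)")
con.executemany("INSERT INTO Test (value) VALUES (?)",
                [(f"row{i}",) for i in range(5000)])

# OFFSET has to step over the first 3000 rows before returning any...
page_offset = con.execute(
    "SELECT _rowid_, value FROM Test LIMIT 10 OFFSET 3000").fetchall()

# ...while the rowid predicate jumps straight there via the table's b-tree.
page_rowid = con.execute(
    "SELECT _rowid_, value FROM Test WHERE _rowid_ > 3000 LIMIT 10").fetchall()

print(page_offset == page_rowid)  # True -- same page (no deleted rows yet)
```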

I would expect that

select count(ID) from table

should be faster than

select * from table

if the ID field is indexed.

I would have expected that myself. According to my tests count(*) is about 30% faster.
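One caveat before treating the two forms as interchangeable: COUNT(column) skips NULLs, so COUNT(ID) only matches COUNT(*) when ID is never NULL (which a primary key guarantees). A quick check of my own, not the poster's benchmark:

```python
import sqlite3

# Quick check (mine, not the poster's benchmark): COUNT(column) skips
# NULLs, so COUNT(ID) only matches COUNT(*) when ID is never NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ID INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

count_star = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
count_id = con.execute("SELECT COUNT(ID) FROM t").fetchone()[0]
print(count_star, count_id)  # 3 2
```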

The end user is responsible for his own cr*ppy database design. Most databases will be designed reasonably well, or at least be indexed. Besides, 18 GB is a bit extreme.

(I know this isn’t what you want to hear but… Sorry in advance if I irritate you in any way.)

I believe most use LIMIT / OFFSET. The other option is to take a user-specified limit and load that many items into memory as a page.

I tested just now in both PiDog and Valentina Studio. Both scroll on demand using LIMIT / OFFSET. I can tell because, on a table with 8 million records, scrolling is very fast and smooth at the top of the list but very laggy at the bottom.

As you know, using OFFSET is still much faster than returning the entire record set.

Here are a few tests.

Using a limit of 1,000

'6 ms
SELECT * FROM test LIMIT 1000

'9 ms
SELECT * FROM test LIMIT 1000 OFFSET 100000

'64 ms
SELECT * FROM test LIMIT 1000 OFFSET 1000000

'410 ms
SELECT * FROM test LIMIT 1000 OFFSET 7000000

'39.798 seconds - took about 15 minutes to load because VS loaded the entire table into memory.
'Fast smooth scrolling after an unresponsive app for 15 minutes. 8 Million records returned.
SELECT * FROM test 

Using a limit of 10,000

'46 ms
SELECT * FROM test LIMIT 10000

'51 ms
SELECT * FROM test LIMIT 10000 OFFSET 100000

'103 ms
SELECT * FROM test LIMIT 10000 OFFSET 1000000

'464 ms
SELECT * FROM test LIMIT 10000 OFFSET 7000000

My take from all this is that as the offset increases it is beneficial to also increase the limit because the cost of the higher limit becomes smaller in comparison to the overall query. I will be doing some more testing.
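To see the shape of that cost on a small scale, here is a rough timing sketch (my own setup, far smaller than the 8-million-row test above): both queries return a 1,000-row page, but the deep OFFSET must scan past 190,000 rows first.

```python
import sqlite3
import time

# Rough timing sketch with an assumed small table (200k rows, nothing
# like the thread's 8M-row test): both queries return 1,000 rows, but
# the deep OFFSET must scan past 190,000 rows before the page starts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (f1 TEXT)")
con.executemany("INSERT INTO test VALUES (?)",
                [("some filler text",) for _ in range(200_000)])

def timed(sql):
    start = time.perf_counter()
    rows = con.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)

t_shallow, n_shallow = timed("SELECT * FROM test LIMIT 1000 OFFSET 100")
t_deep, n_deep = timed("SELECT * FROM test LIMIT 1000 OFFSET 190000")

print(n_shallow, n_deep)   # both pages hold 1000 rows
print(t_deep > t_shallow)  # the deep offset is (almost always) the slower one
```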

Try this on a table WITHOUT an Integer AutoInc PK

here is what I got (table has 1,000,000 records)

' 1.42 seconds
SELECT * FROM test LIMIT 1000

' 13.90 seconds
SELECT * FROM test LIMIT 1000 OFFSET 100000

' 142.2 seconds (only had 1 million rows to start with)
SELECT * FROM test LIMIT 1000 OFFSET 999999

now using ROWID

' 1.53 seconds
SELECT * FROM test WHERE rowid < 1000

' 1.58 seconds
SELECT * FROM test  where rowid>10000 and rowid<=11000

' 0.05 seconds (only had 1 million rows to start with)
SELECT * FROM test where rowid>999999 and rowid<=1000000

but the rowid method won’t work if the source is a VIEW and not a TABLE

RowID also doesn’t work as expected if rows have been deleted, unless you first run a very time-consuming VACUUM. I started to get excited about rowid until I found that out. RowID won’t work either if you are letting the user specify the filter and ORDER BY info.
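The deleted-rows problem is easy to demonstrate on a toy table (my own sketch, not from the thread): a fixed rowid window comes back short once rows are deleted, while "keyset" paging (remember the last rowid returned, then ask for `rowid > last LIMIT page`) still fills the page, with no VACUUM needed.

```python
import sqlite3

# Sketch (assumed toy table): after deletes, a fixed rowid window comes
# back short, while keyset paging (rowid > last_seen LIMIT page) still
# fills the page -- no VACUUM needed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (v TEXT)")
con.executemany("INSERT INTO test VALUES (?)", [(str(i),) for i in range(100)])
con.execute("DELETE FROM test WHERE rowid BETWEEN 11 AND 15")  # punch a hole

window = con.execute(
    "SELECT rowid FROM test WHERE rowid > 10 AND rowid <= 20").fetchall()
keyset = con.execute(
    "SELECT rowid FROM test WHERE rowid > 10 LIMIT 10").fetchall()

print(len(window))  # 5  -- the fixed window lost the deleted rows
print(len(keyset))  # 10 -- a full page, starting right after the gap
```

It doesn't solve user-specified filter/ORDER BY, but it does survive deletes.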

My results on a table with only 4 text fields (8 million records). I do see that lack of a PK does hurt performance. However, adding an ORDER BY RowID takes care of that issue.

'5 ms
SELECT * FROM test LIMIT 1000

'74 ms
SELECT * FROM test LIMIT 1000 OFFSET 100000

'573 ms
SELECT * FROM test LIMIT 1000 OFFSET 1000000

'3.148 seconds
SELECT * FROM test LIMIT 1000 OFFSET 7000000

'4 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000

'10 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000 OFFSET 100000

'64 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000 OFFSET 1000000

'406 ms
SELECT * FROM test ORDER BY RowID LIMIT 1000 OFFSET 7000000

If I could get my WORST case below 5 seconds I’d be a happy camper!

Adding ORDER BY dropped it from 142.2 to 123.0 seconds. Better, but still TWO minutes!

Wow! How many columns?

Create Table million (
    id int,
    firstname text,
    lastname text,
    address text,
    city text,
    state text,
    zip text,
    phone text,
    email text,
    photo,
    taxable int)

Sorry, I see now that you posted that farther up. I wonder if it’s your photo column that is slowing things down. What results would you get if you dropped that column?

good point… let me drop it and see

after dropping PHOTO… it did make a HUGE difference… and one I can compensate for, actually… since PHOTO was a blob… and I don’t actually display BLOB contents…

but these results are all SUB 5 seconds :slight_smile:

' 1.40 seconds - about the same
SELECT * FROM test LIMIT 1000

' 1.45 seconds ..   WAS 13.90
SELECT * FROM test LIMIT 1000 OFFSET 100000

' 1.18 seconds -- WAS 142.2 (only had 1 million rows to start with)
SELECT * FROM test LIMIT 1000 OFFSET 999999

now using ROWID … faster but not significantly so

' 0.03 seconds
SELECT * FROM test WHERE rowid < 1000

' 1.35 seconds
SELECT * FROM test  where rowid>10000 and rowid<=11000

' 0.03 seconds (only had 1 million rows to start with)
SELECT * FROM test where rowid>999999 and rowid<=1000000

:slight_smile:

I’m curious now, what the results would be if you explicitly don’t include the photo column in the query?

Such as:

SELECT id, firstname, lastname,  address, city, state, zip, phone, email, taxable FROM Million ...

Probably still slow, since the blob data is still stored in the table’s pages and the OFFSET scan has to step past it?

yup… still waiting for that to finish

234.5 seconds!!!

SELECT id, firstname, lastname,  address, city, state, zip, phone, email, taxable FROM Million LIMIT 1000 OFFSET 999999

but only 0.03 seconds for

SELECT id, firstname, lastname,  address, city, state, zip, phone, email, taxable FROM Million where rowid>= 999999

In the meantime I was doing my own tests (for my own future reference).

I’ve been using in-memory databases a lot and was surprised to find them only 2 to 3 times faster. I thought they were 10 times faster or so.

For my table a SELECT * takes 13.43 seconds. Attaching the file and inserting all the records into an in-memory db takes 5 seconds.

So if my assumption is correct, for long selects with user-defined WHERE / ORDER BY clauses you cut the query time to roughly a third, and then serve up the ‘view’ from the newly created table using the rowid.
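A sketch of that attach-and-copy idea (file and table names are my own, not from the thread): run the slow query once into an in-memory table, then page the result by its fresh, gap-free rowids.

```python
import os
import sqlite3
import tempfile

# Sketch of the attach-and-copy idea (file/table names assumed): run the
# slow query once into an in-memory table, then page it by rowid.
path = os.path.join(tempfile.mkdtemp(), "big.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE test (f1 TEXT)")
disk.executemany("INSERT INTO test VALUES (?)",
                 [("t%d" % i,) for i in range(10_000)])
disk.commit()
disk.close()

mem = sqlite3.connect(":memory:")
mem.execute("ATTACH DATABASE ? AS src", (path,))
# One pass over the expensive query; the copy gets fresh, gap-free rowids.
mem.execute("CREATE TABLE cache AS SELECT * FROM src.test WHERE f1 LIKE '%1%'")

page = mem.execute(
    "SELECT rowid, f1 FROM cache WHERE rowid > 100 LIMIT 50").fetchall()
print(len(page))  # 50 -- served from memory via the rowid method
```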

I have 3 scenarios that might happen… I HAD been hoping to handle all of them the same way, but it seems that isn’t possible

  • The user types in a query… here I have NO control over what they say… so it gets executed AS-IS, and if it takes forever… so be it
  • The user just selects a table (mouse click, no SQL entered)… this is to be handled as we have been discussing above, and I will probably go for the ROWID method
  • The user just selects a VIEW… for this I have no choice but to use limit/offset as views have no rowid

For 1.

You should still be able to use the user-defined SELECT statement to create an in-memory temp table like this:

"CREATE TABLE temp AS " + UserSelectSQL

For that, though, you would need to be attached to the disk db and alter the SELECT SQL so its table references point at the on-disk tables.

For 2.

That would definitely be the fastest but would need to account for tables without rowid.

For 3.

I’ve never fooled with views, but should probably check it out.

Ok just tested scenario 3.

I created this view:

CREATE VIEW "view1" AS SELECT * from test where f1 like '%t%';

The view contains over 2M records and takes over 5 seconds to run.

SELECT * FROM view1

Running this to copy to an in memory temp table takes only 3.21667 seconds.

CREATE TABLE temp AS SELECT * FROM view1

Not only did the query run in about half the time, but now it can be loaded from the temp table using the rowid method.
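Scaled way down, the same view experiment looks like this (names and data are my own, not the 2M-record test): the view itself has no rowid to page on, but the materialized copy does, dense and starting at 1.

```python
import sqlite3

# Scaled-down sketch of the view test (names/data assumed): a view has
# no rowid, but CREATE TABLE ... AS materializes it into a plain table
# whose fresh rowids support the rowid paging method.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (f1 TEXT)")
con.executemany("INSERT INTO test VALUES (?)",
                [("the %d" % i,) if i % 2 else ("x%d" % i,)
                 for i in range(1000)])
con.execute("CREATE VIEW view1 AS SELECT * FROM test WHERE f1 LIKE '%t%'")

con.execute("CREATE TABLE temp_copy AS SELECT * FROM view1")
page = con.execute(
    "SELECT rowid, f1 FROM temp_copy WHERE rowid > 10 LIMIT 5").fetchall()
print(page[0][0])  # 11 -- the copy's rowids are dense, starting at 1
```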

For a table to NOT have a rowid, it has to be purposely created with the WITHOUT ROWID clause…
and IF someone did that, then I’d default to method #3
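For what it’s worth, the opt-out is spelled WITHOUT ROWID in SQLite, and it’s easy to confirm such a table really has no rowid to page on (my own quick check, not from the thread):

```python
import sqlite3

# Quick check (my own, not from the thread): a WITHOUT ROWID table has
# no rowid column at all, so rowid paging needs a fallback there.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT) WITHOUT ROWID")

try:
    con.execute("SELECT rowid FROM t")
    has_rowid = True
except sqlite3.OperationalError:
    has_rowid = False

print(has_rowid)  # False -- fall back to LIMIT/OFFSET for these tables
```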

The problem I see with #1 is that they may have up to 10 databases already attached together, and I’d have to figure out which one (or combination) the data was in, to be able to attach it back to a memory DB (which my app already has for “housekeeping” stuff)