SQL queries: Is this the speed one should expect?

SQLIte database with two tables: tblProtein with 4533 proteins, tblPeptide with 229327 peptides

These are the SQL queries with the times:

SELECT * FROM tblProtein: 0,188 Seconds
SELECT SUM( Length ) FROM tblProtein: 0,065 Seconds
SELECT * FROM tblPeptide: 8,138 Seconds
SELECT MIN(m_over_z) AS Smallest, Max(m_over_z) AS Largest FROM tblPeptide: 1,997 Seconds
SELECT MIN(m_over_z_withFixedTags) AS Smallest, Max(m_over_z_withFixedTags) AS Largest FROM tblPeptide: 0,234 Seconds
SELECT * FROM tblPeptide WHERE m_over_z < 300: 1,108 Seconds
SELECT * FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500: 6,803 Seconds
SELECT * FROM tblPeptide WHERE m_over_z > 1500: 1,942 Seconds
SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags < 300: 0,388 Seconds
SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags BETWEEN 300 AND 1500: 5,291 Seconds
SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags > 1500: 4,889 Seconds
SELECT * FROM tblPeptide WHERE IdentifiedInExperiment = 1 AND m_over_z NOT BETWEEN 300 AND 1500: 0,829 Seconds
SELECT * FROM tblPeptide WHERE IdentifiedInExperiment = 1 AND m_over_z_withFixedTags NOT BETWEEN 300 AND 1500: 0,215 Seconds
SELECT * FROM tblPeptide WHERE NumberOfMissedCleavages = 0: 2,678 Seconds
SELECT * FROM tblPeptide WHERE NumberOfMissedCleavages = 0 AND IdentifiedInExperiment = 1: 0,844 Seconds
SELECT * FROM tblProtein WHERE NumberOfUniquePeptidesInDigest > 0: 0,167 Seconds
SELECT DISTINCT Protein_ID FROM tblProtein WHERE NumberOfUniquePeptidesInDigest > 0 AND Protein_ID IN (SELECT Protein_ID FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500 ): 0,478 Seconds
SELECT DISTINCT Protein_ID FROM tblProtein WHERE NumberOfUniquePeptidesInDigest > 0 AND Protein_ID IN (SELECT Protein_ID FROM tblPeptide WHERE m_over_z_withFixedTags BETWEEN 300 AND 1500 ): 0,371 Seconds
SELECT DISTINCT Protein_ID FROM tblPeptide WHERE m_over_z NOT BETWEEN 300 AND 1500: 0,238 Seconds
SELECT DISTINCT Protein_ID FROM tblPeptide WHERE Protein_ID NOT IN ( SELECT DISTINCT Protein_ID from tblPeptide WHERE m_over_z NOT BETWEEN 300 AND 1500): 0,600 Seconds

Seems slow to me … especially the ones ike

SELECT * FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500: 6,803 Seconds

are you using any indexing on the tables?

no

Stating times and asking if they are reasonable is in some cases like asking “Do I like blue?” or “How long is a rope”

These thing all affect the response time

a) How big is the table (unindexed requires a full table scan, resulting in every record being evaluated)
b) Is it indexed by the appropriate values in relationship to the query?
c) Is the distribution of the data within the table such that the DB Engine will/can utilize the index (assuming there is one)
d) how does the DB Engine optimize the query (BETWEEN vs [m_over_z>=300 and m_over_z<=500]) [not saying which is better here]
e) the hardware (CPU/Disk Drive) a local drive is faster than an external is faster than a remote

So if you have two different queries that are meant to return the same results and there is a huge difference is speed, use the EXPLAIN PLAN feature to determine what optimizations (if any) are being used, and how altering or adding an index might help

It also helps to periodically run vacuum and analyze commands to do some cleanup.

[quote=132275:@Markus Winter]SQLIte database with two tables: tblProtein with 4533 proteins, tblPeptide with 229327 peptides

These are the SQL queries with the times:

SELECT * FROM tblProtein: 0,188 Seconds
SELECT SUM( Length ) FROM tblProtein: 0,065 Seconds
SELECT * FROM tblPeptide: 8,138 Seconds
SELECT MIN(m_over_z) AS Smallest, Max(m_over_z) AS Largest FROM tblPeptide: 1,997 Seconds
SELECT MIN(m_over_z_withFixedTags) AS Smallest, Max(m_over_z_withFixedTags) AS Largest FROM tblPeptide: 0,234 Seconds
SELECT * FROM tblPeptide WHERE m_over_z < 300: 1,108 Seconds
SELECT * FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500: 6,803 Seconds
SELECT * FROM tblPeptide WHERE m_over_z > 1500: 1,942 Seconds
SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags < 300: 0,388 Seconds
SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags BETWEEN 300 AND 1500: 5,291 Seconds
SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags > 1500: 4,889 Seconds
SELECT * FROM tblPeptide WHERE IdentifiedInExperiment = 1 AND m_over_z NOT BETWEEN 300 AND 1500: 0,829 Seconds
SELECT * FROM tblPeptide WHERE IdentifiedInExperiment = 1 AND m_over_z_withFixedTags NOT BETWEEN 300 AND 1500: 0,215 Seconds
SELECT * FROM tblPeptide WHERE NumberOfMissedCleavages = 0: 2,678 Seconds
SELECT * FROM tblPeptide WHERE NumberOfMissedCleavages = 0 AND IdentifiedInExperiment = 1: 0,844 Seconds
SELECT * FROM tblProtein WHERE NumberOfUniquePeptidesInDigest > 0: 0,167 Seconds
SELECT DISTINCT Protein_ID FROM tblProtein WHERE NumberOfUniquePeptidesInDigest > 0 AND Protein_ID IN (SELECT Protein_ID FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500 ): 0,478 Seconds
SELECT DISTINCT Protein_ID FROM tblProtein WHERE NumberOfUniquePeptidesInDigest > 0 AND Protein_ID IN (SELECT Protein_ID FROM tblPeptide WHERE m_over_z_withFixedTags BETWEEN 300 AND 1500 ): 0,371 Seconds
SELECT DISTINCT Protein_ID FROM tblPeptide WHERE m_over_z NOT BETWEEN 300 AND 1500: 0,238 Seconds
SELECT DISTINCT Protein_ID FROM tblPeptide WHERE Protein_ID NOT IN ( SELECT DISTINCT Protein_ID from tblPeptide WHERE m_over_z NOT BETWEEN 300 AND 1500): 0,600 Seconds

Seems slow to me … especially the ones ike

SELECT * FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500: 6,803 Seconds[/quote]

Looks like a few indexes would be in order

SELECT * FROM tblPeptide WHERE m_over_z < 300: 1,108 Seconds SELECT * FROM tblPeptide WHERE m_over_z BETWEEN 300 AND 1500: 6,803 Seconds SELECT * FROM tblPeptide WHERE m_over_z > 1500: 1,942 Seconds SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags < 300: 0,388 Seconds SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags BETWEEN 300 AND 1500: 5,291 Seconds SELECT * FROM tblPeptide WHERE m_over_z_withFixedTags > 1500: 4,889 Seconds
Run these with EXPLAIN
And I bet you see a complete table scans since there is no useful index

I’d at least add an index on m_over_z and m_over_z_withFixedTags so the query optimizer can find the starting point fast then just walk through the results instead of starting at the beginning of the data & walking through all of it
http://sqlite.org/optoverview.html