Tricky SQL Help

Here’s one I’ve just come up with that seems to be working correctly, and runs in just a few milliseconds on my machine with the same set of test data:

select count(distinct(date(datetime_created))) as theCount, visitor_uuid from visit where date(datetime_created) >= date('now', '-30 day') group by visitor_uuid having theCount > 4 order by theCount desc

Does anyone see anything obviously wrong with it? This data is pretty hard to know for sure if I’m getting exactly the right results.

My first thought is to use a Common Table Expression with a GROUP BY and HAVING clause.

I think it would look something like this:

WITH loginCTE (visitor_uuid,logindate) AS (
SELECT DISTINCT visitor_uuid,SUBSTR(datetime_created,1,10)
FROM Visits
WHERE datetime_created BETWEEN DATE(‘now’,’-30 day’) AND DATE(‘now’)
),
SELECT visitor_uuid
FROM loginCTE
GROUP BY visitor_uuid,logindate
HAVING COUNT(*)>4

[quote=166681:@Kimball Larsen]Here’s one I’ve just come up with that seems to be working correctly, and runs in just a few milliseconds on my machine with the same set of test data:

select count(distinct(date(datetime_created))) as theCount, visitor_uuid from visit where date(datetime_created) >= date('now', '-30 day') group by visitor_uuid having theCount > 4 order by theCount desc

Does anyone see anything obviously wrong with it? This data is pretty hard to know for sure if I’m getting exactly the right results.[/quote]

Not that I can spot and this is about where I was heading with what I was trying
small set of sample data seems to be fine
I have one user that visited 8 times on one day and one user who visited 8 times over 8 days
this returns that one user that has 8 visits over 8 days and not the other

the CTE would probably also work but I dont know what version of sqlite that first appeared in and the one installed on my macbook pro doesn’t support it :frowning:

EDIT : looks like 3.8.6 is where CTEs appeared

Thanks for the testing, Norman. Further testing with more sample data on my mac has increased my confidence level that my solution is accurate and lickety-split fast.

Thanks for all the ideas, everyone!

I wonder if it wouldn’t be more efficient to have a visit_date column that is filled in by a trigger and run the query on that instead.

I was thinking along those same lines @Kem Tekinay, but my current query runs in just a few milliseconds, which is plenty fast for what I need. I’m actually pulling 5 different kinds of metrics from this data, which is involving (I think) 9 total queries that all follow the same pattern I outlined… stuff like people who have visited N times in the last 30 days, or m-n times, or only 1 time in the last 30, but multiple times in the prior 45, or at least 1 time in each of the last 30, 60, and 90 day time periods, etc.

I’ve benchmarked calculation of these stats, and it looks like pulling them all only takes (on average) 0.030 seconds (30 milliseconds). That’s plenty fast.

Throwing a visit_date in the table and cleaning up the queries would undoubtedly speed things up some, but the difference between 0.030 and 0.010 total seconds of calculation time in this context is not worth the additional brain damage.