Tricky SQL Help

I have a table in my database, we’ll call it “Visits”. In my visits table, I have these columns:
uuid
visitor_uuid
action
datetime_created

Now let’s say that various things that a visitor can do will dump data into this table, essentially recording a “visit”. Here is what I need to do: Find all the people who have visited at least 5 times in the last 30 days.

On the surface, this seems simple enough - however, multiple entries on the same day should be ignored, and only counted as 1 visit. So, if there are 3 visits all on one day, that should only count as 1 visit in my last 30 days for that visitor_uuid.

I’m not exactly a rookie with sql, but I can’t think of any clean way to do this all in a single query, or even with a series of nested subqueries… The best I can think of is mostly a brute force approach, where I just ask for any visitor_uuid that has more than 5 entries in the table in the last 30 days, then for each of those, pull their whole visit history and compare dates of each visit one at a time to see if there are at least 5 that occur on different days.

Is there a better way?

Without being in a position to try at the moment…

It seems like you first want to query for DISTINCT visits by each user per date. That should get you past the “x times in a single day still counts as 1” problem.

Then you can count those visits within the period and select the ones with at least 5.

This seems totally doable.

Hmm… I’m not sure DISTINCT will help here - since we could have a visit at noon, 2 pm, 8 pm, and 10pm all on the same day. Those would be distinct entries… unless there is some magical syntax for DISTINCT that lets me put a relative range on things…

would seem that this should be doable in one select if you can collapse multiple visits per day into one distinct row for that user
then you just count the rows for that user that are in the date range

pseudo sql

select visitor_uuid
from visits a
where visitor_uuid in ( select distinct visitor_uuid where datetime_created between today - 30 and today )
and 4 < ( select count(1)
from visitor b
where b.visitor_uuid = a.visitor_uuid
and b.visitor_uuid || “" || date(b. datetime_created) in (
select distinct visitor_uuid || "
” || date(datetime_created) from visits datetime_created between today - 30 and today ) ) )

or something like this :stuck_out_tongue:

the last part should give me one row per visitor for the last 30 days

and then I can count them up

hmmmm … what does something along these lines yield ?

select visitor_uuid, count()
from visits
where datetime_created between today - 30 and today
group by visitor_uuid || "
" || date(datetime_created)

you might get away with a simple group by / having

EDIT : err wait that will count every visit during a day … damn

I’ve used the script in the first post here to limit requests in some projects.
The numRecentVisits() function lets you check visits by ip address by time.

This script is PHP and MySQL, but probably could be adapted to whatever your use may be.

It might help to extract the date portion of your datetime column into its own (pseudo) column. Then select visior_uuid, date, count(*) group by visitor_uuid by date into a temporary table and run your query off of that.

In sqlite :
select distinct visior_uuid,count(*) from (SELECT distinct visior_uuid,date(datetime_created) as day FROM table where datetime_created between date(‘now’,’-30 day’) and date(‘now’)) group by 1

Yeah, as I was reading through it I saw you fell into the same trap I did when pondering this.

[quote=166583:@Tim Parnell]I’ve used the script in the first post here to limit requests in some projects.
The numRecentVisits() function lets you check visits by ip address by time.[/quote]

Unfortunately, that is doing the opposite of what I need - it shows the total number of visits over a timespan (i.e., 7 in one day), whereas I need to know the total number of individual days on which a visit occurred.

Oh the way I read your request you only needed to see if they did visit more than 5 times in 30 days :slight_smile:

However the first one should only count 1 entry per day for any visitor

And, not knowing what db you’re using there are some other things you might be able to do depending on the DB

@Norman Palardy - I’m using the built-in SQLite engine. After pondering this some more, I think I should be able to feed the contents of the datetime_created column into the date() function and tell it to only show the date rather than the full time stamp, then group by those results.

I’m on my phone now so I won’t bother with trying to type up an example of what I mean, but I’ll try it once I’m back at my computer and share a solution of it works.

Thank you everyone for the suggestions.

you can do this with a temp table or sub query.

lets try temp table.

step 1) dump initial data into temp table. We will record only one occurrence for visitor per day for the last 30 days… however if you want to know how many visits in a day… you have that as well.

in addition, you have to remove time from the date since we are after date only.

select convert(char(8), datetime_created, 112) as [visit_day], visitor_uuid, count(1) as [vistor_day_count]
into #T_VISIT_TMP
from Visits
where datetime_created between (GETDATE() - 30) and GETDATE()
group by convert(char(8), datetime_created, 112), visitor_uuid

step 2) query temp table for visitors that have visited in last 30 days 5 times or more.

select visitor_uuid, count(1) as [visit_count_in_last_30_days]
from #T_VISIT_TMP
group by visitor_uuid having count(1) >= 5
order by 2 desc;

[quote=166627:@Kimball Larsen]@Norman Palardy - I’m using the built-in SQLite engine. After pondering this some more, I think I should be able to feed the contents of the datetime_created column into the date() function and tell it to only show the date rather than the full time stamp, then group by those results.
[/quote]
Thats kind of what I’m thinking
But you still need only one row per date per user
Thats why my initial query had the last select distinct for the case where there were lots of visits by one user on one day

This seems to work for me. What do you think?

SELECT DISTINCT visitor_uuid 
FROM visit AS a 
WHERE 
	( 
		SELECT count ( DISTINCT DATE( datetime_created ) ) FROM visit AS b 
		WHERE a.visitor_uuid = b.visitor_uuid 
	) >= 5 AND
	DATE ( datetime_created ) >= DATE ( 'now', '-30 day' )

No, that stinks. Working on it…

This is better:

SELECT DISTINCT visitor_uuid 
FROM visit AS a 
WHERE 
	( 
		SELECT count ( DISTINCT DATE ( datetime_created ) ) 
		FROM visit AS b 
		WHERE a.visitor_uuid = b.visitor_uuid  AND
		DATE ( datetime_created ) >= DATE ( 'now', '-30 day' )
	) >= 5 

yeah @Kem Tekinay - your first one up there was both horribly inefficient and returning the wrong results (which is the best I could do so far as well… :slight_smile: )

I’ll try your next attempt right now.

Your most recent attempt is returning correct results! But it takes a LONG time to execute (I have about 3700 rows in my test table, and this query takes nearly 5 seconds to run on my new macbook pro). I suspect that the nested subquery is the performance killer here.

For small sets of data, your approach wins.

For my specific use-case, I think I’ll have to add a column to the table that will flag whether the visit is the first for that calendar date, and set it correctly when adding a new row by first checking to see if there are any other rows for that visitor that day. The population of this table can be relatively slow and expensive. The generation of the reports this data supports needs to be lickety-split instantaneous.