Difficult SQL query

I am trying to calculate occupancy in a hotel for every month in a one year period. That means scanning thru thousands of booking records.

I know how to do this in pure Xojo code, and I already did it. (maybe not the most efficient way, iterating thru the records, but it works fine)
I am wondering now, if it is possible to do it in pure SQLite, to make it even faster. But I am lost to how the query should be… (probably using a recursive query ?)

The first issue is that the scheme of the “bookings” table (I’m simplifying here) is BookingId integer, DateIn date, DateOut date, so I don’t have A ROW for every date but instead I have a field for the dateIN and a field for the dateOut.

I know how to calculate the difference (in days) between two dates, but the problem here is that (obviously) the time span (dateIN-dateOut) of every booking may exceed one month even two or three… so I need to sum the right amount of nights for every month in the whole booking period of every record.

Finally (as I did it in xojo) what I would like is to have (in the best scenario) a sort of dictionary YearMonth:NightsTaken…

This is way more than I can handle with SQL, but is it even possible ?

Do you have control of the database, or are you just reporting from it?

If you have complete control over it:
I would suggest materialising the data at the point that each booking is made, or adjusted.
You save the original with date from/date to
But also save into a detailed table, one row per calendar date, linked to the original.
If the booking is changed or deleted, delete the details too.

In addition, have a table that contains one row for each calendar day.

At report time, join your calendar table to the detail table using an outer join, so you get every date, plus
The dates can be filtered , and bookings that span months will work fine because you have them broken down into day sized chuncks.

Essentially, do the work at the point that the bookings are saved, and the report will be easy later on.

Post the database schema, this can probably be done without the wasted overhead of a record per day unless that is required for another reason

FYI, in PostgreSQL, you could fake this with a lateral join to the generate_series function, but I don’t think SQLite has anything like that. I defer to the expertise of others here.

This is some test code in PostgreSQL. I don’t know if these are exactly the results you’d be looking for, but it offers a starting point and may provide some ideas on how to do it in SQLite:

begin;

create temp table tmp_booking (id bigserial, date_in date, date_out date);
insert into tmp_booking (date_in, date_out) values
  ('2019-01-01', '2019-01-15'),
  ('2019-01-14', '2019-02-01');

select
  year_month,
  count(year_month) as monthly_total
from
  tmp_booking as b,
  lateral (
    select 
      concat(extract(year from d)::text, '-', trim(to_char(extract(month from d), '00'))) as year_month 
    from 
      generate_series(b.date_in::timestamp, b.date_out::timestamp, '1d'::interval) as d
  ) as b_date 
group by
  year_month
order by
  year_month;

rollback;

The results:

year_month	monthly_total
2019-01	          33
2019-02	           1

just my 2 cents
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=d40c8bb1d37039242cff9a0dbaf6fec8

Thanks everybody for your input,

Unfortunately SQLite (CubeSQL) lacks lateral joins or generate_series function that Postgres has… :frowning:

@Dave S : The ‘bookings’ table is a pretty big table but regarding the beginning and ending period it has only what I posted (Period_IN/Period_OUT). The rest of the information is useless.

What about using CTE or common table expression??? i know you can do do recursive.
it is available on sqlite > 3.8.2 or something and latest cubesql allow you to use CTE

https://sqlite.org/lang_with.html

@Richard Duke Yep… this is that direction I am heading now… but the SQL query I could get to work is too slow (3.5 secs to check a 15 days time span) , because has to check every row of the booking table if tha date is within its period boundaries. Here’s the experimental query

where
‘2019-03-15’ is the beginning of the period to check
‘2019-04-01’ is the end of the period to check
RESERVAS.fechain = the start date of the booking being checked
RESERVAS.fechaout = the end date of the booking being checked

WITH RECURSIVE cnt(x) AS ( SELECT 0 UNION ALL SELECT x+1 FROM cnt LIMIT (SELECT ((julianday('2019-04-01') - julianday('2019-03-15'))) + 1) ) SELECT date(julianday('2019-03-15'), '+' || x || ' days') as EachDate, count(*) FROM cnt left join RESERVAS on (RESERVAS.fechain <= EachDate and RESERVAS.fechaout > EachDate ) group by EachDate

i have not use the recursive CTE yet but i have been using for for combining multiple query into a double CTE

Ok, here’s my workaround…It’s fast enough (200 msec) for a 1 month period and it has a drawback: It only works to calculate on a monthly basis. The time span can only be one month long. (I may have to repeat it 12 times get a full year of data, but that’s ok for now), and of course it requires some string preparation within xojo… In the following example I am calcultaing the totalstays in the month of may 2019. (This query is a draft… it may need some tweaking… but it works)

[code]select *, sum(cantnochesestemes) as TotalStays from (select reservas.fechain, reservas.fechaout,
case

when strftime(’%m’, reservas.fechain) < ‘05’ AND strftime(’%m’, reservas.fechaout) > ‘05’ then
cast(strftime(’%d’, date(‘2019-05-01’,‘start of month’,’+1 month’,’-1 day’)) as integer)

when strftime(’%m’, reservas.fechain) = ‘05’ AND (strftime(’%m’, reservas.fechain) < strftime(’%m’, reservas.fechaout)) then
cast(julianday(date(reservas.fechain,‘start of month’,’+1 month’,’-1 day’)) - julianday(reservas.fechain) +1 as integer)

when strftime(’%m’, reservas.fechain) < ‘05’ AND strftime(’%m’, reservas.fechain) < ‘06’ then
cast (strftime(’%d’, date(reservas.fechaout,‘start of month’, ‘+’||strftime(’%d’, reservas.fechaout)||’ day’)) as integer)-2

else
cast (julianday(reservas.fechaout)-julianday(reservas.fechain) as integer)
end

AS StaysThisMonth
from reservas
where status <> ‘Cancelada’ AND ifnull(complimentary, 0) <> 1
and fechaout > ‘2019-05-01’ AND fechain < ‘2019-06-01’)[/code]

@Roman Varas Try this:

WITH m (month) AS ( SELECT DISTINCT strftime('%m', fechain) AS month FROM reservas ), d (first, last) AS ( SELECT printf ('2019-%02u-01', month) AS first, date(printf ('2019-%02u-01', month), '+1 month', '-1 day') AS last FROM m ), y (yearmonth, mdays, first, last) AS ( SELECT strftime('%Y-%m', first) AS yearmonth, julianday(last) - julianday(first) + 1 AS mdays ,* FROM d WHERE yearmonth LIKE '2019%' -- <-- year to summarize ORDER BY first -- this cuts query time in half ) SELECT yearmonth, sum(stay) as StaysThisMonth FROM ( SELECT yearmonth, res_id, fechain, fechaout, CASE WHEN status='Cancelada' THEN 0 WHEN fechaout <= last THEN julianday(min(fechaout, last)) - (julianday(max(fechain, first))) WHEN fechain >= first THEN julianday(min(fechaout, last)) - (julianday(max(fechain, first))) + 1 ELSE mdays -- whole month END AS stay FROM y, reservas r WHERE complimentary = 0 AND r.fechain <= y.last AND r.fechaout >= y.first ) GROUP BY yearmonth;
This was interesting. I’ve worked with calendars and schedules and I like SQL puzzles.
I wasn’t sure about crossover stays, so the ‘day’ is put in the starting month.
Using a dummy table with 10000 somewhat random ‘reservations’ this query takes about 51ms.