Difficult SQL query

  1. 7 days ago

    I am trying to calculate occupancy in a hotel for every month in a one year period. That means scanning thru thousands of booking records.

    I know how to do this in pure Xojo code, and I already did it. (maybe not the most efficient way, iterating thru the records, but it works fine)
    I am wondering now, if it is possible to do it in pure SQLite, to make it even faster. But I am lost to how the query should be... (probably using a recursive query ?)

    The first issue is that the scheme of the "bookings" table (I'm simplifying here) is BookingId integer, DateIn date, DateOut date, so I don't have A ROW for every date but instead I have a field for the dateIN and a field for the dateOut.

    I know how to calculate the difference (in days) between two dates, but the problem here is that (obviously) the time span (dateIN-dateOut) of every booking may exceed one month even two or three... so I need to sum the right amount of nights for every month in the whole booking period of every record.

    Finally (as I did it in xojo) what I would like is to have (in the best scenario) a sort of dictionary YearMonth:NightsTaken..

    This is way more than I can handle with SQL, but is it even possible ?

    Ok, here's my workaround...It's fast enough (200 msec) for a 1 month period and it has a drawback: It only works to calculate on a monthly basis. The time span can only be one month long. (I may have to repeat it 12 times get a full year of data, but that's ok for now), and of course it requires some string preparation within xojo... In the following example I am calcultaing the totalstays in the month of may 2019. (This query is a draft... it may need some tweaking... but it works)

    select *, sum(cantnochesestemes) as TotalStays from (select reservas.fechain, reservas.fechaout, 
    case 
    
    when strftime('%m', reservas.fechain) < '05' AND strftime('%m', reservas.fechaout) > '05' then
    cast(strftime('%d', date('2019-05-01','start of month','+1 month','-1 day')) as integer)
    
    when strftime('%m', reservas.fechain) = '05'  AND (strftime('%m', reservas.fechain)  < strftime('%m', reservas.fechaout)) then
    cast(julianday(date(reservas.fechain,'start of month','+1 month','-1 day')) - julianday(reservas.fechain) +1 as integer)
    
    when strftime('%m', reservas.fechain) < '05' AND strftime('%m', reservas.fechain) < '06'  then
    cast (strftime('%d', date(reservas.fechaout,'start of month', '+'||strftime('%d', reservas.fechaout)||' day')) as integer)-2
    
    else 
    cast (julianday(reservas.fechaout)-julianday(reservas.fechain) as integer)
    end
    
    AS StaysThisMonth
    from reservas 
    where status <> 'Cancelada' AND ifnull(complimentary, 0) <> 1 
    and fechaout > '2019-05-01' AND fechain < '2019-06-01')
  2. Jeff T

    Aug 12 Pre-Release Testers Midlands of England, Europe

    Do you have control of the database, or are you just reporting from it?

    If you have complete control over it:
    I would suggest materialising the data at the point that each booking is made, or adjusted.
    You save the original with date from/date to
    But also save into a detailed table, one row per calendar date, linked to the original.
    If the booking is changed or deleted, delete the details too.

    In addition, have a table that contains one row for each calendar day.

    At report time, join your calendar table to the detail table using an outer join, so you get every date, plus <some records>
    The dates can be filtered , and bookings that span months will work fine because you have them broken down into day sized chuncks.

    Essentially, do the work at the point that the bookings are saved, and the report will be easy later on.

  3. 6 days ago

    Dave S

    Aug 13 San Diego, California USA

    Post the database schema, this can probably be done without the wasted overhead of a record per day unless that is required for another reason

  4. Kem T

    Aug 13 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    FYI, in PostgreSQL, you could fake this with a lateral join to the generate_series function, but I don't think SQLite has anything like that. I defer to the expertise of others here.

  5. Kem T

    Aug 13 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    This is some test code in PostgreSQL. I don't know if these are exactly the results you'd be looking for, but it offers a starting point and may provide some ideas on how to do it in SQLite:

    begin;
    
    create temp table tmp_booking (id bigserial, date_in date, date_out date);
    insert into tmp_booking (date_in, date_out) values
      ('2019-01-01', '2019-01-15'),
      ('2019-01-14', '2019-02-01');
    
    select
      year_month,
      count(year_month) as monthly_total
    from
      tmp_booking as b,
      lateral (
        select 
          concat(extract(year from d)::text, '-', trim(to_char(extract(month from d), '00'))) as year_month 
        from 
          generate_series(b.date_in::timestamp, b.date_out::timestamp, '1d'::interval) as d
      ) as b_date 
    group by
      year_month
    order by
      year_month;
    
    rollback;

    The results:

    year_month	monthly_total
    2019-01	          33
    2019-02	           1
  6. nicolás c

    Aug 13 Pre-Release Testers, Xojo Pro argentina

    just my 2 cents
    https://dbfiddle.uk/?rdbms=postgres_11&fiddle=d40c8bb1d37039242cff9a0dbaf6fec8

  7. Edited 6 days ago

    Thanks everybody for your input,

    Unfortunately SQLite (CubeSQL) lacks lateral joins or generate_series function that Postgres has.... :-(

    @Dave S : The 'bookings' table is a pretty big table but regarding the beginning and ending period it has only what I posted (Period_IN/Period_OUT). The rest of the information is useless.

  8. Richard D

    Aug 13 Pre-Release Testers, Xojo Pro Europe (UK, London)

    What about using CTE or common table expression??? i know you can do do recursive.
    it is available on sqlite > 3.8.2 or something and latest cubesql allow you to use CTE

    https://sqlite.org/lang_with.html

  9. @Richard D Yep... this is that direction I am heading now... but the SQL query I could get to work is too slow (3.5 secs to check a 15 days time span) , because has to check every row of the booking table if tha date is within its period boundaries. Here's the experimental query

    where
    '2019-03-15' is the beginning of the period to check
    '2019-04-01' is the end of the period to check
    RESERVAS.fechain = the start date of the booking being checked
    RESERVAS.fechaout = the end date of the booking being checked

    WITH RECURSIVE
      cnt(x) AS (
         SELECT 0
         UNION ALL
         SELECT x+1 FROM cnt
          LIMIT (SELECT ((julianday('2019-04-01') - julianday('2019-03-15'))) + 1)
      )
    SELECT date(julianday('2019-03-15'), '+' || x || ' days') as EachDate, count(*) FROM cnt
    left join RESERVAS on (RESERVAS.fechain <= EachDate and RESERVAS.fechaout > EachDate )
    group by EachDate
  10. Richard D

    Aug 13 Pre-Release Testers, Xojo Pro Europe (UK, London)

    i have not use the recursive CTE yet but i have been using for for combining multiple query into a double CTE

  11. Roman V

    Aug 13 Answer

    Ok, here's my workaround...It's fast enough (200 msec) for a 1 month period and it has a drawback: It only works to calculate on a monthly basis. The time span can only be one month long. (I may have to repeat it 12 times get a full year of data, but that's ok for now), and of course it requires some string preparation within xojo... In the following example I am calcultaing the totalstays in the month of may 2019. (This query is a draft... it may need some tweaking... but it works)

    select *, sum(cantnochesestemes) as TotalStays from (select reservas.fechain, reservas.fechaout, 
    case 
    
    when strftime('%m', reservas.fechain) < '05' AND strftime('%m', reservas.fechaout) > '05' then
    cast(strftime('%d', date('2019-05-01','start of month','+1 month','-1 day')) as integer)
    
    when strftime('%m', reservas.fechain) = '05'  AND (strftime('%m', reservas.fechain)  < strftime('%m', reservas.fechaout)) then
    cast(julianday(date(reservas.fechain,'start of month','+1 month','-1 day')) - julianday(reservas.fechain) +1 as integer)
    
    when strftime('%m', reservas.fechain) < '05' AND strftime('%m', reservas.fechain) < '06'  then
    cast (strftime('%d', date(reservas.fechaout,'start of month', '+'||strftime('%d', reservas.fechaout)||' day')) as integer)-2
    
    else 
    cast (julianday(reservas.fechaout)-julianday(reservas.fechain) as integer)
    end
    
    AS StaysThisMonth
    from reservas 
    where status <> 'Cancelada' AND ifnull(complimentary, 0) <> 1 
    and fechaout > '2019-05-01' AND fechain < '2019-06-01')
  12. 4 days ago

    John A

    Aug 15 Pre-Release Testers Las Vegas, Nevada

    @Roman V Try this:

    WITH
     m (month) AS (
       SELECT DISTINCT strftime('%m', fechain) AS month
       FROM reservas
     ),
     d (first, last) AS (
       SELECT printf ('2019-%02u-01', month) AS first,
          date(printf ('2019-%02u-01', month), '+1 month', '-1 day') AS last
       FROM m
     ),
     y (yearmonth, mdays, first, last) AS (
       SELECT strftime('%Y-%m', first) AS yearmonth, 
          julianday(last) - julianday(first) + 1 AS mdays ,*
       FROM d
       WHERE yearmonth LIKE '2019%' -- <-- year to summarize
       ORDER BY first -- this cuts query time in half
     ) 
    SELECT yearmonth, sum(stay) as StaysThisMonth
    FROM (
       SELECT yearmonth, res_id, fechain, fechaout,
          CASE
             WHEN status='Cancelada' THEN 0
             WHEN fechaout <= last THEN julianday(min(fechaout, last)) - (julianday(max(fechain, first)))
             WHEN fechain >= first THEN julianday(min(fechaout, last)) - (julianday(max(fechain, first))) + 1
             ELSE mdays -- whole month
          END 
          AS stay
       FROM y, reservas r
       WHERE complimentary = 0 AND r.fechain <= y.last AND r.fechaout >= y.first
    )
    GROUP BY yearmonth;

    This was interesting. I've worked with calendars and schedules and I like SQL puzzles.
    I wasn't sure about crossover stays, so the 'day' is put in the starting month.
    Using a dummy table with 10000 somewhat random 'reservations' this query takes about 51ms.

or Sign Up to reply!