postgreSQL: Get extracted sum?

I thought I was so clever in optimizing this code, but I wasn’t:

My project connects to a postgres database with tables for patients and entries (named German Patienten and Eintraege), where entries contain events related to a patient that have a time span (during one day) and may have a currency value.

The previous code was running three rowsets to gather the information and took quite a long time to complete – rowsets 2 and 3 where built on each patient.id gathered from the first.

I thought I could shortcut this by gathering the sum in a connected select like here, but the sum(gewinn) will always be built from all the entries of that patient, not by those in the selected time span only like below:

select sum(gewinn) as gew, p.name, p.vorname, p.kategorie, e.patientid from eintraege e left join patienten p on e.patientid = p.id where e.patientid in (SELECT (patientid) AS pat FROM eintraege WHERE gewinn::numeric != 0 ) AND DATE(e.zeitbis) BETWEEN '2022-01-01' AND '2022-9-29' group by e.patientid, p.name, p.vorname, p.kategorie order by gew

This is where my SQL knowledge leaves me. Is there an elegant way of extracting the sum from entries of that selected time span only or do I have to deploy several Selects like before?

First, here is your query reformatted to be more readable:

select
  sum(gewinn) as gew,
  p.name,
  p.vorname,
  p.kategorie,
  e.patientid
from
  eintraege e
  left join patienten p on e.patientid = p.id
where
  e.patientid in (
    select
      (patientid) as pat
    from
      eintraege
    where
      gewinn::numeric != 0)
  and DATE(e.zeitbis) between '2022-01-01' and '2022-9-29'
group by
  e.patientid,
  p.name,
  p.vorname,
  p.kategorie
order by
  gew

I expect this query will yield something like:

10.0   Joe   xxx   cat   100
20.0   Jan   yyy   cat   200

What were you looking for instead?

Thanks, Kem. Unfortunately, that’s how far as I got. Which will bring the sum of ALL entries of the respective patient, not the sum of all entries in that selected time span only.

I don’t see how that’s possible. Your WHERE clause limits the patients to any id that have at least one record where gewinn != 0 and considers only the events within the given time frame.

Have you tried reducing the time frame to, say, a week to see if you get different results?

You can also break this up with WITH clauses to limit each set of data, something like:

with
patient_ids_with_gewinn as (
  select patientid
  from eintraege
  where gewinn::numeric > 0.0
),
limited_events as (
  select *
  from
    eintraege
  where
    e.zeitbeis::date between '2022-01-01' and '2022-09-29'
    and patientid in (select patientid from patient_ids_with_gewinn)
)
select
  sum(gewinn) as gew,
  p.name,
  p.vorname,
  p.kategorie,
  e.patientid
from
  limited_events as e
  join patienten p on e.patientid = p.id
group by
  e.patientid,
  p.name,
  p.vorname,
  p.kategorie
order by
  gew
3 Likes

Excellent, Kem! Thanks a lot! That works as expected!

2 Likes

Hi Ulli!
Also this probably works (sorry, too lazy to set up a proper test case):
select
sum(gewinn) FILTER (WHERE e.zeitbeis::date between ‘2022-01-01’ and ‘2022-09-29’) as gew,
p.name,
p.vorname,
p.kategorie
from
limited_events as e
join patienten p on e.patientid = p.id

3 Likes