Median Vs Average

There are god reasons to use the median rather than the average in apps that base calculations on measures of central tendency. I am currently working on an application that uses the average to calculate a factor to multiply against a constant in order for the application to move toward a better measure of the population, rather than the sample.
A bit obscure? The bottom line is that I would prefer to calculate the median and interquartile range (IQR), as this would eliminate the outliers from the calculations performed.

Does anyone have a simple method for calculating the median from the data in an SQLite database?

Any hints would be appreciated.
Phil

The median is simply the midpoint in a set of ordered values. So what I would probably do is something like:

  • Issue SELECT count(*) FROM table WHERE criteria
  • Get count of records which match your desired criteria
  • Compute midpoint, i.e. count/2
  • Issue SELECT value FROM table WHERE criteria ORDER BY sort field(s) OFFSET midpoint LIMIT 2

If the count is an odd number of records, you can use single actual record that is the midpoint. If the count is an event number of records, if I remember my school days math correctly you would take the values from the two “middle” records and compute the mean average of those two values. That is why I used LIMIT 2 above so you’d have both values available.

It has been over 40 years since I had any math classes, so I might remember wrong…

I haven’t tried this, but found it using Google in about 2 minutes

 SELECT x.Hours median
      FROM BulbLife x, BulbLife y
      GROUP BY x.Hours
      HAVING
         SUM(CASE WHEN y.Hours <= x.Hours
            THEN 1 ELSE 0 END)>=(COUNT(*)+1)/2 AND
         SUM(CASE WHEN y.Hours >= x.Hours
            THEN 1 ELSE 0 END)>=(COUNT(*)/2)+1

replace BULBLIFE with your table, and HOURS with the value you want the median of

EDIT… tested it , and it does seem to work

Thanks

I had to take a different approach, as I have a number of different categories of things all in the same table, so I copied each data type into a separate array and wrote the following function (Method)

It is called by passing the array as follows: Dim medianOfArray As Double = CalculateMedian(ArrrayToBePassed)

It also takes into account the special cases where there are nil, only one, or two values that are being calculated.

// CalculateMedian(ArrayToAnalyse() As Double)

Dim medianValue As Double

// first sort the array
ArrayToAnalyse.sort

// Get the number of entries in the array - one more than the index for ubound, as the array starts with a zero
Dim n As Integer = ArrayToAnalyse.Ubound + 1

// if n > -1 array has values
if n > 0 then
// Display array for testing purposes
'for i As integer = 0 to ArrayToAnalyse.Ubound
'MsgBox("Aarray value = " + Str(ArrayToAnalyse(i)))
'next

if n mod 2 = 0 then
MsgBox(“Array has even number of entries…” + Str(n))
// median is the average of the two middle values
if n = 2 then // special case
'msgBox(" The two values to average are " + Str(ArrayToAnalyse(0) ) + " and " + Str(ArrayToAnalyse(1) ))
medianValue = (ArrayToAnalyse(0) + ArrayToAnalyse(1) ) / 2

else // more than two entries
  'msgBox(" The two central values to average are " + Str(ArrayToAnalyse(n/2 - 1) ) + " and " + Str(ArrayToAnalyse(n/2) ))
  medianValue = (ArrayToAnalyse(n/2 - 1) + ArrayToAnalyse(n/2))/2
end if

else // an odd number of entries
'MsgBox(“Array has an odd number of entries…” + Str(n))
if n = 1 then /// special case
medianValue = ArrayToAnalyse(0)
else
medianValue = ArrayToAnalyse((n-1)/2)
end if
end
'MsgBox("Median value to be returned is " + Str(medianValue))
return medianValue
else

return 0.0 // default to zero if cannot calculate the median value

end if

================
I hope this helps someone else looking fo this feature. Pity it isn’t included in the array type viz: Array.median

There should be a way to extend this to give back the median, interquartile range and identify outliers…

Will have to think about it.

Phil