Inaccurate average

Dawn_McCoy · July 24, 2019, 6:20pm

I have a program that is tracking cycles of an industrial machine. I need to re-calculate the average after every cycle. Not hard math, however, as cycles add up, the average gets further and further out of whack (Excel or a calculator has a result of 19.52 and my Xojo code has 19.56). At some point, my code stops calculating the average. I do not have error handling skills in Xojo despite reading the documentation multiple times so forgive me if I’m doing it wrong. Below is my code. The commented lines is the first way I tried it. Both have the same issue. Any idea what I’m missing?
Thanks in advance!

CalcAve(ElapsedSecs as Double, CurrentAve as Double, CycleCount as Double) As Double
Dim NewCalcAve As Double

Try
  If CycleCount < 1 Then
    'NewCalcAve  = (Round(ElapsedSecs*100))/100
    NewCalcAve  = ElapsedSecs
  else
    'NewCalcAve = (Round((ElapsedSecs + (CurrentAve * CycleCount)) / (CycleCount + 1)*100))/100
    NewCalcAve = (ElapsedSecs + (CurrentAve * CycleCount)) / (CycleCount + 1)
  End If
  NewCalcAve = CDbl(Format(NewCalcAve, "#.00"))
  WriteLog(Str(NewCalcAve), Str(ElapsedSecs))
Catch e As IOException
  MsgBox("File IO Error DA1: " + e.Reason)
End Try
Return NewCalcAve

Jeff_Tullin · July 24, 2019, 6:38pm

Isn’t the average (sum of all values) / (count of values)?

I’m not sure where CurrentAve is being set (global?) but it seems like you are doing this:

for this set:
23,5,4,6,7, 9

(hope I get this right…)

Your first average is
((23 *1 )+5 )/2 = 14

next one is
((14 *2) +4 )/ 3 = 10.6

next one is
((10.6 * 3) +7 )/ 4 = 9.575

but the average is actually

(23+5+4+6+7+9) / 6 = 9

Norman_Palardy · July 24, 2019, 7:01pm

I think you did miss some values

for this set:
23,5,4,6,7, 9

I think, in general the average is,
( ( * ) + ) /

Your first average is
((0 * 0) + 23 ) / 1 => 23

next one is
((23 *1 ) + 5 ) / 2 => 14

next one is
((14 *2 ) + 4 ) / 3 => 10.6

next one is
((10.6 *3 ) + 6 ) / 4 => 9.5

next one is
((9.45 *4 ) + 7 ) / 5 => 9.00000000000002

next one is
((9.00000000000002 *5 ) + 9 ) / 6 => 9.00000000000002

which jives pretty closely with

(23+5+4+6+7+9) / 6 = 9

Jeff_Tullin · July 24, 2019, 7:09pm

Interesting. Cheers Norman.
Moral: don’t do the math in your head…

Nonetheless, where I was going was : is multiplying the rolling average by the countsofar a recognised way of doing this?

The Round() or the CDbl(Format(NewCalcAve, “#.00”)) might lead to a progressive variance as anything beyond 2 dp is being lost on each pass.
My mental arithmetic is definitely not up to proving if that loss is enough to account for it.

Norman_Palardy · July 24, 2019, 7:19pm

I suppose IF you had unlimited precision numerics then computing the “sum so far” as the current average * count so far should be ok but IANAM (I am not a mathematician) and could not prove this if required

However, I have used this technique before and have not run into significant errors

That said, applying round, ceil, or other functions like that, may most certainly affect the result

From what I can see Dawn is using much the same algorithm I’m suggesting

LangueR · July 24, 2019, 8:01pm

Algebraically it should work as Norman stated. But this section on Doug’s code will little by little cause his equation to diverge.

NewCalcAve = CDbl(Format(NewCalcAve, "#.00")) WriteLog(Str(NewCalcAve), Str(ElapsedSecs))
He should either create another variable for displaying and not touch the variable the calculations are running on, or just format the display w/o changing the variable.

Dawn_McCoy · July 24, 2019, 8:07pm

I was trying to avoid adding all the cycles as the count can climb quickly. I will change my approach and keep a running sum to see if the variance clears up. Thanks all!

LangueR · July 24, 2019, 8:23pm

I haven’t tried this, but I think if you make the following change you should get what you are expecting (within the numerical precision of the variables being used).

change

NewCalcAve = CDbl(Format(NewCalcAve, "#.00")) WriteLog(Str(NewCalcAve), Str(ElapsedSecs))
to:

WriteLog(Str(NewCalcAve, "#.00"), Str(ElapsedSecs))

Markus_Winter · July 24, 2019, 9:03pm

UInt64?

DaveS · July 24, 2019, 9:28pm

best way that I can think of… (and this is what you might have mentioned just now)

a) keep a running sum of the values as they come in
b) keep a running count of the number of values

c) your running average is a/b

Markus_Winter · July 25, 2019, 1:17am

[quote=446537:@Norman Palardy]I think, in general the average is,
( ( * ) + ) / [/quote]

That depends on what count we are talking about.

Your algorithm is from the point of view of the original count:

( ( prevAvg * prev# ) + newVal ) / prev# + 1

or as prevAvg * prev# = previous Sum

and

prev# + 1 = new#

(previous sum + new value ) / new#

But I suspect the problem is in the cycleCount that is fed in as a parameter - that Dawns uses that algorithm from the point of view of the NEW count as AFTER the count the new average is calculated. So the cycle count would be off in the calculation.

NewCalcAve = (ElapsedSecs + (CurrentAve * CycleCount)) / (CycleCount + 1)

should then be

NewCalcAve = (ElapsedSecs + (CurrentAve * CycleCount - 1)) / (CycleCount)

// and that naming is confusing anyway - don’t use current, use previous and new

Tim_Hare · July 25, 2019, 5:06am

The original code was solid. It’s just that the inherent inaccuracy in a Double representation causes the result to drift over time. As Norman said, with unlimited precision, it would yield the correct result. A Double doesn’t have that kind of precision.

Markus_Winter · July 26, 2019, 1:21am

That isnt in question and is thoroughly beside the point.

We dont know that this is the issue. And I simply offered an alternative possibility: If a method is called with wrong parameters then you get wrong results. Dismissing other possible error sources seems unwise at this stage.

So unless Dawn supplies a few hard numbers we cant check and determine where in her app the problem is.

She might also try Bob Delaneys maths plugin.

LangueR · July 26, 2019, 7:46am

The code algorithm is good indeed, but the truncation Dawn is doing after the algorithm will cause errors way before any Double limitation. This following line is messing up the whole algorithm:

NewCalcAve = CDbl(Format(NewCalcAve, "#.00"))

One should never alter the variable the algorithm is using for display purposes.

Markus_Winter · July 26, 2019, 12:09pm

[quote=446743:@LangueR]The code algorithm is good indeed, but the truncation Dawn is doing after the algorithm will cause errors way before any Double limitation. This following line is messing up the whole algorithm:

NewCalcAve = CDbl(Format(NewCalcAve, "#.00"))

One should never alter the variable the algorithm is using for display purposes.[/quote]

Nicely spotted!