Getting the average of numbers within a string

Hey guys, this is what I’m trying to work out:

The data is being delivered at 100 samples per second from an Arduino Uno. I’m using a timer, and in its Action event I use tempBuffer = Serial.ReadAll(). This part works perfectly fine. Now what I need to do is take a sample of the data from the string (tempBuffer) and get the average in numeric format, of course.

The data looks numeric, but of course it’s not; it’s one long string. Each value is terminated by CR+LF, which is what the Arduino outputs.

e.g.

25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
25 <CR+LF>
... and so on, all the way up to 100 samples (lines).

The variation of numbers is due to a sensitive load cell being attached to the Arduino.

What I need to do is take a sample of the data and show a consistent number on the screen, maybe every second; this number will be used for calibration. Time is not crucial in this instance. It wouldn’t matter if the calculations took 1/10th of a second, but I’m sure they won’t.

For the basic algorithm, what I’m thinking is to take 50 of the 100 samples, perhaps lines 25 to 75, using <CR+LF> as the delimiter. Add those 50 together, then divide by 50 and drop any decimal places; this gives the average.
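A minimal Python sketch of the algorithm above (not Xojo code), assuming the buffer holds CR+LF-terminated integer readings and contains at least 75 complete lines. The function name `middle_average` is hypothetical, and the 25 to 75 window is taken as a half-open slice so it yields exactly 50 samples.

```python
def middle_average(buffer: str, start: int = 25, stop: int = 75) -> int:
    """Average the readings on lines start..stop-1, dropping any decimals."""
    lines = buffer.split("\r\n")       # CR+LF is the Arduino's line terminator
    window = lines[start:stop]         # 50 samples from the middle of the buffer
    values = [int(line) for line in window]  # int() tolerates surrounding spaces
    # Integer division drops the fractional part (for non-negative readings).
    return sum(values) // len(values)
```

Usage: with 100 lines alternating between 25 and 26, the window sums to 1275, and 1275 // 50 gives 25, because the .5 is discarded rather than rounded.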

In actual fact, it wouldn’t matter if the numbers were just viewed as fruit, i.e. apples, lemons, grapes, oranges, provided that I knew which one appeared most often. However, I still need a similar method that works with actual numbers, because numeric calculations will be required later.

I believe I have the algorithm correct; I just don’t know how to convert it to the Xojo language. Hope this makes sense. I have done lots of reading but I’m still getting nowhere.

Cheers, Steve.

Using the Split function, transform the big string into a string array containing only the numbers (still as strings).
Then make a loop over each item in the string array:
convert each item to a number using the Val function,
and add it to a running sum during the loop.
After the loop, divide the sum by the number of items in the array.
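The steps above can be sketched in Python as follows; `average_of_lines` is a hypothetical name, and `float()` stands in for Xojo's Val. One deliberate difference: Xojo's Val returns 0 for an empty string, so an empty item (for example after the final CR+LF) would pull a Val-based average down. This sketch skips empty items instead.

```python
def average_of_lines(buffer: str) -> float:
    """Split on CR+LF, convert each non-empty item to a number, and average."""
    items = buffer.split("\r\n")
    total = 0.0
    count = 0
    for item in items:
        item = item.strip()
        if not item:            # skip empty items, e.g. after the last CR+LF
            continue
        total += float(item)    # plays the role of Val in this sketch
        count += 1
    if count == 0:
        raise ValueError("no samples in buffer")
    return total / count        # sum divided by the number of items
```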

Thanks Jean. I think I accidentally posted before I had finished, or something weird happened.

I think I understand what you have posted - I’ll investigate further.

[EDIT] I’ve edited this post, and Jean’s reply is still valid - I’ll leave this post as is and delete the other.

Steve, a small tip : to really delete a post and not simply the question, click on “Who can view this conversation” under the title, and enter your own name. The thread will become invisible to other members.

Thanks Michel, I think that has worked.

What time is it in France? It’s about 8.20pm here in Australia.

[quote=285128:@Steve K]Thanks Michel, I think that has worked.

What time is it in France? it’s about 8.20pm here in Australia.[/quote]

Yes, it has worked. Here, it is 12:26 PM :slight_smile:

This part confused me. Are you really looking for an average or the value that occurs most frequently?
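To see why the distinction matters, here is a small Python illustration using made-up readings (hypothetical, not data from this thread): the most frequent value and the mean of the same samples can differ.

```python
from statistics import mean, mode

# Hypothetical load-cell readings, invented purely for illustration.
readings = [24, 24, 24, 27, 28]

print("most frequent:", mode(readings))  # 24
print("mean:", mean(readings))           # 25.4
```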

Michel, don’t you mean 12:46AM on Saturday morning, not PM?

If that’s not the case, then no wonder I’m having issues with my little algorithm :slight_smile:

Well, the value that occurs most frequently will be the average, or the mean.

As I pointed out, it wouldn’t matter if four fruit types were used. The fluctuation in the data is never more than 4, and I have never seen the same number (fruit type) more than 3 to 4 times in a row. If it were wider than that, there would be bigger issues that would need solving, i.e. electronics shielding etc.

So if you consider that, it’s like you have 4 people positioned around a large funnel. Let’s say they each have 25 pieces of one kind of fruit: one with apples and the others with lemons, grapes, and oranges (100 pieces in total). At a beep every second, they throw their designated piece of fruit into the funnel, and it falls into a long, thin tube. What I want to do is cut the tube at some point in the middle and count the various fruits. There should be roughly the same amount of each.

I’m probably making this harder than need be :slight_smile:

[quote=285156:@Steve K]Michel, Don’t you mean 12:46AM on Saturday morning, not PM?

If that’s not the case, then no wonder I’m having issues with my little algorithm :)[/quote]

Nope. I did mean 26 minutes past noon, early Saturday afternoon. Right now it is 2:28 PM (14:28), and from what I can see it is 22:28 in Camberra.

http://www.timeanddate.com/worldclock/

Thanks for the clarification. I think I’d still count the values with a function like this:

Public Function MostFrequentValue(values() As String) as String
  dim uniqueValues() as string
  dim frequencies() as integer
  
  for each value as string in values
    value = value.Trim // Just in case
    dim pos as integer = uniqueValues.IndexOf( value )
    if pos = -1 then
      uniqueValues.Append value
      frequencies.Append 0
      pos = uniqueValues.Ubound
    end if
    frequencies( pos ) = frequencies( pos ) + 1
  next
  
  if uniqueValues.Ubound = -1 then
    //
    // Raise an exception or something
    //
  end if
  
  frequencies.SortWith uniqueValues
  return uniqueValues( uniqueValues.Ubound )
End Function

[quote=285161:@Michel Bujardet]Nope. I did mean 26 minutes past noon, early Saturday afternoon. As right now is 2:28 PM, 14:28. And from what I can see 22:28 in Camberra.

http://www.timeanddate.com/worldclock/[/quote]

Ah yes, now I see. You are in the future from my point of view; I’m in the past by 8 hrs. :slight_smile:

[quote=285166:@Kem Tekinay]Thanks for the clarification. I think I’d still count the values with a function like this:

[code]
Public Function MostFrequentValue(values() As String) as String
dim uniqueValues() as string
dim frequencies() as integer

for each value as string in values
value = value.Trim // Just in case
dim pos as integer = uniqueValues.IndexOf( value )
if pos = -1 then
uniqueValues.Append value
frequencies.Append 0
pos = uniqueValues.Ubound
end if
frequencies( pos ) = frequencies( pos ) + 1
next

if uniqueValues.Ubound = -1 then
//
// Raise an exception or something
//
end if

frequencies.SortWith uniqueValues
return uniqueValues( uniqueValues.Ubound )
End Function
[/code][/quote]

Thanks Kem for that. I’m not sure if the frequency part matters. It’s hard to evaluate what you have posted because I’m still new to Xojo, but nevertheless I should be able to see what’s going on with your solution.

Mostly I break things down into simple parts, then put them together. There are cases where I combine things, but that makes it harder to see what’s going on if I come back to review the code at a later date.

Let me optimize the code a bit and explain:

Public Function MostFrequentValue(values() As String) as String
  dim uniqueValues() as string
  dim frequencies() as integer
  
  for each value as string in values
    value = value.Trim
    dim pos as integer = uniqueValues.IndexOf( value )
    if pos = -1 then
      uniqueValues.Append value
      frequencies.Append 1
    else
      frequencies( pos ) = frequencies( pos ) + 1
    end if
  next
  
  if uniqueValues.Ubound = -1 then
    //
    // Raise an exception or something
    //
  end if
  
  frequencies.SortWith uniqueValues
  return uniqueValues( uniqueValues.Ubound )
End Function

You feed this function a string array. It loops through each element of the array, trims it (removes leading and trailing whitespace), then looks for that value in the uniqueValues array. If it finds it, it increments the matching count in the frequencies array. If it doesn’t, it appends that value to uniqueValues and appends 1 to frequencies.

At the end of the loop, the uniqueValues array will contain each unique value of the original array, and the frequencies array will tell you how many times those values occurred. Given an array like [“26”, “25”, “25”, “24”], uniqueValues will contain [“26”, “25”, “24”] and frequencies will have [1, 2, 1].

The Array.SortWith function will sort an array and the other given array along with it, so we sort the frequencies and rearrange uniqueValues with it to keep the two in sync. After the sort, uniqueValues will be [“26”, “24”, “25”] and frequencies [1, 1, 2]. At that point, the last item of uniqueValues will be the value that occurred most frequently since it was sorted to the bottom, so we return that.
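For comparison, here is a Python sketch that mirrors the same two-list approach step by step; `most_frequent_value` is a hypothetical name, and sorting zipped pairs stands in for SortWith. Python's sort is stable, so values with equal counts keep their first-seen order; which tied value wins in Xojo's SortWith is not something this sketch claims to reproduce.

```python
def most_frequent_value(values):
    """Return the value that occurs most often, using two parallel lists."""
    unique_values = []
    frequencies = []
    for value in values:
        value = value.strip()                 # like Trim
        if value in unique_values:
            pos = unique_values.index(value)  # like IndexOf
            frequencies[pos] += 1
        else:
            unique_values.append(value)
            frequencies.append(1)
    if not unique_values:
        raise ValueError("values is empty")
    # Sort both lists together by frequency (the SortWith step);
    # the most frequent value ends up last.
    pairs = sorted(zip(frequencies, unique_values), key=lambda p: p[0])
    return pairs[-1][1]
```

Usage: `most_frequent_value(["26", "25", "25", "24"])` returns `"25"`, matching the walkthrough above.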

HTH.

Ah, the French and their love for Camembert…

I like Brie and Camembert, combined with some nice fruit and wine it’s a pleasure that almost cannot be surpassed. :slight_smile:

Apart from that, I’ll try to come up with some other solution for my program :).

Typically, when dealing with noisy data (such as load cells) we use a moving average. I wouldn’t depend on the most frequent value being the average value. Often it is not. A moving average is easy to do using an array as a circular buffer. Increment the array index after every sample is added to the array, and when it goes above the upper bound of the array, zero it.
Hence:

//Insert new value into buffer
buffer(i) = newInputValue
//Advance the index, wrapping to 0 after the last element
i = (i + 1) mod (ubound(buffer) + 1)
//Now calculate moving average
sum = 0
for j = 0 to ubound(buffer)
  sum = sum + buffer(j)
next
mAverage = sum / (ubound(buffer) + 1)
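Here is the same circular-buffer idea as a Python sketch; the class name `MovingAverage` is hypothetical. Two design choices differ from the Xojo loop above and are assumptions of this sketch: it keeps a running sum instead of re-adding the whole buffer on every sample, and it divides by the number of slots filled so far, so the first few averages are not pulled toward zero by empty slots. Over very long runs a floating-point running sum can drift slightly, which recomputing the sum, as the Xojo loop does, avoids.

```python
class MovingAverage:
    """Fixed-size moving average backed by a circular buffer."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.buffer = [0.0] * size
        self.index = 0      # next slot to overwrite
        self.filled = 0     # how many slots hold real samples so far
        self.total = 0.0    # running sum of the buffer contents

    def add(self, value: float) -> float:
        # Remove the sample being overwritten, then store the new one.
        self.total -= self.buffer[self.index]
        self.buffer[self.index] = value
        self.total += value
        # Advance and wrap: the same as (i + 1) mod (ubound + 1).
        self.index = (self.index + 1) % len(self.buffer)
        self.filled = min(self.filled + 1, len(self.buffer))
        return self.total / self.filled
```

Usage: with a size of 4, adding 1, 2, 3, 4, 5 returns 1.0, 1.5, 2.0, 2.5, 3.5; the fifth sample overwrites the first, so the last average is (2 + 3 + 4 + 5) / 4.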

Thanks guys for all your input. I think I described the issue incorrectly. The introduction of the fruit scenario didn’t help matters either :slight_smile:

I believe I’ve come up with a solution similar to what Jean-Yves Pochez described in post #2. It would be good to see your comments for improvements and there may be something I have overlooked.

Overall, this is the purpose of the software:

The software will record data from the USB/serial port, make some basic calculations, and create a simple x/y (time/force) graph.
The data is delivered via a load cell and amplifier using an Arduino Uno development board. Apart from occasional hardware calibration of the amplifier, there will need to be software calibration and a tare function, hence the reason for this post.

The data is being delivered at 100 samples per second. In the calibration phase it will read approx. 50 samples at a time, so the numbers can be updated on screen every 0.5 seconds. Sometimes the data won’t contain exactly 50 complete samples because of where the EndOfLine falls, so taking 40 samples from the middle is more accurate.

This is what I’ve come up with:

It all happens in the Timer’s Action event (period set to 0.5 seconds):

[code]//Read from the Serial Port
tempBuffer = Serial1.ReadAll(encodings.ASCII)

// split the data
calibrationArray = Split(tempBuffer, EndOfLine)

//sum elements 5 to 44 (40 samples)
dim x, count, calAverage as Integer
for i as integer = 5 to 44
x = val(calibrationArray(i))
count = count + x
next

calAverage = count / 40
DataLabel.text = str(calAverage)[/code]

The number is updated on screen every 0.5 seconds, which is easier to read than fluctuations at 100 samples per second; the average is also useful as it smooths out the data.

This seems to work fine. I’m only doing this as a hobby project, so it doesn’t have to be perfect, but it does need to be accurate. I’ve done a bit of programming before: QuickBASIC, Visual Basic, FileMaker and some PIC stuff. I have to admit that I’m struggling a bit with Xojo. The IDE and ways of doing things are hard to get my head around, but overall it’s much easier to develop something than trying to make sense of Visual Studio C# etc.

No doubt I’ll be asking many questions on this forum as this project progresses. It’s also important to point out that I successfully wrote this software in QuickBASIC many years ago and now want to update it for Windows.

Cheers.

A few things.

You should be specific about the EndOfLine. If you ever decide to port this app to Linux, for example, it would stop working since the default EOL character there is Chr(10), not Chr(13) + Chr(10) as on Windows. Use EndOfLine.Windows instead.
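A quick Python analogy of the point above (Python strings, not Xojo's EndOfLine): splitting CR+LF data on the explicit CR+LF pair gives clean items on any host, while splitting on LF alone, the Linux-style line ending, leaves a stray CR on every item.

```python
data = "25\r\n26\r\n"   # two samples as the Arduino sends them

# Splitting on the explicit CR+LF pair gives clean items on any host OS.
print(data.split("\r\n"))   # ['25', '26', '']

# Splitting on LF alone leaves a carriage return behind on each item.
print(data.split("\n"))     # ['25\r', '26\r', '']
```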

You are making assumptions about the array, i.e., it will have elements 5 through 44. I’d check first using calibrationArray.Ubound. (See below for more on this.)
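A Python sketch of that check, assuming the window is elements 5 through 44 as in the Timer code; `window_average` is a hypothetical name, and returning `None` when the array is too short is an illustrative choice, not the only option.

```python
def window_average(samples, first=5, last=44):
    """Average samples[first..last] inclusive, or return None if too short."""
    if len(samples) - 1 < last:        # i.e. Ubound is less than the last index
        return None                    # not enough data this time round
    window = samples[first:last + 1]   # 40 items for 5..44
    return sum(float(s) for s in window) / len(window)
```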

Your ultimate result will always be rounded down. For example, if the data were (4, 4, 6), at the end you’d do integer math on 14 / 3 to get 4. But the real-world result is 4.6667, which would round to 5. Is rounding down what you want?
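The (4, 4, 6) example worked through in Python. Note that Python's built-in `round()` rounds exact halves to the nearest even number, so this sketch uses `floor(x + 0.5)` to round halves up for non-negative values; it makes no claim about how Xojo's Round treats halves.

```python
import math

values = [4, 4, 6]
exact = sum(values) / len(values)   # 14 / 3, about 4.6667

truncated = int(exact)              # 4: the fraction is simply dropped
rounded = math.floor(exact + 0.5)   # 5: rounded to the nearest integer
print(exact, truncated, rounded)
```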

I’m not sure what you meant by this:

Do you mean that there might be a trailing or leading empty line? If so, I’d just Trim the data and use it all.

calibrationArray = tempBuffer.Trim.Split( EndOfLine.Windows )
...
for i as integer = 0 to calibrationArray.Ubound
...

Thanks Kem, your suggestions are very useful.

There is a possibility that in the future the app may be ported to Mac OS or Linux, so it’s best to allow for that. However, I’m a bit confused by EndOfLine.Windows; doesn’t it suggest that the code is specific to Windows?

I was wondering about rounding issues. I don’t want to round down so this is what I’m now using (also with calAverage being defined as double instead of int):

calAverage = round(count/40)

This is a much better solution because the average is more accurate.

You’re correct about “there might be a trailing or leading empty line”, because it always depends on what’s in the serial buffer at the time the sample is taken. I think my solution of averaging elements 5 through 44 works fine, since this part of the software is not critical, although using more samples would be more accurate.

So if I went with the Ubound suggestion, would this part of the code be correct:

[code]calibrationArray = tempBuffer.Trim.Split(EndOfLine.Windows)

dim x, count as Integer
dim calAverage as Double

for i as integer = 0 to calibrationArray.Ubound
x = val(calibrationArray(i))
count = count + x
next

calAverage = round(count / (calibrationArray.Ubound+1))[/code]

I’m not 100% sure about that last line of code. Should (calibrationArray.Ubound+1) give me the correct number of elements to divide by? I guess I should just test it.
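A Python sketch of the revised code, assuming CR+LF-separated integer samples; `calibration_average` is a hypothetical name. It shows that Ubound (the last index) plus 1 is the element count, and it raises an error on an empty buffer rather than averaging nothing, which is an added assumption of the sketch.

```python
import math

def calibration_average(temp_buffer: str) -> int:
    """Rounded average of all CR+LF-separated samples in temp_buffer."""
    trimmed = temp_buffer.strip()
    if not trimmed:
        raise ValueError("no samples in buffer")  # nothing to average
    calibration = trimmed.split("\r\n")
    ubound = len(calibration) - 1       # Xojo-style upper bound (last index)
    count = 0
    for item in calibration:
        count += int(float(item))       # like assigning Val(item) to an Integer
    n = ubound + 1                      # Ubound + 1 is the number of elements
    return math.floor(count / n + 0.5)  # round halves up (non-negative data)
```

Usage: `calibration_average("\r\n4\r\n4\r\n6\r\n")` trims the empty lines, divides 14 by 3 elements, and returns 5.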

The EndOfLine constant is just a stand-in for the characters that make up the default EOL on the current platform, but it seems your data consistently uses CR/LF independent of the software host, so EndOfLine.Windows will split your data consistently across all platforms.

If you want to avoid all doubt, do this:

tempBuffer = ReplaceLineEndings( tempBuffer.Trim, EndOfLine )
calibrationArray = tempBuffer.Split( EndOfLine )

This will ensure that you know exactly which EOL character is used by replacing them with the current platform EOL. The Split will then look for that EOL to do its work. The specific EOL character(s) used becomes irrelevant.
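The same normalize-then-split idea as a Python sketch; `split_samples` is a hypothetical name, and the regular expression plays the role ReplaceLineEndings plays above by mapping CR+LF, a lone CR, or a lone LF to one separator before splitting. Python's `str.splitlines()` would handle these endings as well.

```python
import re

def split_samples(temp_buffer: str) -> list:
    """Normalize CR+LF, lone CR, and lone LF to one separator, then split."""
    # CR+LF is listed first so it is replaced as a single line ending.
    normalized = re.sub(r"\r\n|\r|\n", "\n", temp_buffer.strip())
    return normalized.split("\n")
```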

As for the rest of the code and your use of a double and Ubound, it looks good. Yes, adding 1 to Ubound is the right move since that will give you the true 1-based count of elements, and that’s what you want for an average.