Determining the most common number in a series

Steve_Kelepouris · September 11, 2018, 2:02pm

Let’s say I have a row of 50 numbers.

I don’ t want the average because there can be random numbers that can throw the average out. I want to determine the most common number in that series of numbers.

For example (10 numbers only):

1001 1000 1001 0802 1000 1000 0975 1000 0989 1000 etc...

Looking at the above series, we can easily see and work out that 1000 is the most common. Also, if we have the same amount of common numbers, then best to choose the highest number.

I think I can work out the algorithm, and the code, but in a long and tedious way.

There could be 1000 numbers (or more) delivered in a continual sequence over a set period of 0.5 seconds - meaning the numbers are all delivered every .5 seconds, and continue indefinitely but the result is updated every 0.5 seconds. Doesn’t have to be super fast, but fast enough to keep up.

Hope this makes sense.

Sam_Rowlands · September 11, 2018, 2:10pm

Use a dictionary with the nunber as the key and a count as the variable.

Something like

For each citem as integer in numbers numberSortDictionary.value( citem ) = numberSortDictionary.lookup( citem, 0 ) + 1 Next

This will get you the counts.

Sascha_S · September 11, 2018, 2:11pm

Math is not mine… but maybe you could write the Numbers into a Dictionary and increment the Value for each Key (Number) each time it appears?

Edit: Damn, @Sam Rowlands was faster

Steve_Kelepouris · September 11, 2018, 2:23pm

Yeah that makes sense to the methodology I would go with. I’ve yet to discover the power of dictionaries - this could be the time to use it.

What about overheads? 0.5 seconds, perhaps 1000+ numbers, shouldn’t be a problem?

Sascha_S · September 11, 2018, 2:30pm

the main reason i would use a Dictionary instead of any other kind of array in Xojo for this, is that i’ve read somewhere long ago that Dictionary are extrem fast. the LR say the same about Dictionarys:

Jeff_Tullin · September 11, 2018, 2:34pm

If these are stored in an in-memory database, you can probably get the most common like this

select  number from  table group by number order by count(1) desc limit 1;

select number from table group by number --one row per number

count(1) --tell you how many there are of this number

desc – highest count at the top

limit 1 - only give me the top row

nb: If there are two with the same count, only one is returned

Steve_Kelepouris · September 11, 2018, 3:01pm

This is silly from my point. I’ve realised I already have the numbers in an array!

Therefore I can access them and apply an algorithm. The right algorithm.

This would be a “dirty” fix and workaround to an issue I have in a part of my code. A minor issue, so this fix would be reasonable and have no effect on the rest of the code.

AlbertoD · September 11, 2018, 3:25pm

Taking Jeff database idea I searched for a way to list all numbers that are equal to MAX. Here is what I found/tested:

WITH y AS ( SELECT number, count(1) AS max_count FROM numbers GROUP BY number ) SELECT number, max_count FROM y WHERE ( SELECT max(max_count) FROM y ) = max_count;

Maybe off-topic but could help someone.

Robert_Weaver · September 11, 2018, 10:03pm

I agree with the others that a dictionary is ideal for this application, but if you’d rather stay with arrays, then the best approach would be to sort the array or make a copy of the array and sort it. Then it’s reasonably easy to iterate through the array and count the values:

Sub MaxCount() dim a() As Integer = Array(999,1000,1001,0802,1001,1000,0975,1000,0989,1001) a.Sort dim value As Integer = a(0) dim count,cmax As Integer = 1 for i as Integer = 1 to UBound(a) if a(i)<>a(i-1) then if count>=cmax then cmax=count value=a(i-1) end if count=0 end if count=count+1 next if count>=cmax then cmax=count value=a(UBound(a)) end if 'cmax is the count of the most common value 'value is the most common value End Sub

Robert_Weaver · September 12, 2018, 6:40am

Now, I regret posting that array method. The first time I read the original post I hadn’t noticed that there could be around 1000 items in the list. I’ve used variations of that array sort routine to deal with small arrays of less than 100 items, but I wouldn’t use it for a large list of values. So, definitely go with the dictionary solution. The method that @Sam Rowlands posted will count the items, but doesn’t return the item with the maximum count at the end. However, his routine can be easily extended to keep track, so that there isn’t any post processing required to locate the maximum count item.

Sub MaxCountNouveau() dim a() As Integer = Array(999,1000,1001,0802,1001,1000,0975,1000,0989,1001) dim d As new Dictionary dim cmax As Integer = 0 dim value As Integer = a(0) dim count As Integer = 0 for i as integer = 0 to UBound(a) count=d.lookup(a(i),0)+1 d.value( a(i) ) = count if count>=cmax Then cmax=count value=a(i) end if next End Sub
Plus, this is considerably shorter than the array method.

Steve_Kelepouris · September 12, 2018, 12:21pm

Thanks for that Robert.

Well, as I admitted to above, this was going to be a “dirty fix” to a problem. I didn’t think it was my code but a random issue with a connected device (electronic scales).

After a good sleep, a bit of thought and carefully looking at my code, I realised it was a code issue of my own making.

I’m reading data coming in from the serial port connected to electronic scales. The example was placing a 1000gm weight on the scales which I get an average of over 0.5 seconds for calibration and general readability.

The problem was that I erroneously thought the “Trim” function stripped a string left/right to the delimiter, EndOfLine.Windows in my case.
eg.

[code] //read all data from serial port buffer
tempBuffer = Serial1.ReadAll(encodings.ASCII)

//split the data
calibrationArray = tempBuffer.Trim.Split(EndOfLine.Windows)

Dim x, total as Integer
For i as integer = 0 to (calibrationArray.Ubound)
x = val(calibrationArray(i))
total = total + x
Next[/code]

I knew I should be getting from the device (+/- 3gm) something like this:

Array(999,1000,1001,1003,1001,1000,998,997,1000,1001)

But in reality I was getting something like this:

Array(99,1000,1001,1003,1001,1000,998,997,1000,1)

Where ReadAll was reading what was in the serial buffer (as it should) but sometimes not the whole word. I was using the total and dividing by the number of elements in the array to determine the average, but the total was not always correct, hence the occasional glitch.

I changed the code to:

For i as integer = 1 to (calibrationArray.Ubound-1)

And therefore eliminate the first and last readings then divide by 2 less than the total number of elements. No more glitches.

It still does beg the question which method is best to get the average. Total / Number of Samples OR the Most Common Number in the set.