Get the largest Key in Dictionary

Martin_T · April 24, 2017, 12:45pm

I have a dictionary with keys that are ints. I would like to get the largest key. I don’t keep track of keys so they might be consecutive (e.g. 1,2,3,4,5,6) but might skip (1,3,4,5) although I doubt that makes any difference.

Do I just use a binary search or is there a method? As far as I see you can hardly beat binary search for such a simple task - maybe you can halve it.

C# has a method Dictionary.Keys.Max

Kem_Tekinay · April 24, 2017, 1:09pm

You will have to write your own code, but you won’t need a binary search. Binary searches only work on sorted lists, and if it’s sorted anyway, you would just take the last value.

Function MaxKey (Extends d As Dictionary) As Int64
  dim maxValue as Int64 = -999999999 // Something outside the range of what's expected
  for each key as variant in d.Keys
    if key.Int64Value > maxValue then
      maxValue = key.Int64Value
    end if
  next key
  return maxValue
End Function

(Not tested.)

This assumes the keys can be converted to integer, of course.

shao_sean · April 24, 2017, 1:26pm

Why not take the keys and sort them and then grab the last item of the sorted array?

Eli_Ott · April 24, 2017, 1:35pm

Function MaxKey (Extends dct As Dictionary) As Int64 Dim keys() As Variant = dct.Keys Dim ints() As Int64 For i As Integer = 0 To keys.Ubound ints.Append(keys(i)) Next Return ints(ints.Ubound) End Function
(not tested)

Kem_Tekinay · April 24, 2017, 1:49pm

Depending on the size of the array, sorting sometimes takes longer than just scanning through it.

Kem_Tekinay · April 24, 2017, 1:52pm

Eli, your code could be reduced to return dct.Keys( dct.Keys.Ubound ).Int64Value, but there is no guarantee that the keys will be given in sorted order so it would just be returning an arbitrary key. Did you mean to sort the array first?

Jeff_Tullin · April 24, 2017, 1:55pm

So keep track of the keys at the point of insertion…
If you add the items to the dictionary, compare the key you are adding with the ‘highest so far’ , and if its higher, change highestsofar to be equal to the new key right at the point when you add it.

Douglas_Handy · April 24, 2017, 3:36pm

If you keep track of highestsofar, then you also must handle key removals, Then if the key to be removed is the highestsofar, you are back to the original question of finding the highest remaining key. But depending on how often you need to “highest so far”, and how often keys may be removed, this could still be more efficient. Or not.

Tim_Parnell · April 24, 2017, 3:42pm

I have a curiosity.
Considering the two methods:

327492:@Kem Tekinay:

Function MaxKey (Extends d As Dictionary) As Int64
  dim maxValue as Int64 = -999999999 // Something outside the range of what's expected
  for each key as variant in d.Keys
    if key.Int64Value > maxValue then
      maxValue = key.Int64Value
    end if
  next key
  return maxValue
End Function

and

Function MaxKey (Extends d As Dictionary) As Int64
  dim maxValue as Int64
  for each key as variant in d.Keys
    maxValue = max(key.Int64Value, maxValue)
  next key
  return maxValue
End Function

If one was method significantly faster than the other, why?

DaveS · April 24, 2017, 3:45pm

the second method has a slight flaw. it “assumes” the largest value will be >=0 which may not be the case

Douglas_Handy · April 24, 2017, 4:01pm

Have you benchmarked them? I would not expect the first to be “significantly” faster unless the dictionary has lots of keys. And if it has that many keys, maybe an indexed database is a better paradigm. Dictionaries use a hash table to let them locate a specific key quickly without a sequential scan. But that does not help when not looking for a specific key value.

However in terms of why the two methods could differ, excluding compiler optimizations the second performs more operations even though fewer lines of code:

Sets up call stack for max() function with two arguments
The max() function has to decide how many values were passed (it can take more than two) adding some overhead
Assigns return value to maxValue, even if it already had that value

So I’d expect the first to be somewhat faster. But these are the types of things we cared more about in the days when hardware was more expensive than programmer time. Now it is nearly always better to write for code maintainability, and secondarily for ease of coding the first time around. Lower hardware costs, faster hardware speeds, and optimizing compilers all contribute to changing the equation for what is “significantly” faster these days.

DaveS · April 24, 2017, 4:18pm

Bench mark (using array instead of dictionary)… but the test was on the comparison method
method #1 took 33 ticks
method #2 took 130 ticks (#1 was 4x faster)

I would have bet on #2 being the faster of the two

		Dim v(-1) As Integer
		Dim i As Integer
		Dim maxValue As Int64
		Dim time1 As Integer
		Dim time2 As Integer
		Dim start As Integer
		For i=1 To 10000000
				v.append Rnd*100000
		Next i
		//
		maxValue=-99999999999
		start=Ticks
		For i=1 To v.Ubound
				If v(i)>maxValue Then maxValue=v(i)
		Next i
		time1=Ticks-start
		//
		//
		maxValue=-99999999999
		start=Ticks
		For i=1 To v.Ubound
				maxValue=Max(v(i),maxValue)
		Next i
		time2=Ticks-start
		MsgBox "#1="+Str(time1)+EndOfLine+"#2="+Str(time2)

Jeff_Tullin · April 24, 2017, 4:25pm

In the second case, the two numbers are always compared, AND maxvalue is always set to something.
In the first case the two numbers are compared and nothing else may happen?

Douglas_Handy · April 24, 2017, 4:29pm

But being under 100 ticks faster across 10 million iterations is not necessarily significant even if 4x slower. If I thought a dictionary may need 10 million keys, I’d probably use a database instead of a dictionary. I’m actually surprised it only added 1.5 seconds across 10 million iterations.

For a “typical” size dictionary (insert your own definition of “typical”), it drops the speed difference to low enough to not be a deciding factor in my book.

Oh, and just noticed you used an array and not dictionary. This drastically alters what happens under the covers. So these timings don’t really answer the question for how significant the difference would be using dictionaries instead.

Some people assume fewer lines of code mean faster execution. But high level languages like Xojo can make some things very simple to code with few lines of source, but translate into many more operations under the covers. In general that is a very good thing.

Jeff_Tullin · April 24, 2017, 4:31pm

We’re rapidly reaching ‘Angels on a pin’ territory…
Sometimes it’s ‘do anything but actual work’ day…

DaveS · April 24, 2017, 4:34pm

I read the question as to which METHOD was faster (as I stated in my post)
the overhead for using a dictionary would be the same for each method,

Kem_Tekinay · April 24, 2017, 7:04pm

If I’m not mistaken, Max will cast the value to Double and that, no doubt, involves some overhead.

BTW Dave, your loops start at 1 instead of 0.

Eli_Ott · April 24, 2017, 7:08pm

Yes.

Tim_Hare · April 25, 2017, 3:02am

Function calls incur some overhead, so a 4x difference is not surprising.

Eli_Ott · April 25, 2017, 3:38am

Not always, as LLVM might inline a function call (depending on the compiler settings of course).