I have a dictionary with keys that are ints. I would like to get the largest key. I don’t keep track of keys so they might be consecutive (e.g. 1,2,3,4,5,6) but might skip (1,3,4,5) although I doubt that makes any difference.
Do I just use a binary search or is there a method? As far as I see you can hardly beat binary search for such a simple task - maybe you can halve it.
You will have to write your own code, but you won’t need a binary search. Binary searches only work on sorted lists, and if it’s sorted anyway, you would just take the last value.
Function MaxKey (Extends d As Dictionary) As Int64
dim maxValue as Int64 = -999999999 // Something outside the range of what's expected
for each key as variant in d.Keys
if key.Int64Value > maxValue then
maxValue = key.Int64Value
end if
next key
return maxValue
End Function
(Not tested.)
This assumes the keys can be converted to integer, of course.
Function MaxKey (Extends dct As Dictionary) As Int64
Dim keys() As Variant = dct.Keys
Dim ints() As Int64
For i As Integer = 0 To keys.Ubound
ints.Append(keys(i))
Next
Return ints(ints.Ubound)
End Function
(not tested)
Eli, your code could be reduced to return dct.Keys( dct.Keys.Ubound ).Int64Value, but there is no guarantee that the keys will be given in sorted order so it would just be returning an arbitrary key. Did you mean to sort the array first?
So keep track of the keys at the point of insertion…
If you add the items to the dictionary, compare the key you are adding with the ‘highest so far’ , and if its higher, change highestsofar to be equal to the new key right at the point when you add it.
If you keep track of highestsofar, then you also must handle key removals, Then if the key to be removed is the highestsofar, you are back to the original question of finding the highest remaining key. But depending on how often you need to “highest so far”, and how often keys may be removed, this could still be more efficient. Or not.
Function MaxKey (Extends d As Dictionary) As Int64
dim maxValue as Int64
for each key as variant in d.Keys
maxValue = max(key.Int64Value, maxValue)
next key
return maxValue
End Function
If one was method significantly faster than the other, why?
Have you benchmarked them? I would not expect the first to be “significantly” faster unless the dictionary has lots of keys. And if it has that many keys, maybe an indexed database is a better paradigm. Dictionaries use a hash table to let them locate a specific key quickly without a sequential scan. But that does not help when not looking for a specific key value.
However in terms of why the two methods could differ, excluding compiler optimizations the second performs more operations even though fewer lines of code:
Sets up call stack for max() function with two arguments
The max() function has to decide how many values were passed (it can take more than two) adding some overhead
Assigns return value to maxValue, even if it already had that value
So I’d expect the first to be somewhat faster. But these are the types of things we cared more about in the days when hardware was more expensive than programmer time. Now it is nearly always better to write for code maintainability, and secondarily for ease of coding the first time around. Lower hardware costs, faster hardware speeds, and optimizing compilers all contribute to changing the equation for what is “significantly” faster these days.
Bench mark (using array instead of dictionary)… but the test was on the comparison method
method #1 took 33 ticks
method #2 took 130 ticks (#1 was 4x faster)
I would have bet on #2 being the faster of the two
Dim v(-1) As Integer
Dim i As Integer
Dim maxValue As Int64
Dim time1 As Integer
Dim time2 As Integer
Dim start As Integer
For i=1 To 10000000
v.append Rnd*100000
Next i
//
maxValue=-99999999999
start=Ticks
For i=1 To v.Ubound
If v(i)>maxValue Then maxValue=v(i)
Next i
time1=Ticks-start
//
//
maxValue=-99999999999
start=Ticks
For i=1 To v.Ubound
maxValue=Max(v(i),maxValue)
Next i
time2=Ticks-start
MsgBox "#1="+Str(time1)+EndOfLine+"#2="+Str(time2)
In the second case, the two numbers are always compared, AND maxvalue is always set to something.
In the first case the two numbers are compared and nothing else may happen?
But being under 100 ticks faster across 10 million iterations is not necessarily significant even if 4x slower. If I thought a dictionary may need 10 million keys, I’d probably use a database instead of a dictionary. I’m actually surprised it only added 1.5 seconds across 10 million iterations.
For a “typical” size dictionary (insert your own definition of “typical”), it drops the speed difference to low enough to not be a deciding factor in my book.
Oh, and just noticed you used an array and not dictionary. This drastically alters what happens under the covers. So these timings don’t really answer the question for how significant the difference would be using dictionaries instead.
Some people assume fewer lines of code mean faster execution. But high level languages like Xojo can make some things very simple to code with few lines of source, but translate into many more operations under the covers. In general that is a very good thing.