Fastest way to find a word?

I put about 200 keywords in an array and have a method which takes a word as parameters and searches for a match in the array, then returns true if found.

I wonder whether this is the fastest way to do this?

Background: I’m coloring those keywords in a text.

Use a dictionary
put your words as myDictionary.value(currentWord)=true

your find function will be myDictionary.lookup(wordToFind,false)

RegEx is how most syntax highlighting systems work. Call upon the great Kem for help.
Using a dictionary would be insanely slow, as the system would have to iterate through every entry for each word you’re checking.

Thousands of times in a fast loop perhaps.
200 words should be fast enough.
But I suspect Oliver is looking for ‘least code’ rather than ‘fewer cycles’
Searching a dictionary is one line of code as Antonio says

Is the problem not better expressed in the other direction?
iterate through the known 200 words, and colour all instances of that word in an unspecified length of text.
The colouring is likely to be the time bandit.

Work through the body of text: could be a loop of 4000 iterations.
Loop through the 200 words: a loop of 200 iterations. Inside that, colour all words that match in the body text at the same time?.

Maybe more background:

The color highlighting system already exists: I’m using Bob’s FTC and there in particular the FTRBScriptField.colorCode function.

They are doing the coloring per line in a while loop and go through a series of if and elseif, like this

[code] ’ Color the line.
while not (match is nil)

' Get the parsed expression.
text = match.SubExpressionString(0)

' Is this a keyword?
if keywordLookup.HasKey(text) then
  
  ' Post any plain text.
  emptyStack(stack, p)
  
  ' Change to the keyword color.
  p.addObjectFast(createStyleRun(text, Keywords))
  
  ' Is this a quoted string?
elseif Left(text, 1) = """" then
  
  ' Post any plain text.
  emptyStack(stack, p)
  
  ' Use the strings color.
  p.addObjectFast(createStyleRun(text, Strings))[/code]

I want to add coloring of my own XojoScript commands in a different color. Therefor I have a list of these commands in a module and here I just add a function call which returns true if the word was found:

[code] /// ================================
’ Is this a user defined keyword?
elseif imScripting.isUDF_KEYWORD(text) then

  ' Post any plain text.
  emptyStack(stack, p)
  
  ' Use color for User defined keywords
  p.addObjectFast(createStyleRun(text, &cFF66FF00))
  /// ================================[/code]

So I just was curious, whether to use an array for my words or something else, which would be the fastest way to return a true or a false.

So from the discussion so far, I am not sure whether using an array or a dictionary would be faster.

When you call Lookup(), HasKey() or Value() a dictionary doesn’t iterate over it’s items, it hashes directly to it’s ‘bin’ and returns that.

Once while building a trie for an autocomplete engine I measured that array.IndexOf is faster than a dictionary when there’s < 30 entries of a single character, so I used an array in that case, but by 200 long words I’d greatly assume Lookup is much faster than IndexOf. Shouldn’t be hard to test.

Thanks everybody for your explanations! For my case, I am using Antonio’s suggestion now.

If all you want is to know if a word is there, apart from regex, you could join the array and use Instr().

That was my very first way to do this. I thought to have read that string operations are slow and I was wondering if there could be quicker lookup mechanism. I never measured it but I thought there must be something more efficient than Instr…

There was a thread not long ago about a similar question, and the use of join+instr lead to fairly fast results. Regex may be even faster, but the initial Join is key to a fast result. Looping is much slower.

Just to underscore this, arrays.IndexOf iterates over every value to find a match, so finding something that appears at the end of the array is slower than finding something near the top.

A Dictionary calculates a hash on every entry so search time is consistent across all entries. For a small number of entries, it is faster to loop through every entry in an array than to let the Dictionary calculate the hash. For a larger number of entries, the Dictionary will be faster. In any event, a Dictionary does not loop through every entry.

[quote=159577:@Kem Tekinay]…
A Dictionary calculates a hash on every entry so search time is consistent across all entries. …[/quote]
Is the Hash calculated once, when the elements of the dictionary are added, or with each lookup?

Both when the value is added so it can store it in the right “spot”, and for each lookup so it can find what was stored.

You have to calculate the hash for the word you want to look up :wink:

The dict is only calculated once.

Stand grad’ etwas aufm’ Schlauch… Danke!

Google Translate gives me “Stand grad ‘something encouraging’ hose … Thank you!” Hmmm…

I’m the bottleneck, can’t see the forest for the trees - something like that … Markus will get it.
:slight_smile:

[quote=159511:@Oliver Osswald]So I just was curious, whether to use an array for my words or something else, which would be the fastest way to return a true or a false.

So from the discussion so far, I am not sure whether using an array or a dictionary would be faster.[/quote]

I do something almost exactly like this (color syntaxing my own Xojoscript additions) and I use a dictionary as it’s much faster for this than an array. I add all my words to the dictionary during initialization of the app (since those words won’t change) so any dictionary overhead isn’t an issue.

Depends on if you want exact matches only or incremental suggestions too.
For exact matches the dictionary is pretty darned good.
But for partial matches it’s not great.
For exact incremental matches a Trie is a really great data structure to consider as once you have enough letters to be unique you can definitely suggest the remainder.
However its not very good at fuzzy matches.
So if you want things that “sound like” or are close to by maybe have some letters incorrect then its not great.