Need help with a regex

Patrick_Besong · May 28, 2015, 2:49am

I want to make a TextArea that expands abbreviations the user has entered into a list, much like Word does. Whenever they type PA followed by a space or some kind of punctuation, it should automatically expand the abbreviation. The user enters pairs into a ListBox: the abbreviation, then what it should expand to. I have it working when someone types a space after the abbreviation, but what if they type a period or other punctuation? I thought using \W would work, but I'm getting something wrong. Or is there a better way entirely?

This is in the TextArea’s TextChange event…

[code]Dim rg as New RegEx
Dim rg2 as New RegEx
Dim myMatch as RegExMatch
Dim i as integer

for i = 0 to listbox1.listCount-1

rg.SearchPattern= listBox1.cell(i,1)+"[\\W]"  // <------this is the part i need help with

myMatch=rg.search(TextArea1.text)

if myMatch <> Nil then
  TextArea1.text = textArea1.text.ReplaceAll(rg.searchPattern, listBox1.cell(i,2))
  TextArea1.SelStart=len(TextArea1.Text)
End if
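For comparison, here is a minimal sketch of the same TextChange idea in Python (not Xojo, so treat it as logic to port). Two things are worth checking in the code above. First, as far as I know Xojo string literals don't treat the backslash as an escape, so "[\\W]" probably reaches the regex engine as a class matching only a backslash or a W. Second, ReplaceAll is a plain-text replace, so passing it rg.SearchPattern searches for the pattern's literal characters. The sketch escapes each abbreviation, anchors it at a word boundary, and requires a following non-word character via a lookahead, so the space or punctuation stays in the text. The ABBREVIATIONS dictionary is a hypothetical stand-in for the ListBox columns.

```python
import re

# Hypothetical stand-in for the ListBox: abbreviation -> expansion.
ABBREVIATIONS = {
    "PA": "Pennsylvania",
    "NY": "New York",
}


def expand_abbreviations(text: str, abbreviations: dict[str, str]) -> str:
    """Expand each abbreviation that is a whole word followed by a non-word character."""
    for abbr, expansion in abbreviations.items():
        # \b        : the abbreviation must start at a word boundary
        # re.escape : user-entered text is matched literally
        # (?=\W)    : a space/punctuation must follow, but it is NOT consumed
        pattern = r"\b" + re.escape(abbr) + r"(?=\W)"
        # A function as the replacement keeps any backslashes in the
        # expansion from being read as group references.
        text = re.sub(pattern, lambda _m: expansion, text)
    return text


print(expand_abbreviations("I live in PA. NY is north", ABBREVIATIONS))
# prints: I live in Pennsylvania. New York is north
```

Because the lookahead needs a character after the abbreviation, PA at the very end of the text (the user is still typing) is not expanded yet, and PAST is never touched.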

jim_mckay · May 28, 2015, 4:30am

My initial thought… and not exactly a fully formed one…

Use a dictionary and a buffer for the keystrokes.

Add a property to hold the current key buffer; we'll call it keyBuffer As String.
Add a dictionary and populate it with the abbreviations, e.g. abbreviationDict.value("AB")="abbreviation"

Then, in the KeyDown event, if an abbreviation was typed, just replace it with the new string:

[code]keybuffer=keybuffer+key

if key.uppercase<>key.lowercase then //non-alphabetic characters should be unaffected by case
  
  if abbreviationdict.hasKey(keyBuffer) then //an abbreviation exists
    me.selstart=me.selstart-len(keybuffer)
    me.sellength=len(keybuffer)
    me.seltext=abbreviationDict.value(keybuffer)
    me.selstart=me.selstart+len(abbreviationDict.value(keybuffer))
    me.sellength=0
  end if
  keybuffer=""
  
end if[/code]

This way, the text is replaced as the user types. Pasted text that happens to contain an abbreviation is left alone, and the user can still keep an abbreviation unexpanded with something like typing ABB and then Backspace, to have AB in the text if they actually want it there.

Kem_Tekinay · May 28, 2015, 7:31am

The token you are looking for is \b, so the pattern would be \bTEXT\b. But if the user enters something containing a regular expression token (such as . or *), your scheme won't work. Turn each of the user's characters into \x tokens first to be on the safe side.
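A small Python sketch of the \bTEXT\b suggestion and of why escaping matters. Here Python's re.escape stands in for the per-character escaping, and whole_word_pattern is a made-up helper name:

```python
import re


def whole_word_pattern(abbr: str) -> str:
    # Wrap the escaped abbreviation in \b word boundaries (\bTEXT\b).
    return r"\b" + re.escape(abbr) + r"\b"


# Without escaping, "." is a wildcard, so "a.m" also matches "aim".
unsafe = r"\b" + "a.m" + r"\b"
safe = whole_word_pattern("a.m")

print(bool(re.search(unsafe, "aim")))                      # True: "." matched "i"
print(bool(re.search(safe, "aim")))                        # False
print(bool(re.search(safe, "at 9 a.m today")))             # True
print(bool(re.search(whole_word_pattern("PA"), "PAST")))   # False: inside a larger word
```

Two caveats, assuming the Xojo regex engine treats \b the way Python does: the end of the text also counts as a boundary, so in a TextChange handler \bPA\b fires as soon as PA is typed, before the user can continue on to PAST; and an abbreviation that ends in punctuation (such as "e.g.") never satisfies a trailing \b before a space.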

Patrick_Besong · May 28, 2015, 2:18pm

Jim, I don't think I can replace the abbreviations as they're typed, since they may be part of a larger word, so I was relying on a space or punctuation to trigger the replacement.

Kem, I'm not sure I follow you about turning characters into \x tokens.

Norman_P · May 28, 2015, 3:44pm

an uncompressed trie works really well for this

ok so you ask "What's a trie?"

basically a map that has one letter per level and at the end the full word

so you can easily match partial words AND you only need to look at as much as is required to make the thing unique

for example, let's look at a trie for two words: absolute and above

the first level (which can be a dictionary) holds

a->another dictionary with words that start with A (see level 2 below)
b->nil (we have no words that started with B)
c->nil (we have no words that started with C)
etc for the rest of the alphabet

level 2 would be “a dictionary with words that started with whatever letter is in the level 1 dictionary”
so in the case of “absolute and above” we would have

a->nil (no words in our case that are AA)
b->another dictionary that is words that started with AB (call this level 3)
c->nil (no words in our case that are AC)

level 3 would be “a dictionary with words that started with whatever 2 letters are in level 1 + level 2”
in our case for the example this is words that started with AB

a->nil (no words in our case that are ABA)
b->nil (no words in our case that are ABB)
etc for the other letters
o->“above”
s->“absolute”
etc for the other letters

done

So you can type part of a word and quickly get either the list of all possible matches below that point or the only possible match.
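The trie described above maps naturally onto nested dictionaries. Here is a minimal Python sketch. Instead of storing nil for letters with no words, it simply leaves those keys out, and it uses a hypothetical END key ("$") to store the complete word, which assumes no stored word contains "$".

```python
# A minimal uncompressed trie: one nested dict per letter.
# END is a hypothetical marker key meaning "a complete word ends here".
END = "$"


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # descend, creating levels as needed
        node[END] = word                    # store the full word at the end
    return root


def matches(trie, prefix):
    """Return every stored word that starts with prefix, sorted."""
    node = trie
    for ch in prefix:
        if ch not in node:      # missing letter plays the role of nil
            return []
        node = node[ch]
    found = []
    stack = [node]              # walk everything below the prefix
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == END:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_trie(["absolute", "above"])
print(matches(trie, "ab"))    # ['above', 'absolute']
print(matches(trie, "abo"))   # ['above']
print(matches(trie, "ac"))    # []
```

Reaching the node for a prefix costs one dictionary lookup per typed letter no matter how many words are stored; collecting the matches below it is extra work proportional to the size of that part of the trie.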

jim_mckay · May 28, 2015, 3:46pm

[quote=190420:@Patrick Besong]Jim, I don’t think I can replace the abbreviations as they’re typed as they may be part of a larger word, so I was relying on a space or punctuation to trigger the replacement.

[/quote]
Right, I was suggesting using a non-alphabetic key to trigger the replacement. If a non-alphabetic key is typed and the previous text is not an abbreviation, just reset the buffer. If an abbreviation is typed as part of a larger word, it is ignored and typing is unaffected.
You could also do more specific filtering to allow numbers as part of the abbreviation trigger…

Looks like my late-night coding had some errors… <> is not case sensitive, dict is not case sensitive, and my logic was wrong. Sorry about that. Here's a fixed version (I haven't fixed the case-insensitive dictionary lookup, though you could EncodeBase64 the keys for that):

[code]if StrComp(key.uppercase,key.lowercase,0)<>0 then //letters differ between upper and lower case; other keys don't
  
  keybuffer=keybuffer+key
  
else
  
  if abbreviationdict.hasKey(keyBuffer) then  //an abbreviation exists
    me.selstart=me.selstart-len(keybuffer)
    me.sellength=len(keybuffer)
    me.seltext=abbreviationDict.value(keybuffer)
    me.selstart=me.selstart+len(abbreviationDict.value(keybuffer))
    me.sellength=0
  end if
  
  keybuffer=""
  
end if[/code]
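The corrected KeyDown logic can be replayed outside Xojo to check the behavior described (letters build the buffer, any other key triggers the check). This Python sketch assumes the caret always sits at the end of the text and that the key is inserted after the handler runs (as when KeyDown returns False). It uses a hypothetical BACKSPACE constant (chr(8)) for the Backspace key. Unlike the Xojo Dictionary mentioned above, Python dicts are case-sensitive, so "ab" is not expanded.

```python
# Hypothetical stand-in for the Backspace key.
BACKSPACE = "\x08"

ABBREVIATIONS = {"AB": "abbreviation"}


def type_keys(keys: str, abbreviations: dict[str, str]) -> str:
    """Replay keystrokes, assuming the caret is always at the end of the text."""
    text = ""
    buffer = ""
    for key in keys:
        if key.upper() != key.lower():
            # A letter: remember it as part of a possible abbreviation.
            buffer += key
        else:
            # Any other key ends the word: expand it if it is an abbreviation.
            if buffer in abbreviations:
                text = text[: len(text) - len(buffer)] + abbreviations[buffer]
            buffer = ""
        # The key itself then reaches the text area.
        if key == BACKSPACE:
            text = text[:-1]
        else:
            text += key
    return text


print(repr(type_keys("AB ", ABBREVIATIONS)))                    # 'abbreviation '
print(repr(type_keys("ABB" + BACKSPACE + " ", ABBREVIATIONS)))  # 'AB '
print(repr(type_keys("CAB.", ABBREVIATIONS)))                   # 'CAB.'
```

The ABB-then-Backspace case shows the escape hatch: Backspace is a non-letter, so it checks and clears the buffer ("ABB" is not an abbreviation), and the following space finds an empty buffer.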

Kem_Tekinay · May 28, 2015, 4:56pm

You can represent any character with the token

\x{code}

For example, a space (hex code 20) would be

\x{20}

Since a character represented this way will be taken literally, you can safely represent anything the user types.
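Python's re module doesn't accept the \x{…} form, and its \xhh escape only takes two hex digits. It does understand \UXXXXXXXX (exactly eight hex digits), which serves the same purpose. Here is a minimal Python sketch of escaping every character this way; hex_escape is a made-up helper name, and in Xojo you would build the \x{…} string per character instead.

```python
import re


def hex_escape(text: str) -> str:
    # Each character becomes \UXXXXXXXX (its code point in 8 hex digits),
    # which the regex engine always treats as that literal character.
    return "".join(f"\\U{ord(ch):08x}" for ch in text)


pattern = hex_escape("a.m")
print(pattern)                            # \U00000061\U0000002e\U0000006d
print(bool(re.search(pattern, "aim")))    # False: "." is now literal
print(bool(re.search(pattern, "9 a.m")))  # True
```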

Patrick_Besong · June 1, 2015, 12:54am

Thanks for the help, guys. I have it working now, at least well enough to continue developing. Capturing the key in KeyDown and saving it in a property helped a lot. I'll retool to use a dictionary too.