For years I’ve treated regex the same way I treat Ebola: I’ve kept my distance.
I now find myself needing to use it for many areas of a project. I’ve pieced together several find and replace functions from things I’ve found on the internet.
I would think what I’m looking for is something asked frequently, but I can’t seem to find anything in searches; possibly I’m not asking Google properly. I’m hoping there is a regex person who can point me in the right direction.
I need to search a string for a given phrase and match it in one of four ways: as an exact match of a word, contained in a word, at the start of a word, or at the end of a word. The matched word needs to be retrieved for each of the searches.
I’ve tried to tinker with \b but haven’t discovered how to do anything but an exact match, and I’m not sure I’m doing that properly. The nuances of regex are an art, and I’m finger painting.
Thanks Mike and Jason. I downloaded the trial RegExRx and it looks like a great product for assembling patterns, if you know what you are doing already.
I thought I could figure out what I needed easily enough, so before I came to the forum for help I wrote a small app to do, in a very meager way, what RegExRX does. I’ve done a lot of experimenting in both my app and RegExRX, but all I have is a long list of what doesn’t work, and no understanding of why. This is standard text editor find-and-replace fodder that is found in so many applications; I’d think the internet would be overflowing with examples…
In your case, let’s say you want to find any word that contains the word “dog”. This pattern will do the trick:
\b\S*dog\S*
This means: Start at a word break followed by any number of characters that are not a whitespace, then “dog”, then any number of characters that are not a whitespace.
It won’t matter if “dog” is the entire word, at the start, end, or somewhere in the middle.
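If the four cases ever need to be told apart rather than caught by one pattern, one pattern per case is a possible approach. Here is a minimal sketch, written in Python rather than Xojo only so it is easy to run; the function name `find_words`, the mode names, and the sample sentence are made up for illustration. It uses \w instead of \S so that trailing punctuation is not picked up, which is an assumption about what should count as a word.

```python
import re

def find_words(text, phrase, mode):
    """Return the whole words in text that match phrase in the given mode."""
    # Assumption: phrase holds only letters and digits, so it can be
    # dropped into the pattern without any escaping.
    patterns = {
        "exact":    r"\b" + phrase + r"\b",     # phrase is the entire word
        "contains": r"\b\w*" + phrase + r"\w*", # phrase anywhere in the word
        "starts":   r"\b" + phrase + r"\w*",    # word begins with phrase
        "ends":     r"\b\w*" + phrase + r"\b",  # word ends with phrase
    }
    return re.findall(patterns[mode], text, re.IGNORECASE)

sample = "My dog met a bulldog near the doghouse while I read about endogamy."
for mode in ("exact", "contains", "starts", "ends"):
    print(mode, find_words(sample, "dog", mode))
```

Each call returns the full matched words, so the same function covers both the search and the "retrieve the matched word" part of the question.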
The problem is, you don’t know what the user will enter so you don’t want to use their input verbatim in case they use a character that means something to the regex engine. I recommend this code:
dim searchWord as string = theUsersInput
dim chars() as string = searchWord.Split( "" )
for i as integer = 0 to chars.Ubound
  select case chars( i )
  case "a" to "z", "0" to "9"
    // plain letters and digits are safe to use as-is
  else
    // encode anything else as a hex escape so the regex engine reads it literally
    chars( i ) = "\x{" + hex( chars( i ).Asc ) + "}"
  end select
next
dim pattern as string = "\b\S*" + join( chars, "" ) + "\S*"
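To see why the escaping matters, here is a small sketch of the same idea in Python (not Xojo), where the standard library's `re.escape` stands in for the character loop above. The helper names `word_pattern` and `unsafe_pattern` and the sample text are hypothetical.

```python
import re

def word_pattern(user_input):
    # re.escape backslash-escapes regex metacharacters, so the input is
    # treated as literal text inside the larger pattern.
    return r"\b\S*" + re.escape(user_input) + r"\S*"

def unsafe_pattern(user_input):
    # Same pattern with the input used verbatim, for comparison only.
    return r"\b\S*" + user_input + r"\S*"

text = "Costs $5.00 today, was 5x00 yesterday"

# Verbatim, the "." means "any character", so "5x00" matches as well.
print(re.findall(unsafe_pattern("5.00"), text))

# Escaped, only the literal "5.00" matches.
print(re.findall(word_pattern("5.00"), text))
```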
You can also implement a very basic find function like this:
Sub FindNext(source as textarea, searchKey as string, restartFrombeginning as Boolean = False)
  dim txt as String = source.Text
  static lastPosition as Integer = 0
  static lastSearchKey as String
  if lastSearchKey <> searchKey or restartFrombeginning then
    lastPosition = 0
  end if
  lastPosition = txt.InStr(lastPosition+1,searchKey)
  if lastPosition > 0 then
    source.SelStart = lastPosition-1
    source.SelLength = searchKey.Len
  end if
  lastSearchKey = searchKey
End Sub
In a button you can call it like:
FindNext(InputTextArea,FindField.text)
Where InputTextArea is the name of your TextArea, and FindField is the name of the field a user enters a search key into. A replace function would operate in a similar manner.
Thanks Kem, \S is what I was missing, not sure how I overlooked that, but I tried so many different combinations of things and just couldn’t get traction. I’ve spent quite a few hours on that site already along with some others. Still couldn’t find what I was looking for there, but did get a lot of other bits I needed elsewhere.
I’m sure if I knew more about the subject your app would have been very helpful, but plugging in garbage and getting no results is kind of like the monkeys and typewriters conundrum.
And thanks Jason, I did my initial code in pure RB but the file sizes were too large to be efficient. One file might be just fine, but I’m dealing with dozens of files at a minimum and even shaving a couple of seconds off the process is worth the effort.
In case you hadn’t seen it, RegExRX has an insert menu with all the tokens you can use and an explanation of what each does. It’s the arrow just above the pattern field on the left.
I have a function to clean up a text file by stripping runs of 2 or more spaces and all non-printables, and I need to add stripping of 3 or more consecutive carriage returns. Single and double \r need to be permitted. \s\s\s works fine for cases where there is an even number of \r’s, but strips all of them if the number is odd. Again, I’ve tried as many combinations as I can think of, and Google hasn’t provided an answer either.
I’ve tried \r and \s. \s is what I really need, as this function is meant to be used on imported files where I have no control over the content, and it covers the additional occurrences of line feeds and such.
My replace string is empty, so I need to catch 3 or more \s and replace them with an empty string. I’m sure there is a replace pattern that can do this, but I’m only about 48 hours into regex and still trying to solidify the basics in my mind.
I phrased my question wrong, and in my limited testing on varied text documents \s is what I need to use. I’m sure I’ll find a document that breaks that in the future, so \R is noted.
Scott, \s\s\s+ has the odd/even glitch and only retains two carriage returns if the number is even. An odd number of carriage returns strips all of them.
As I tried to reformulate my question, it seemed that I was seeking a conditional solution. Using Kem’s RX app I came up with this, which seems to work.
(?(?=\s\s\s)\s|)
Is there a pitfall I’m not aware of? I’ll give \R a whirl to see if I get equal or better results.
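One possible pitfall can be shown with a small experiment. The non-empty matches of that conditional are "a whitespace character that is followed by two more", which, as far as I can tell, can also be written as the plain lookahead \s(?=\s\s). The sketch below is Python rather than Xojo (Python's `re` module does not support lookahead conditionals), and the names `EXCESS_WS` and `strip_excess` are made up for illustration.

```python
import re

# A whitespace character followed by at least two more whitespace characters.
# Assumed equivalent to the non-empty matches of (?(?=\s\s\s)\s|).
EXCESS_WS = re.compile(r"\s(?=\s\s)")

def strip_excess(text):
    # Empty replacement: every run of whitespace is cut down to its last two characters.
    return EXCESS_WS.sub("", text)

# Five line breaks collapse to two, as intended.
print(repr(strip_excess("a\n\n\n\n\nb")))

# But \s is any whitespace: in a mixed run, the last two characters survive,
# whatever they are, so a paragraph break can turn into a single line break.
print(repr(strip_excess("one \n \ntwo")))

# Runs of plain spaces are also cut down to two.
print(repr(strip_excess("x   y")))
```

So the pattern keeps the last two whitespace characters of any run, not necessarily two line breaks, which may or may not matter for the files in question.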
Yeah, you’re right! That segment worked in your app alone, but did nothing when added to my search string.
This is my current search pattern without the needed CR stripping. I’ve added stripping of characters above 127 and am currently evaluating it for issues. The function is meant to visually clean up a text document without destroying content.
" +|[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]" (first two characters are a space)
If possible I’d like to keep this as one function with the replacement string empty, which is what I have now. \R\R\R+ works great for conditions where there are even-numbered CRs. There is probably no distinguishable difference in doing it in two searches, but I’d like to keep it to a single search if possible (I understand I have no valid reason for that requirement, other than simplicity, which now seems irrelevant).
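With an empty replacement string, any pattern that consumes the whole run of line breaks removes the whole run, so one way to keep a single search is to match only the excess: line breaks that already have two line breaks directly before them. Below is a minimal sketch of that idea using a lookbehind, in Python rather than Xojo; `build_cleanup` and `cleanup` are hypothetical names, the `eol` parameter is my addition, and the first two alternatives are meant to mirror the existing pattern (two or more spaces, then the character class). I have not tested the combined pattern in Xojo's RegEx class.

```python
import re

def build_cleanup(eol="\n"):
    """Build one pattern whose matches can all be replaced with an empty string."""
    e = re.escape(eol)  # assumption: the file uses one known line-ending style
    return re.compile(
        r" {2,}"                                   # runs of two or more spaces
        r"|[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]"  # control and high characters
        r"|(?<=" + e + e + r")(?:" + e + r")+"     # line breaks beyond the second
    )

def cleanup(text, eol="\n"):
    return build_cleanup(eol).sub("", text)

print(repr(cleanup("a\n\n\n\n\nb")))       # odd run of five
print(repr(cleanup("a\n\n\n\nb")))         # even run of four
print(repr(cleanup("a\r\r\rb", eol="\r"))) # CR-only file
```

One caveat with doing it all in a single pass: the lookbehind sees the original text, so line breaks that only become adjacent after spaces or control characters between them are stripped will not be collapsed. A second pass would catch those.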
I have been doing RegEx in Perl since the early 90s (I hate to admit that). I know RegEx, but I still use RegExRX to create my regex strings these days. Not that I can’t do it by hand, but RegExRX just does it much faster and easier, and I can verify the regex strings before applying them to my code (Perl, Xojo, PHP, etc.). It is a gem in my toolbox and worth every penny.