RegEx help

DaveS · November 12, 2018, 7:58am

again …

I need to find a sequence of digits, 1 to N in length, that is bounded on either side by ANYTHING except A-Z or a-z,
and that may or may not begin or end a line.

Jean-Yves_Pochez · November 12, 2018, 8:30am

Some real text examples?

Greg_O_Lone · November 12, 2018, 1:02pm

Off the top of my head…

[^A-Za-z][0-9]+[^A-Za-z]

AlbertoD · November 12, 2018, 1:50pm

Dave you want something like?:

(109) // match 109
a109b // no match

How about this:

a109% #109b

Kem_Tekinay · November 12, 2018, 1:58pm

Greg has the right idea but that will match the character before and after the digits too, and won’t match beginning or end of document, so I’d use lookarounds.

(?i)(?<![a-z\d])\d+(?![a-z])

The lookbehind makes sure the previous character is not a letter or a digit, and the lookahead makes sure it’s not a letter.

Why a digit in the lookbehind? To make sure it doesn’t start matching in the middle of a stream of digits. In the string “a1234”, “1” will not match because it is preceded by “a”, but, without the \d in the lookbehind, “234” would match because it is preceded by “1”.
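
Both points can be checked outside Xojo. The following is only a sketch using Python's re module (an assumption for illustration; the thread's own code is Xojo, but both engines accept this lookaround syntax), and the sample strings are made up. It compares the character-class pattern with the lookaround pattern, then drops the \d from the lookbehind to show the mid-stream match described above.

```python
import re

# Character-class version: the neighbouring characters are consumed by the match.
consuming = re.compile(r"[^A-Za-z][0-9]+[^A-Za-z]")
# Lookaround version: the neighbours are inspected but not consumed.
lookaround = re.compile(r"(?i)(?<![a-z\d])\d+(?![a-z])")
# Same pattern without \d in the lookbehind, for comparison only.
no_digit_guard = re.compile(r"(?i)(?<![a-z])\d+(?![a-z])")

sample = "7 (109) 8"  # made-up text: digits at the start, middle, and end

consuming_hits = consuming.findall(sample)
lookaround_hits = lookaround.findall(sample)

# "a1234": the run of digits is preceded by a letter, so nothing should match.
mid_stream_hits = no_digit_guard.findall("a1234")
guarded_hits = lookaround.findall("a1234")

print(consuming_hits)   # ['(109)']  parentheses included, "7" and "8" missed
print(lookaround_hits)  # ['7', '109', '8']
print(mid_stream_hits)  # ['234']  starts in the middle of the digit run
print(guarded_hits)     # []
```

The single digits at the ends are missed by the character-class pattern because it needs a real character on each side of the digits, which is what the lookarounds avoid.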

AlbertoD · November 12, 2018, 2:20pm

Kem, I used a regex checker online and get:

109a // match 10

I don’t know if Dave will have something like that and if he expects to get no-match, 109 or 10 as a match.

So I guess, depending what Dave wants, maybe add \d to the negative lookahead.
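
To make the difference concrete, here is a small sketch in Python's re module (an assumption for illustration, not the Xojo RegEx class used in this thread). With only letters in the lookahead, the engine backtracks from "109" to "10", because "10" is followed by "9", which is not a letter. With \d added to the lookahead, that shorter match is rejected too.

```python
import re

# Lookahead rejects letters only: a shorter digit run can still match.
letters_only = re.compile(r"(?i)(?<![a-z\d])\d+(?![a-z])")
# Lookahead rejects letters and digits: the whole run must end cleanly.
letters_and_digits = re.compile(r"(?i)(?<![a-z\d])\d+(?![a-z\d])")

partial = letters_only.findall("109a")
strict = letters_and_digits.findall("109a")
clean = letters_and_digits.findall("109%")

print(partial)  # ['10']
print(strict)   # []
print(clean)    # ['109']
```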

DaveS · November 12, 2018, 2:25pm

[quote=413876:@Alberto De Poo]Dave you want something like?:

(109) // match 109
a109b // no match

How about this:

a109% #109b[/quote]
First example: yes, but excluding the (). The other examples: no, because a LETTER is a prefix or postfix.

AlbertoD · November 12, 2018, 2:27pm

Then:

(?i)(?<![a-z\d])\d+(?![a-z\d])

I guess

DaveS · November 12, 2018, 4:29pm

almost works … yeah I guess I should have said “BOTH sides” not “EITHER”

123 a456 789a

the 123 are highlighted (which is right)
the 456 are not (which is right)
but 78 are highlighted, and 9 is not (I wanted none to be highlighted)

On a side but related note:
I am modifying some code that Jim McKay wrote a few years back,
and all the existing highlighting RegEx uses SUBEXPRESSION(1).

Try
  If group.Words.Ubound = -1 Then Continue
  r = New RegEx
  r.Options.TreatTargetAsOneLine = True
  r.Options.CaseSensitive = False
  r.Options.MatchEmpty = True
  // The word list is wrapped in ( ), so it becomes subexpression 1.
  r.SearchPattern = "(?<!\B)(" + Join(group.Words, "|") + ")\b"

  m = r.Search(theText, startChar)
  While m <> Nil
    // SubExpressionStartB is a byte offset; LeftB(...).Len turns it into a character position.
    Dim characterPosition As Integer = st.Text.LeftB(m.SubExpressionStartB(1)).Len
    st.TextColor(characterPosition, m.SubExpressionString(1).Len) = group.HighlightColor
    m = r.Search
  Wend
Catch
  System.DebugLog("exception occurred. match string:" + r.SearchPattern)
End Try

but for this I needed to use “0”?

r = New RegEx
r.Options.TreatTargetAsOneLine = True
r.Options.CaseSensitive = False
r.Options.MatchEmpty = True
r.SearchPattern = "(?i)(?<![a-z\d])\d+(?![a-z])"
m = r.Search(theText, startChar)

While m <> Nil
  Dim characterPosition As Integer = st.Text.LeftB(m.SubExpressionStartB(0)).Len
  st.TextColor(characterPosition, m.SubExpressionString(0).Len) = color.orange
  m = r.Search
Wend
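
The likely reason "0" is needed: subexpression 0 is the whole match, and subexpression 1 only exists when the pattern contains a capturing ( ) group. The keyword pattern wraps its word list in parentheses; the digit pattern has no capturing group, since lookarounds do not capture. Python's re module numbers groups the same way, so the sketch below uses it as a stand-in (an assumption for illustration; the keyword pattern and word list are simplified and made up).

```python
import re

text = "123 a456 789a"

# Digit pattern: lookarounds only, so there is no capturing group.
digits = re.compile(r"(?i)(?<![a-z\d])\d+(?![a-z\d])")
# Simplified keyword pattern with a made-up word list; ( ) creates group 1.
keywords = re.compile(r"\b(if|then|wend)\b", re.IGNORECASE)

digit_group_count = digits.groups      # number of capturing groups
keyword_group_count = keywords.groups

# Group 0 always exists: (start, length) of each whole match.
digit_spans = [(m.start(0), len(m.group(0))) for m in digits.finditer(text)]

# Asking the digit pattern for group 1 fails, because there is none.
try:
    digits.search(text).group(1)
    has_group_one = True
except IndexError:
    has_group_one = False

# For the keyword pattern, groups 0 and 1 cover the same text.
kw = keywords.search("While x: Wend")
same_text = kw.group(0) == kw.group(1)

print(digit_group_count, keyword_group_count)  # 0 1
print(digit_spans)                             # [(0, 3)]
print(has_group_one, same_text)                # False True
```

Wrapping the digits in parentheses, as in (\d+), would be another way to keep using subexpression 1 throughout.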

Also, how “expensive” is “r=New RegEx” and all the property assignments?
Would it be faster to create an array of “RegEx” objects ahead of time and just reuse them?
It’s a finite number (a dozen or fewer), but it seems wasteful to be creating them literally hundreds or thousands of times…
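
For what it is worth, the "build once, reuse" idea can be sketched as follows. This is Python rather than Xojo (an assumption for illustration), the group names and word list are hypothetical, and it says nothing about how much time it would save in Xojo; that would need to be measured.

```python
import re

# Hypothetical highlight groups: name -> pattern source.
PATTERN_SOURCES = {
    "number": r"(?i)(?<![a-z\d])\d+(?![a-z\d])",
    "keyword": r"(?i)\b(?:if|then|wend)\b",
}

# Compile each pattern exactly once, up front, instead of on every call.
COMPILED = {name: re.compile(src) for name, src in PATTERN_SOURCES.items()}


def highlight_spans(text):
    """Return (group name, start, length) for every match, reusing COMPILED."""
    spans = []
    for name, pattern in COMPILED.items():
        for m in pattern.finditer(text):
            spans.append((name, m.start(), len(m.group())))
    # Sort by start position so spans come back in document order.
    return sorted(spans, key=lambda span: span[1])


spans = highlight_spans("If x = 42 Then")
print(spans)  # [('keyword', 0, 2), ('number', 7, 2), ('keyword', 10, 4)]
```

The Xojo equivalent would be an array or Dictionary of RegEx objects that is filled once, when the highlight groups are defined.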

Kem_Tekinay · November 12, 2018, 4:32pm

Use Alberto’s modified pattern above.

DaveS · November 12, 2018, 4:38pm

works much better

what about my other question?