Search pattern with Regex that finds last instance of a character

I’ve gotten fairly decent with Regex, but this one eludes me. I want my search pattern to match between the first occurrence of one character and the LAST occurrence of another character.

In the example below, the match stops at the FIRST occurrence of > once it passes the first occurrence of <. I want the match to run from the first occurrence of < to the LAST occurrence of >. Any ideas? This would be a big help for me now and in the future. Thanks!

Dim rg3 As New RegEx
// Capture everything between < and > as subexpression 1
rg3.SearchPattern = "<(.*)>"
rg3.Options.Greedy = False
rg3.Options.DotMatchAll = True
Dim myMatch3 As RegExMatch = rg3.Search(cellname)
If myMatch3 <> Nil Then
  VALUE = Trim(myMatch3.SubExpressionString(1))
End If

@Kem_Tekinay - where art thou?


This is off the top of my head and untested, but try changing the above to:

rg3.Options.Greedy = True

Edit: You may also need to add:

rg3.Options.DotMatchAll = True

That was it! Never looked into what Greedy did exactly, so whatever I thought it did wasn’t what you just pointed out. Never would have guessed it had anything to do with that…always figured it would be something with the syntax of the search pattern line.

Thank you very much!

Keep in mind that this doesn’t handle nested brackets. That’s why I was asking for @Kem_Tekinay. He has a great grasp on the concept.

I didn’t test it, but I think by using greedy = true it should. Can you give an example of a string it will fail on?

<p>hello world</p>

Your pattern would select the entire line with greedy enabled.

The OP’s code was using a subexpression, and extracting just subexpression 1, so (without testing) I would expect it to return:

p>hello world</p

But maybe I misunderstand what greedy = true will do…

It depends a lot on what the source string is

Greedy = False basically means “stop as soon as something satisfies the search”; Greedy = True means “keep going, and take the longest stretch that still satisfies the search”. So with the string "000 < aaa > bbb> ccc> 999" and the pattern "<.*>", a non-greedy search gives you “< aaa >”, but a greedy search gives you “< aaa > bbb> ccc>”.
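For anyone who wants to check the two modes on that sample string, here is a minimal sketch. It uses Python's `re` module purely as a stand-in for the Xojo RegEx class. The mapping is an assumption: the lazy quantifier `.*?` is taken to behave like Greedy = False, plain `.*` like Greedy = True, and `re.DOTALL` like DotMatchAll = True.

```python
import re

source = "000 < aaa > bbb> ccc> 999"

# Assumption: lazy .*? stands in for Greedy = False (stop at the first ">").
lazy = re.search(r"<.*?>", source, re.DOTALL)

# Assumption: plain .* stands in for Greedy = True (run to the last ">").
greedy = re.search(r"<.*>", source, re.DOTALL)

print(lazy.group(0))    # < aaa >
print(greedy.group(0))  # < aaa > bbb> ccc>
```

In both cases the match starts at the first <; only the end point changes.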

How so?

I still think it satisfies the OP’s original question, and yes, in that example it would match the entire string but because of the substring request, leave off the first and last characters. I still can’t think of a string where it would NOT satisfy the OP’s original question. Can you give an example where it would fail? I can’t; which is what I meant by maybe I don’t understand what greedy = true would do in that context. I don’t see how it would fail.

How about:

^[^<]<(.)>[^>]*$

…with Greedy=true, DotMatchAll=true, and then use SubExpression(1).

Sorry, I was away for the weekend.

When Greedy, the engine will grab as much of the source as it can that satisfies the pattern, even if that contains smaller matches within it.

For example, with this Source and Pattern:

Source: xyz<a<b>c>123
Pattern: <.*>.*

We will get <a<b>c>123.

When Greedy is False, we will get <a<b>.

As an aside, the pattern .* has its uses, but where possible it should be replaced with something more specific.
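The Source and Pattern example above can be reproduced with a short sketch. It is written with Python's `re` module as a stand-in for the Xojo RegEx class, on the assumption that Greedy = False behaves like making every quantifier lazy (`.*` becomes `.*?`).

```python
import re

source = "xyz<a<b>c>123"

# Greedy: each .* takes as much as it can.
greedy = re.search(r"<.*>.*", source, re.DOTALL)

# Assumption: Greedy = False is modeled by making both quantifiers lazy.
lazy = re.search(r"<.*?>.*?", source, re.DOTALL)

print(greedy.group(0))  # <a<b>c>123
print(lazy.group(0))    # <a<b>
```

Note that the lazy match still begins at the first <, so it returns <a<b> rather than the inner <b>.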

Wouldn’t that require there to be text ahead of the first < and after the final > character? The OP did not state that as a requirement. As I interpret the OP, even Greg’s sample which matches the entire input string (other than using a subexpression) still meets the OP’s request.

No, the * repeat character stands for “zero or more of the preceding character or character class”.

Looks like the editor is somehow eating some of the characters in my pattern - I guess I should have enclosed them in some formatting:

^[^<]*<(.+)>[^>]*$
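As a quick check of that anchored pattern against the strings mentioned in this thread, here is a minimal sketch. It uses Python's `re` module as a stand-in for the Xojo RegEx class, with `re.DOTALL` assumed to correspond to DotMatchAll = True and `group(1)` to SubExpressionString(1). The helper name `between_first_and_last` is made up for illustration.

```python
import re

# Pattern from the post above; DOTALL is assumed to mirror DotMatchAll = True.
PATTERN = re.compile(r"^[^<]*<(.+)>[^>]*$", re.DOTALL)


def between_first_and_last(text):
    """Return the text between the first < and the last >, or None."""
    match = PATTERN.search(text)
    # group(1) plays the role of SubExpressionString(1).
    return match.group(1) if match else None


print(between_first_and_last("xyz<a<b>c>123"))       # a<b>c
print(between_first_and_last("<p>hello world</p>"))  # p>hello world</p
print(between_first_and_last("no brackets here"))    # None
```

Text before the first < and after the last > is optional, since both character classes are followed by *.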