RegEx Replace in 2014r1

Since updating to 2014r1, my RegEx Replace function isn’t replacing across all of the data. It only seems to get about halfway through, and the rest of the data is untouched. If I go back and run my project in 2013r4.1, it works fine again. Maybe something changed that I need to account for?

I’m doing a standard HTML replacement in the PageReceived event for an HTTPSocket:

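Dim r As New RegEx
r.Options.Greedy = False // makes no difference here: [^>]* cannot run past a ">"
r.SearchPattern = "<[^>]*>" // "<[^<>]+>" works as well
r.ReplacementPattern = "" // replace each tag with nothing
r.Options.ReplaceAllMatches = True
incoming = r.Replace(content) // content holds the page body passed to PageReceived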
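Until there is a fix, one possible workaround is to keep calling Replace until a pass removes nothing. This is a minimal sketch, assuming each call to Replace simply stops early rather than corrupting the text; StripTags is a hypothetical helper name, not part of the original project.

```xojo
// Hypothetical workaround, assuming Replace in 2014r1 stops after a fixed
// number of replacements per call: repeat it until nothing changes.
Function StripTags(source As String) As String
  Dim r As New RegEx
  r.SearchPattern = "<[^>]*>"
  r.ReplacementPattern = ""
  r.Options.ReplaceAllMatches = True

  Dim result As String = source
  Dim before As String
  Do
    before = result
    // Start each pass from the beginning of the (already shortened) string.
    result = r.Replace(result, 0)
    // Removing a tag always shortens the string, so an unchanged
    // length means this pass found nothing left to replace.
  Loop Until Len(result) = Len(before)
  Return result
End Function
```

In PageReceived this would become `incoming = StripTags(content)`. On a release where Replace works correctly, the loop exits after the second pass.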

So you are removing all html code, essentially?

Yes, you seem to have hit a bug and should file Feedback immediately.

Do you have the MBS plugins?

Nope, MBS is too expensive (I make freeware/open source software).

I’ll file a bug and recompile using 2013r4.1 for now. Thanks for the reply :slight_smile: My RegEx knowledge isn’t vast, so I wasn’t sure.

Please post the Feedback link when ready and I’ll attach my test project to the report.

Bug report: <https://xojo.com/issue/32747>

Attached.

I didn’t realize you could buy just parts of the MBS plugins. I might look into buying the RegEx plugin. Thanks for the heads up.

Well, if you have the Util plugin, you could also try the RemoveHTMLTagsMBS function, or of course the RegExMBS class.
If one of them suits you, check with me about which license you need.

The project I attached to the report has the working RegExMBS code. You’ll find that RegExMBS is significantly faster than the native RegEx for repeated matches. (On the order of thousands of times faster. There is a Feedback report about that too.)

Finally, I’ll point out my own RegExRX for developing patterns and exporting the code to Xojo.

Just out of curiosity, is the cutoff point around 8K?

I didn’t test that. Why do you ask?

I ran into an issue like this last week while parsing a TCP stream for JSON. At about 8100 characters (in the source file), things started going wrong. I’m trying to give us a precise starting point to find and fix this regression.

I just tested. It only processed about the first 160 characters, but the more significant number may be 50; that is, it stopped after 50 replacements. Testing that now…

Edit: I first thought it was 49, but it’s 50.

Confirmed, it only performs 50 replacements. I’ll upload a different project to illustrate.
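For anyone who wants to reproduce the count, a minimal sketch of such a test: build a string with a known number of tags, run a single Replace with ReplaceAllMatches, and count how many tags survive. TagsLeftAfterReplace is a hypothetical name; this is not the project attached to the report.

```xojo
// Hypothetical diagnostic: how many tags does one Replace call leave behind?
Function TagsLeftAfterReplace(tagCount As Integer) As Integer
  // Each unit is a 3-character tag followed by one character of text.
  Dim parts() As String
  For i As Integer = 1 To tagCount
    parts.Append("<b>x")
  Next
  Dim source As String = Join(parts, "")

  Dim r As New RegEx
  r.SearchPattern = "<[^>]*>"
  r.ReplacementPattern = ""
  r.Options.ReplaceAllMatches = True
  Dim stripped As String = r.Replace(source)

  // The tagCount "x" characters always remain; every surviving "<b>"
  // adds 3 more characters, so the excess length counts the leftovers.
  Return (Len(stripped) - tagCount) \ 3
End Function
```

With a working Replace, `TagsLeftAfterReplace(100)` should return 0; if the 50-replacement cap described above is present, it would presumably return 50.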

I wonder if this has to do with creating the backreferences for replacement patterns, where it only permits 50.

It seems like too much of a coincidence not to be related, and it’s something I included in the Feedback report.

Considering the native PCRE engine only supports something like 10 (numbered 0-9), I would hazard a guess it’s very relevant.

15 or 16, actually, but it’s a variable that can be changed. I went through this with Christian when he updated PCRE to the latest (at that time) version and ran into that limit. RegExMBS lets you override that now in code.

But it’s not relevant to Replace since PCRE doesn’t handle replace at all, so I think the wrong variable was changed somewhere before 2014r1 was released.

Looking at the C code in the plugin that handles Replace, I’m pretty sure it’s the backreference count.
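Since PCRE only does the matching and the replace logic lives in the plugin, another option is to bypass Replace entirely and rebuild the string from the text between matches found by Search. This is a sketch under the assumption that Search itself is not affected by the cap; StripTagsBySearch is a hypothetical helper name.

```xojo
// Hypothetical workaround: never call RegEx.Replace; copy the text
// between tag matches into an array and join it back together.
Function StripTagsBySearch(source As String) As String
  Dim r As New RegEx
  r.SearchPattern = "<[^>]*>"

  Dim pieces() As String
  Dim lastEnd As Integer = 0 // zero-based byte offset just past the previous tag
  Dim m As RegExMatch = r.Search(source, 0)
  While m <> Nil
    Dim tagStart As Integer = m.SubExpressionStartB(0) // zero-based byte offset
    // MidB is 1-based: copy the bytes between the previous tag and this one.
    pieces.Append(MidB(source, lastEnd + 1, tagStart - lastEnd))
    lastEnd = tagStart + LenB(m.SubExpressionString(0))
    m = r.Search(source, lastEnd) // continue after this tag
  Wend
  pieces.Append(MidB(source, lastEnd + 1)) // trailing text after the last tag
  // Cuts fall on ASCII "<" and ">" bytes, so UTF-8 sequences stay intact.
  Return Join(pieces, "")
End Function
```

This trades Replace for one Search call per tag, so it may be slower on large pages, but it does not depend on whatever limit Replace is hitting.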