Regex + html problem

I’m using a regex to strip external CSS links from HTML. In Patterns the regex works fine:

In Xojo with the following code:

dim html as String = backen

dim theRegex as new RegEx
theRegex.options.greedy = False
theRegex.Options.TreatTargetAsOneLine = True
theRegex.Options.DotMatchAll = True

'remove external css
theRegex.searchPattern = "(?i)<link[^>]+href\s*=\s*['""](.*?css.*?)['""][^>]*>"
theRegex.ReplacementPattern = ""
Html = theRegex.Replace(html)

me.LoadPage(html, nil)

almost all of the HTML is removed:

Partial html:

Neuer Beitrag "Saftige Mandelhörnchen mit Marzipan"

What is wrong with the Xojo code?

I don’t think that posting the html worked. I’ve uploaded an example to https://www.mothsoftware.com/downloads/test.zip

Could it be that you just forgot the $1 in line 11?

theRegex.ReplacementPattern = "$1"

BTW: Why mix regex options between the Xojo object and the pattern string? You can replace the (?i) in the string with the following Xojo RegEx class option.

theRegex.Options.CaseSensitive = False

The

theRegex.Options.TreatTargetAsOneLine = True

is not necessary, I think.

And extending the RegEx Search String to

[<link[^>]+href\s*=\s*['""""](.*?css.*?)['""""][^>]*>|https:.*css"">]

will also remove the font link. But this solution is not very elegant or flexible. :slight_smile:

The replacement is okay. The full search result should be removed.

When I add the $1 I see the following in your Window:
Screenshot-1

I thought that’s what you wanted :wink:

Ah! Got it. Sry, my fault. :slight_smile:

I saw that only the first match got replaced (removed). Did you try this?

theRegex.Options.ReplaceAllMatches = True

I had to leave my dev desk and can’t test it myself anymore. Sorry.

Thanks! Ouch for the replace all. Yes, this is missing. The external CSS link must be completely removed or it will defeat the purpose of the regex.
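
For anyone reading along, here is a minimal sketch of the difference ReplaceAllMatches makes. The sample string and the simplified pattern are made up for illustration and are not taken from the test file; the option names are the ones used earlier in this thread.

```xojo
' Hypothetical input with two stylesheet links (not the original page).
Dim sample As String = "<link href=""a.css""><p>Hi</p><link href=""b.css"">"

Dim rx As New RegEx
rx.SearchPattern = "<link[^>]*>"   ' simplified pattern, for illustration only
rx.ReplacementPattern = ""         ' empty replacement removes the whole match

' Without ReplaceAllMatches only the first <link ...> tag should be removed.
Dim firstOnly As String = rx.Replace(sample)

' With ReplaceAllMatches every <link ...> tag should be removed.
rx.Options.ReplaceAllMatches = True
Dim allRemoved As String = rx.Replace(sample)
```

Comparing firstOnly and allRemoved in the debugger should show that the second link survives in the first case.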

Looks like a ? after a quantifier (as in *?) toggles the Options.Greedy setting, rather than forcing non-greedy matching as one would expect. With Options.Greedy = False, *? therefore becomes greedy again.

So, if you remove the two ? from your pattern and add Options.ReplaceAllMatches = True, it works as intended. Or don’t set Options.Greedy, but add another ? after the last * in the pattern.
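
The two ways of getting lazy matching can be sketched roughly like this. It assumes that Options.Greedy = False inverts the meaning of ? after a quantifier, as observed above; the sample string is hypothetical (the real page is in the linked test.zip), and I have not run this against that file.

```xojo
' Hypothetical sample with two external stylesheet links.
Dim sample As String = "<link rel=""stylesheet"" href=""a.css""><p>Hi</p><link rel=""stylesheet"" href=""b.css"">"

Dim rx As New RegEx
rx.Options.CaseSensitive = False     ' instead of (?i) inside the pattern
rx.Options.DotMatchAll = True
rx.Options.ReplaceAllMatches = True  ' remove every match, not just the first
rx.ReplacementPattern = ""

' Variant A: leave Options.Greedy alone and keep the lazy .*? in the pattern.
rx.SearchPattern = "<link[^>]+href\s*=\s*['""](.*?css.*?)['""][^>]*>"
Dim resultA As String = rx.Replace(sample)

' Variant B: switch Options.Greedy off and drop the two ? from the pattern,
' so that a plain .* is the lazy form.
rx.Options.Greedy = False
rx.SearchPattern = "<link[^>]+href\s*=\s*['""](.*css.*)['""][^>]*>"
Dim resultB As String = rx.Replace(sample)
```

Under that assumption both variants should leave only the paragraph between the two links. Mixing them (Greedy = False together with .*?) is what makes the match run to the end of the page.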