Regex + HTML problem

I’m using a regex to strip external CSS links out of HTML. In Patterns, the regex works fine.

In Xojo, with the following code:

dim html as String = backen

dim theRegex as new RegEx
theRegex.options.greedy = False
theRegex.Options.TreatTargetAsOneLine = True
theRegex.Options.DotMatchAll = True

'remove external css
theRegex.searchPattern = "(?i)<link[^>]+href\s*=\s*['""](.*?css.*?)['""][^>]*>"
theRegex.ReplacementPattern = ""
Html = theRegex.Replace(html)

me.LoadPage(html, nil)

almost all of the HTML is removed:

Partial HTML:

Neuer Beitrag "Saftige Mandelhörnchen mit Marzipan"

What is wrong with the Xojo code?

I don’t think posting the HTML worked. I’ve uploaded an example to

Could it be that you just forgot the $1 in the ReplacementPattern line?

theRegex.ReplacementPattern = "$1"

BTW: why mix RegEx options in the Xojo object and in the pattern string? You can replace the (?i) in the pattern with the following RegEx class option:

theRegex.Options.CaseSensitive = False


theRegex.Options.TreatTargetAsOneLine = True

is not necessary, I think.

And extending the RegEx Search String to


will also remove the font link. But this solution is not very elegant or flexible. :slight_smile:

The replacement is okay: the entire match should be removed.

When I add the $1, I see the following in your window:

I thought that’s what you wanted :wink:

Ah! Got it. Sorry, my fault. :slight_smile:

I noticed that only the first match gets replaced (removed). Did you try this?

theRegex.Options.ReplaceAllMatches = True
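To illustrate the difference, here is a small sketch that uses Python's re module as a stand-in for Xojo's RegEx class. That is an assumption: Python's engine is not PCRE, but it should behave the same for this pattern. re.sub with count=1 mimics Xojo's default (ReplaceAllMatches = False), and the default count=0 mimics ReplaceAllMatches = True. The sample HTML is made up.

```python
import re

# Lazy pattern from the thread (Python's re used as a stand-in for Xojo's RegEx).
PATTERN = r"""(?i)<link[^>]+href\s*=\s*['"](.*?css.*?)['"][^>]*>"""

# Hypothetical <head> section with two stylesheet links.
html = (
    '<head><link rel="stylesheet" href="a.css">'
    '<link rel="stylesheet" href="b.css"><title>T</title></head>'
)

# count=1 replaces only the first match, like Xojo's default
# Options.ReplaceAllMatches = False.
first_only = re.sub(PATTERN, "", html, count=1, flags=re.DOTALL)

# The default count=0 replaces every match, like
# Options.ReplaceAllMatches = True.
all_removed = re.sub(PATTERN, "", html, flags=re.DOTALL)

print(first_only)   # <head><link rel="stylesheet" href="b.css"><title>T</title></head>
print(all_removed)  # <head><title>T</title></head>
```

With only the first match replaced, the second stylesheet link survives, which is exactly the symptom described above.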

I had to leave my dev desk and can’t test it myself anymore. Sorry.

Thanks! Ouch, the replace-all: yes, that was missing. The external CSS links must be removed completely, or the regex defeats its purpose.

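It looks like *? in the pattern toggles the Options.Greedy setting instead of simply making that quantifier non-greedy, as one would expect. With Greedy = False, each *? therefore becomes greedy again, so (.*?css.*?) can run from the first <link> all the way to the last quote followed by > in the document. That is why almost all of the HTML disappears.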

So if you remove the two ? from your pattern and set Options.ReplaceAllMatches = True, it works as intended. Alternatively, don’t set Options.Greedy at all (it defaults to True) and keep the ? after each * in the pattern; you still need ReplaceAllMatches = True either way.
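To see why almost all of the HTML vanished, here is a rough sketch, again using Python's re as a stand-in for Xojo's PCRE-based RegEx. Python's re has no ungreedy flag, so GREEDY is written out by hand with the two ? removed. That is what Greedy = False effectively turns (.*?css.*?) into. SAMPLE_HTML is a made-up page.

```python
import re

# Pattern from the thread: each .*? stops as early as possible.
LAZY = r"""(?i)<link[^>]+href\s*=\s*['"](.*?css.*?)['"][^>]*>"""
# The same pattern with the two ? removed, i.e. what Greedy = False
# effectively turns (.*?css.*?) into: each .* now runs as far as it can.
GREEDY = r"""(?i)<link[^>]+href\s*=\s*['"](.*css.*)['"][^>]*>"""

# Hypothetical sample page (made up for this demo).
SAMPLE_HTML = (
    "<html><head>\n"
    '<link rel="stylesheet" href="style.css">\n'
    "<link rel='stylesheet' href='fonts.css' type='text/css'>\n"
    "<title>Neuer Beitrag</title>\n"
    '</head><body><a href="rezept.html">Marzipan</a></body></html>'
)


def strip_css_links(html: str, pattern: str) -> str:
    """Remove every match of pattern from html.

    re.DOTALL plays the role of Options.DotMatchAll = True, and re.sub
    replaces all matches, like Options.ReplaceAllMatches = True.
    """
    return re.sub(pattern, "", html, flags=re.DOTALL)


cleaned = strip_css_links(SAMPLE_HTML, LAZY)
mangled = strip_css_links(SAMPLE_HTML, GREEDY)

print(repr(cleaned))  # both <link> tags gone, <title> and <body> intact
print(repr(mangled))  # '<html><head>\nMarzipan</a></body></html>'
```

The lazy version removes only the two <link> tags. The greedy version matches from the first <link> up to the last quote followed by > (here the end of the <a href="..."> tag), so the <title> and most of the body are swallowed in a single match.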