Regex for removing HTML tags

This is not for Xojo, but for use within an editor.

I’m hoping that @Kem Tekinay can help me with a regex.

I have a lot of HTML files that I want to clean of unwanted tags.
There are many special DIV tags (<div class="xyz">) in the HTML files.
Example:

<div class="xyz"><a href="http://example.com/img/img.jpg/" target="_blank"><img border="0" src="../../../img/another_img.jpg"/></a></div>

All the divs I want to remove have the same class name “xyz”, but the text between the <div class="xyz"> and </div> tags varies every time.
Is there a regex that will select the <div class="xyz"> and </div> tags and all the text between the tags?

I always read here that regex is not well suited for HTML.
Better to use the XMLDocument class to clean it.
Let’s wait for the guru, Kem!
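For readers who want to try the parser route suggested above, here is a minimal sketch. It uses Python's standard `html.parser` module as a stand-in (not Xojo's XMLDocument), and the names `XyzDivStripper` and `strip_xyz_divs_with_parser` are hypothetical, made up for this illustration. It assumes the unwanted blocks are `div` elements whose class list contains `xyz`, and it copes with nested divs by counting depth.

```python
from html.parser import HTMLParser


class XyzDivStripper(HTMLParser):
    """Re-emit the input, skipping every <div class="xyz"> and its contents."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of markup to keep
        self.depth = 0   # > 0 while inside a div that is being dropped

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside a dropped block
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "xyz" in classes:
            self.depth = 1       # start dropping
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img .../> are kept verbatim.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1  # reaches 0 at the closing tag of the xyz div
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.depth:
            self.out.append(f"<!{decl}>")


def strip_xyz_divs_with_parser(html):
    """Return html with all <div class="xyz"> blocks removed."""
    parser = XyzDivStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

This is more code than a one-line regex, but it does not depend on attribute order or on the block containing no inner divs. End tags are re-emitted in lowercase, so the output is not guaranteed to be byte-identical to the input outside the removed blocks.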

The simplest, but not fully effective is:

BTW a simple Google search will provide dozens of solutions to this exact problem.

[quote=288070:@Paul Sondervan]This is not for Xojo, but for use within an editor.

I’m hoping that @Kem Tekinay can help me with a regex.

I have a lot of HTML files that I want to clean of unwanted tags.
There are many special DIV tags (<div class="xyz">) in the HTML files.
Example:

<div class="xyz"><a href="http://example.com/img/img.jpg/" target="_blank"><img border="0" src="../../../img/another_img.jpg"/></a></div>

All the divs I want to remove have the same class name “xyz”, but the text between the <div class="xyz"> and </div> tags varies every time.
Is there a regex that will select the <div class="xyz"> and </div> tags and all the text between the tags?[/quote]
Just to clarify… if the text between the HTML tags you want removed is also HTML, do you want that part kept?

So this:

<div class="xyz"><a href="link">link text</a></div>

Would become

<a href="link">link text</a>

?

I want to remove the complete div, including the tags and everything between them.
So this:

<div class="xyz"><a href="link">link text</a></div>

Would become:


:wink:

This is not possible with regex. Use TidyMBS instead.

Found a similar regex that, after modification, works fine.

<div class="xyz">(.*?)</div>

This selects the entire text, including the tags.
I can replace all the xyz divs with nothing, and after that the unwanted tags are gone.
:slight_smile:
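As a rough illustration of the replace-with-nothing step, here is a sketch in Python (the original poster used an editor's search and replace, so the language and the function name `strip_xyz_divs` are assumptions for this example). The `re.DOTALL` flag is added so that `.` also matches line breaks, which matters if a div spans several lines.

```python
import re

# Non-greedy pattern from the post; DOTALL lets "." cross line breaks.
XYZ_DIV = re.compile(r'<div class="xyz">(.*?)</div>', re.DOTALL)


def strip_xyz_divs(html: str) -> str:
    """Delete every <div class="xyz">...</div> block, tags included."""
    return XYZ_DIV.sub("", html)


sample = (
    '<p>keep</p>'
    '<div class="xyz"><a href="link">link text</a></div>'
    '<p>also keep</p>'
)
print(strip_xyz_divs(sample))  # <p>keep</p><p>also keep</p>
```

Because the match is non-greedy, it stops at the first `</div>` it finds. If an xyz div contains another div, the match ends too early and leaves a stray closing tag behind, so this is only safe when the blocks contain no inner divs, as in the example from the first post.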

Glad I could help. :slight_smile:

[quote=288108:@Paul Sondervan]Found a similar regex that, after modification, works fine.

<div class="xyz">(.*?)</div>

This selects the entire text, including the tags.
I can replace all the xyz divs with nothing, and after that the unwanted tags are gone.
:)[/quote]
Keep in mind that this will only work if the class is in that exact position in the div tag.
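To illustrate the caveat, here is a hedged sketch of a more tolerant pattern that allows other attributes before or after `class` and other class names next to `xyz`. It is written in Python, the names `XYZ_DIV_ANY_POS` and `remove_xyz_divs` are invented for this example, and it assumes double-quoted attribute values and no nested divs inside the block.

```python
import re

# Tolerant variant: "class" may sit anywhere in the opening tag and may
# list several class names. Assumes double-quoted attribute values.
# Note: \b treats "-" as a boundary, so a class like "xyz-large" also matches.
XYZ_DIV_ANY_POS = re.compile(
    r'<div\b[^>]*\bclass\s*=\s*"[^"]*\bxyz\b[^"]*"[^>]*>.*?</div>',
    re.DOTALL | re.IGNORECASE,
)


def remove_xyz_divs(html: str) -> str:
    """Delete xyz divs regardless of where class appears in the tag."""
    return XYZ_DIV_ANY_POS.sub("", html)


print(remove_xyz_divs('<p>a</p><div id="d1" class="xyz">gone</div><p>b</p>'))
# <p>a</p><p>b</p>
```

The stricter pattern `<div class="xyz">(.*?)</div>` would leave the div in that sample untouched, because `id="d1"` comes before `class`.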

All the files are built the same way.
The regex works very well for my purposes.
:wink: