Can anyone refresh my memory on a regex to remove HTML tags from a string produced by another regex match?
Thanks in advance!!
What have you tried that hasn’t worked? RegExRx is a great way to work out your regular expressions before using them in Xojo.
I am cycling through the same regex search pattern and end up with a match that has HTML tags around the actual content I need. Those tags could be font color and/or font style, and I don't need them. So if you have a reference I can use to add a regex that removes those HTML tags, simply let me know. Otherwise… thanks!
papa = mtch0.SubExpressionString( 1 )
papa_final_lines = ReplaceAll(papa ,"/<cite\\ .*?<\\/.*?cite>/i","")
papa_final_lines = ReplaceAll(papa_final_lines ,"/<font\\ .*?<\\/.*?font>/i","")
papa is a string found via regex match:
<font color="blue"><cite>NeedToCleanThisTextString</cite></font>
Hi Tim - I think I forgot something after a few years of not touching RB/Xojo.
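For what it's worth, one likely reason the ReplaceAll calls above leave the tags in place: ReplaceAll does a literal text replacement, so the whole pattern string (including the /.../i delimiters) is searched for verbatim and never found. A minimal sketch using the RegEx class instead, assuming only font and cite tags need to go. The function name StripFontAndCite is made up for illustration:

```xojo
Function StripFontAndCite(papa As String) As String
  // Hypothetical helper: removes opening and closing <font ...> and <cite ...> tags only.
  Dim re As New RegEx
  // "</?" matches "<" or "</", then the tag name, then any attributes up to ">"
  re.SearchPattern = "</?(?:font|cite)\b[^>]*>"
  re.ReplacementPattern = ""
  re.Options.CaseSensitive = False      // also match <FONT>, <Cite>, etc.
  re.Options.ReplaceAllMatches = True   // replace every tag, not just the first
  Return re.Replace(papa)
End Function
```

Called as papa_final_lines = StripFontAndCite(papa), the sample string above comes back as NeedToCleanThisTextString.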
If using the MBS plugins is an option, check out the RemoveHTMLTagsMBS function at https://www.monkeybreadsoftware.net/string-string-method1.shtml.
I found this snippet in the old RB forums where I used to hang out every day. I tried it and it worked like a charm for me:
Function StripHTMLTags(InputStr As String) As String
  // Match any single tag: "<", then anything except angle brackets, then ">"
  Dim R As New RegEx
  R.SearchPattern = "<[^<>]+>"
  R.ReplacementPattern = ""
  Dim S As String = InputStr
  Dim S2 As String = R.Replace(InputStr)
  // Keep replacing until the result stops changing
  While (StrComp(S, S2, 0) <> 0)
    S = S2
    S2 = R.Replace()
  Wend
  Return S2
End Function
But thanks though to Max for keeping up the MBS plugins. I don't remember exactly, but I think I even purchased your package in 2008 or 2009. Not sure, but thanks for your enthusiasm and extra care for current and potential paid customers - you are the BEST!!
Thanks for your kind words, but they should maybe be directed to the author of the plugin, which happens to be @Christian Schmitz, not me.
This is how I do it:
[code]Private Function StripHTML(html As String) As String
  Dim re As New RegEx
  // Remove whole style and script blocks first, then any remaining tag or comment
  re.SearchPattern = "(?:<style.+?>.+?</style>|<script.+?>.+?</script>|<(?:!|/?[a-zA-Z]+).*?/?>)"
  re.ReplacementPattern = ""
  re.Options.ReplaceAllMatches = True
  Dim plain As String = re.Replace(html)
  Return plain
End Function
[/code]
Thank you so much, Oliver - it works as well
Without using regex, you could transform your html file into a text file using textutil in a shell. Something like this:
dim sh as new shell
sh.Execute "textutil -convert txt " + myFile.shellPath
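If you'd rather get the text back into a string than have textutil write a .txt file next to the original, a sketch along these lines might do. It assumes macOS (textutil ships only there) and relies on textutil's -stdout and -encoding options. The function name HTMLFileToText is just an illustration:

```xojo
Function HTMLFileToText(myFile As FolderItem) As String
  // Hypothetical helper, macOS only: converts an HTML file to plain text via textutil.
  Dim sh As New Shell
  // -stdout sends the converted text to standard output instead of writing a file
  sh.Execute("textutil -convert txt -encoding UTF-8 -stdout " + myFile.ShellPath)
  If sh.ErrorCode <> 0 Then
    // Conversion failed; sh.Result holds whatever textutil reported
    Return ""
  End If
  // Shell output has no encoding attached, so declare it as UTF-8
  Return DefineEncoding(sh.Result, Encodings.UTF8)
End Function
```

Returning an empty string on failure is only one option. Depending on the app, raising an exception or logging sh.Result may be more useful.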