Hard crash regex

Does anyone have an idea what might cause a hard crash in the following code?

Protected Function ConvertShort(theText as String) As String

'remove tags from html

'remove style blocks
dim theRegex as new RegEx
theRegex.Options.ReplaceAllMatches = true
theRegex.Options.Greedy = False
theRegex.Options.DotMatchAll = True
theRegex.SearchPattern = "<style.*"
theRegex.ReplacementPattern = ""
theText = theRegex.Replace(theText)

'remove comments
theRegex.SearchPattern = "<!.*>"
theText = theRegex.Replace(theText)

theText = theText.ReplaceAll("<?xml version=""1.0""?>", "")

theText = RemoveHTMLTagsMBS(theText)
theText = DecodingFromHTMLMBS(theText)

Return theText

End Function

System Integrity Protection: enabled

Crashed Thread: 28

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY

Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [59859]

VM Region Info: 0 is not in any region. Bytes before following region: 4446769152
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 1090c5000-10b848000 [ 39.5M] r-x/r-x SM=COW …il Archiver X

Thread 28 Crashed:
0 RegEx.dylib 0x10d02aa5a 0x10cfd8000 + 338522
1 RegEx.dylib 0x10d029741 0x10cfd8000 + 333633
2 RegEx.dylib 0x10d02a230 0x10cfd8000 + 336432
3 RegEx.dylib 0x10d02ad34 0x10cfd8000 + 339252
4 Mail Archiver X 0x109241563 RegEx.Replace%s%os + 83
5 Mail Archiver X 0x10b01af83 MimeConvertHtmlToText.ConvertShort%s%s + 1411
6 Mail Archiver X 0x109e5723b MimeParser.parseAlternative%%oo + 6539
7 Mail Archiver X 0x109e5449c MimeParser.parseMime%%oo + 604
8 Mail Archiver X 0x109e52626 MimeParser.execute%%o + 2614
9 Mail Archiver X 0x109e51667 MimeParser.Constructor%%osbA1o + 807
10 Mail Archiver X 0x109ef17db MailParser.ParseMime%%o + 1307
11 Mail Archiver X 0x109ecb7ff MailParser.parse%i8%o + 6079
12 Mail Archiver X 0x10a8c5d31 ArchiveThread.Archive%%o + 9041
13 Mail Archiver X 0x10a8be09e ArchiveThread.Event_Run%%o + 110
14 XojoFramework 0x10d65b627 0x10d3e8000 + 2569767
15 XojoFramework 0x10d51d275 0x10d3e8000 + 1266293
16 libsystem_pthread.dylib 0x7ff8019d74f4 _pthread_start + 125
17 libsystem_pthread.dylib 0x7ff8019d300f thread_start + 15

Can you tell us which line it crashes on?

No, not yet. I need to make the user a test version.

I did a little bit of testing with the Xojo code part of it and saw no problem.

theText = RemoveHTMLTagsMBS(theText)
theText = DecodingFromHTMLMBS(theText)

Are these Monkeybread Software (MBS) plugin calls?

No, it’s one of the regexes. Either the style or the comments one. I’ve made the user a test version with more logging so that I should be able to see which regex causes the crash. I should also be able to identify which email causes the crash.
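A minimal sketch of that kind of logging, assuming System.DebugLog is an acceptable place to write to and that the byte count is enough to tell the emails apart. The method name ConvertShortLogged and the log messages are made up for illustration; the patterns are the ones from the original code.

```xojo
' Hypothetical logging variant of the regex part of ConvertShort.
Function ConvertShortLogged(theText As String) As String
  Dim theRegex As New RegEx
  theRegex.Options.ReplaceAllMatches = True
  theRegex.Options.Greedy = False
  theRegex.Options.DotMatchAll = True
  theRegex.ReplacementPattern = ""

  ' Log before each Replace so the last line in the log names the regex that crashed.
  System.DebugLog("ConvertShort: style regex, bytes=" + theText.Bytes.ToString)
  theRegex.SearchPattern = "<style.*"
  theText = theRegex.Replace(theText)

  System.DebugLog("ConvertShort: comment regex, bytes=" + theText.Bytes.ToString)
  theRegex.SearchPattern = "<!.*>"
  theText = theRegex.Replace(theText)

  System.DebugLog("ConvertShort: both regexes finished")
  Return theText
End Function
```

If the log ends with the "comment regex" line and never reaches "both regexes finished", the second Replace is the one that crashed.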

I wonder if it’s a text encoding issue: theText may be defined as UTF-8 but contain bytes that are not valid UTF-8.
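One way to test that idea is to validate the encoding before the string reaches RegEx. The sketch below assumes API 2 string methods are available; the helper name EnsureValidUTF8 and the ISO Latin 1 fallback are assumptions for illustration, not something from the original code.

```xojo
' Hypothetical helper: returns the text as valid UTF-8 before it is passed to RegEx.
Function EnsureValidUTF8(theText As String) As String
  If theText.Encoding Is Nil Then
    ' No encoding defined: accept the bytes as UTF-8 only if they really are valid UTF-8.
    If Encodings.UTF8.IsValidData(theText) Then
      Return theText.DefineEncoding(Encodings.UTF8)
    End If
  ElseIf theText.Encoding.IsValidData(theText) Then
    ' The declared encoding matches the bytes: convert to UTF-8.
    Return theText.ConvertEncoding(Encodings.UTF8)
  End If

  ' Fallback (an assumption): reinterpret the bytes as ISO Latin 1, where every
  ' byte is a valid character, then convert to UTF-8.
  Return theText.DefineEncoding(Encodings.ISOLatin1).ConvertEncoding(Encodings.UTF8)
End Function
```

Calling theText = EnsureValidUTF8(theText) at the top of ConvertShort would rule out invalid UTF-8 as the cause, at the price of possibly mangling characters in mislabelled emails.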

Good idea. However, the crash is in the second regex, not the first one.

Hopefully I will get the affected email today.

Does anything change if you put this line at the top?

If theText.length < 1 Then Return ""

Also, escape some symbols so that they are matched literally:

theRegex.SearchPattern = "\<style.*"

theRegex.SearchPattern = "\<\!.*\>"

Be aware that your search patterns can match more than intended and remove valid content.
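For example, "<!.*>" also matches a "<!DOCTYPE ...>" declaration, and because the match is non-greedy it stops at the first ">", so a comment that contains ">" is only partly removed. With Greedy = False, the ".*" at the end of "<style.*" has nothing after it and should match as little as possible, so it likely removes only the text "<style". A hedged sketch of narrower patterns, assuming the intent is to drop whole style elements and real HTML comments; the method name StripStyleAndComments is hypothetical.

```xojo
' Hypothetical helper with narrower patterns than the original ones.
Function StripStyleAndComments(theText As String) As String
  Dim theRegex As New RegEx
  theRegex.Options.ReplaceAllMatches = True
  theRegex.Options.Greedy = False      ' .* stops at the first closing marker
  theRegex.Options.DotMatchAll = True  ' . also matches line breaks
  theRegex.Options.CaseSensitive = False
  theRegex.ReplacementPattern = ""

  ' Remove each style element from its opening tag to its closing tag.
  theRegex.SearchPattern = "<style\b[^>]*>.*</style\s*>"
  theText = theRegex.Replace(theText)

  ' Remove only real comments, from <!-- to the first -->.
  theRegex.SearchPattern = "<!--.*-->"
  theText = theRegex.Replace(theText)

  Return theText
End Function
```

An unclosed style tag or comment no longer matches at all, so in that case the text is left as it is instead of being cut off.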