Right, thanks for that. I would have expected String to be in good shape - perhaps my recollection of the early Text vs. String discussions is flawed.
I’m bugging my user about the logfile. He’s usually quite good about that and I’ve implemented a number of changes he’s suggested, so I think he owes me one.
This has been a frustrating thread to some extent. In the end I couldn’t be bothered to explain why the code I characterised as guard code is just that. Of more concern at the minute are my Apple developer certs, which are set to expire in a day or two with no replacement. I’ll see what Apple has to say.
If you think it’s likely a Unicode issue, you might want to check out unusual emoji - for example, many languages disagree on how many “characters” are in a single emoji.
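To make the emoji point concrete, here is a small Python sketch (Python is used only for illustration, it is not the language of the app discussed in this thread). The helper name count_units is made up for this example. It reports the length of the same string under three common definitions of “character”.

```python
def count_units(s: str) -> dict:
    """Length of `s` under three common definitions of 'character'."""
    return {
        "code_points": len(s),                            # what Python's len() counts
        "utf8_bytes": len(s.encode("utf-8")),             # bytes on disk / on the wire
        "utf16_units": len(s.encode("utf-16-le")) // 2,   # what UTF-16 based runtimes count
    }

# One visible glyph on most platforms: man + ZWJ + woman + ZWJ + girl
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
# One visible flag: two regional indicator symbols (D + E)
flag = "\U0001F1E9\U0001F1EA"

if __name__ == "__main__":
    print(count_units(family))
    print(count_units(flag))
```

Each of these strings is drawn as a single glyph, yet none of the three counts is 1, so any code that slices or indexes by one of these units can land in the middle of an emoji.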
Yeah, I’ve tried with a couple of 4-byte emoji. I even beefed up my Unit Test for this method so I can insert arbitrary bytes in the text such as the Unicode replacement char, or inserting a BOM at the beginning or even in the middle. Still no crash so far.
Your reference looks useful. I might try overlong chars (isn’t C0 80 an overlong Null? have to check) or composed/decomposed ones.
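As a rough illustration of the kind of test inputs discussed here, the following Python sketch builds a handful of awkward byte sequences: a BOM at the start and in the middle, the replacement character, an overlong NUL, a truncated 4-byte emoji, an encoded surrogate, and composed vs. decomposed forms of the same word. For reference, the overlong encoding of NUL is C0 80, while C2 00 is a different malformation (a lead byte followed by a non-continuation byte); both are included. The helper names splice, is_valid_utf8 and hostile_samples are invented for this example, and the method under test would have to be substituted in by whoever uses it.

```python
import unicodedata

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF

def splice(text: str, raw: bytes, at: int) -> bytes:
    """Encode `text` as UTF-8 and insert `raw` at byte offset `at`."""
    data = text.encode("utf-8")
    return data[:at] + raw + data[at:]

def is_valid_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def hostile_samples(text: str = "hello world") -> dict:
    """Build named byte strings to feed to the method under test.

    `text` is assumed to be ASCII so that the midpoint is a safe byte offset.
    """
    mid = len(text.encode("utf-8")) // 2
    return {
        "bom_start": splice(text, BOM, 0),
        "bom_middle": splice(text, BOM, mid),
        "replacement_char": splice(text, "\ufffd".encode("utf-8"), mid),
        "overlong_nul": splice(text, b"\xc0\x80", mid),     # overlong NUL
        "lead_then_nul": splice(text, b"\xc2\x00", mid),    # lead byte + non-continuation
        "truncated_emoji": splice(text, "\U0001F600".encode("utf-8")[:3], mid),
        "encoded_surrogate": splice(text, b"\xed\xa0\x80", mid),  # U+D800 forced into UTF-8
        "composed": unicodedata.normalize("NFC", "caf\u00e9").encode("utf-8"),
        "decomposed": unicodedata.normalize("NFD", "caf\u00e9").encode("utf-8"),
    }

if __name__ == "__main__":
    for name, data in hostile_samples().items():
        print(f"{name:18} valid={is_valid_utf8(data)!s:5} {data.hex(' ')}")
```

The BOM and replacement-character samples are well-formed UTF-8, so they only exercise the logic that runs after decoding; the overlong, truncated and surrogate samples are the ones that exercise the decoding step itself.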
Grab some samples by browsing a few websites and doing a “Save As” HTML into your samples folder to test with. Sites such as Amazon, Microsoft, Apple, etc.
Once you (with luck) trigger the bug, you’ll have the means to understand it and fix it.
But if the user content triggering the exception is some unusual, badly formed content, you may not be able to reproduce the problem using “good content”.
I’d throw an exception handler around the entire thing that dumps the input as hex bytes for future troubleshooting. This likely won’t be the first time it runs into data it can’t tolerate.
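A minimal sketch of that idea in Python, for illustration only: wrap the processing call, and if anything is raised, log the raw input as hex before re-raising. The names with_hex_dump and the "parser" logger are placeholders invented for this example; the real app would use its own exception and logging mechanisms.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("parser")  # hypothetical logger name

def with_hex_dump(process: Callable[[bytes], T], data: bytes) -> T:
    """Run `process(data)`; on any exception, log the input as hex and re-raise."""
    try:
        return process(data)
    except Exception:
        # log.exception records the traceback alongside the message
        log.exception("input rejected (%d bytes): %s", len(data), data.hex(" "))
        raise

if __name__ == "__main__":
    logging.basicConfig(filename="parser.log", level=logging.ERROR)
    try:
        with_hex_dump(lambda b: b.decode("utf-8"), b"ab\xc0\x80")
    except UnicodeDecodeError:
        pass  # the offending bytes are now in parser.log
```

Re-raising keeps the existing error handling intact, and the hex dump means the exact bytes survive even if the user later deletes the offending data. For large inputs it may be worth dumping only a window around the failure offset.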
Yes, the user already had a version of the app with exactly that, but he decided to remove the bad data instead, drat it. I could just decide to leave that permanently there for the time being.