JSONItem.Load error with certain characters

Today I encountered this JSONItem.Load error using a client’s sample data:

“lexical error: invalid bytes in UTF8 string”

After investigating, the culprit turned out to be a single “CA” hex byte in the UTF8 string. I’m not sure what character it is supposed to represent. Is anyone familiar with this?

The thing is that calling JSONItem.Value to encode the string with this bad character works without complaint, but attempting to load the resultant JSONItem.ToString raises the exception.

I can use ReplaceAllB(DecodeHex(“CA”), “”) to strip it, but should I? Should I convert it to something else? Now I’m concerned about what other rogue characters are out there waiting to wreak havoc on my JSON.

I am finding the new JSONItem to be quite finicky and I’m running into new traps all the time.

What’s the encoding on the string when you pass it into JSONItem.Value?

Well, the data passed by the client has a nil encoding, and that was never a problem in the past. But the new JSONItem now complains about “not a specific encoding”, so I have to set an encoding. In this case I defined the encoding as UTF8, because that’s what JSONItem wants and I don’t know the original encoding. Is there a better way?

I misspoke: it’s not what JSONItem wants but rather what I want the encoding to be.

Well, it’s also what JSONItem wants, so technically if it did have an encoding, you’d want to convert the encoding to UTF-8 as well.

At the very least, you should be checking to see if the data is valid before setting the encoding:

http://docs.xojo.com/TextEncoding.IsValidData
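
Here is a minimal sketch of that check, in case it helps. A lone CA byte is not valid UTF-8 because CA is the lead byte of a two-byte sequence and must be followed by a continuation byte. In MacRoman, CA is a non-breaking space, and in Windows-1252 it is Ê, so the data may well have come from one of those encodings, but that is only a guess. NormalizeForJSON and its fallback parameter are hypothetical names, not part of the framework, and the choice of fallback encoding is an assumption you would need to confirm with the client.

```xojo
// Hypothetical helper: returns a UTF-8 string that is safe to hand to JSONItem.
// "fallback" is the caller's best guess at the legacy encoding (an assumption),
// for example Encodings.MacRoman or Encodings.WindowsANSI.
Function NormalizeForJSON(raw As String, fallback As TextEncoding) As String
  Dim s As String = raw

  If s.Encoding Is Nil Then
    // No encoding declared: only label the bytes as UTF-8
    // if they really are valid UTF-8.
    If Encodings.UTF8.IsValidData(s) Then
      Return DefineEncoding(s, Encodings.UTF8)
    End If

    // Otherwise label the bytes with the guessed legacy encoding.
    s = DefineEncoding(s, fallback)
  End If

  // Transcode from the declared (or guessed) encoding to UTF-8.
  Return ConvertEncoding(s, Encodings.UTF8)
End Function

// Possible usage (clientData is a placeholder for the client's string):
// Dim item As New JSONItem
// item.Value("name") = NormalizeForJSON(clientData, Encodings.MacRoman)
```

Compared with stripping the byte using ReplaceAllB, this keeps the character and converts it, so nothing is silently lost if the guess about the source encoding is right.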

Forum for Xojo Programming Language and IDE. Copyright © 2021 Xojo, Inc.