JSONItem encoding issue in 2021 releases

Jason_Tait · May 4, 2021, 7:01am

Part of my app extracts data from legacy systems and saves that data using web services.

Since converting my app to 2021 r1.1 (I skipped 2021 r1) I’m getting a large number of JSONExceptions of the same type, eg:

root{“Contact”}[0]{“FirstName”}: String value does not have a specified encoding.

This occurs for any field where I take the .Left or .Trim of the value from the legacy data. So it seems that .Left or .Trim, and probably others, are not returning encoded data.

If I expressly wrap each line with:

DefineEncoding(stringValue, Encodings.UTF8)

then I can solve the problem.

Before I go ahead and do this almost everywhere in my app, does anybody know whether Xojo plans to return encoded text in the string.Left, string.Trim, etc, methods or is it unrealistic to expect that these methods should do this? In other words – is it really down to me to pass encoded text to these methods?

Christian_Schmitz · May 4, 2021, 7:04am

Don’t just call DefineEncoding on a string. You may corrupt it if you apply the wrong encoding.

Check whether string.Encoding is Nil and find out why that happens.
It usually happens when you get text from outside, e.g. from a BinaryStream.

And Left, Trim, etc. do handle encoding correctly.

Jason_Tait · May 4, 2021, 7:07am

Thanks Christian. OK I didn’t know that about DefineEncoding. Yes I’m getting text from a BinaryStream. Most text is fine but some text – text where I’ve done a .Trim or a .Left – has the encoding error.

Christian_Schmitz · May 4, 2021, 7:08am

Just debug your way back to the source.
Somewhere you have a string with no defined encoding.
E.g. with a BinaryStream, you can check the content and pass the encoding parameter to the Read() call to define it right there.
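
A minimal sketch of that suggestion, assuming the data really is UTF-8. The FolderItem `f` and the 64-byte length are placeholders, not values from this thread:

```xojo
// Read a fixed number of bytes and define their encoding in the same call.
// "f" is a hypothetical FolderItem pointing at the legacy data file.
Var stream As BinaryStream = BinaryStream.Open(f)
Var raw As String = stream.Read(64, Encodings.UTF8)
stream.Close

// Quick check while debugging back to the source of an untagged string.
If raw.Encoding Is Nil Then
  System.DebugLog("String has no encoding")
End If
```

Read only labels the bytes with the encoding you pass; it does not convert them, so the encoding has to match what the file actually contains.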

Jason_Tait · May 4, 2021, 7:10am

It’s quite simple code. I do a binaryStream.Read and pass the size and Encodings.UTF8. I’m reading that into a struct (older code) but, as I said, all values are fine to be converted to JSON except where I do a .Trim or a .Left because I need to keep some field sizes to a maximum length.

Jason_Tait · May 4, 2021, 7:21am

Using EncodingNameMBS I see that one string giving the problem has “No Encoding” even before the .Left or .Trim. So I can rule out those – I was wrong about that. It seems to be occurring for the first string that I assign each time.

Christian_Schmitz · May 4, 2021, 7:23am

If you put it in a structure, the encoding is lost when reading it.
Don’t use structures, but use classes to store data.
Structures are only for declares.

Jason_Tait · May 4, 2021, 7:23am

OK got it. Time for a big refactor!

Jason_Tait · May 4, 2021, 7:25am

I’m guessing there’s no easy way to convert a bunch of structures (used to read binary data) to classes?

Jason_Tait · May 4, 2021, 7:35am

The reason I used structures way back when for reading this particular binary data is that I was able to match up the data sizes from the old (VB6 probably) data, eg my structures use Short instead of Boolean and Int32 for some integers (double for others) to precisely match the length of each record.

Jason_King · May 4, 2021, 11:48am

For binary data and file formats structures are still the right way to do it. You’ll need to define the encoding when getting data from the structure. See the string section on the Structure documentation: https://documentation.xojo.com/api/language/structure.html
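
A rough sketch of what that could look like. The structure `ContactRecord`, its fields and the `stream` variable are made up for illustration, and UTF-8 is used only because it is the encoding mentioned in this thread; legacy data may well be in something else, such as Encodings.WindowsANSI:

```xojo
// Hypothetical structure, defined in the IDE:
//   Structure ContactRecord
//     FirstName As String * 30
//     Active As Int16
//   End Structure

// "stream" is assumed to be an open BinaryStream on the legacy file.
Var rec As ContactRecord
rec.StringValue(True) = stream.Read(rec.Size) // True = little-endian

// Strings coming out of a structure carry no encoding,
// so define it at the point where the field is read.
Var firstName As String = rec.FirstName.DefineEncoding(Encodings.UTF8).Trim
```

This keeps the structure for matching the record layout byte for byte, and limits the DefineEncoding calls to the places where a string field leaves the structure.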

Tim_Parnell · May 4, 2021, 6:25pm

Xojo screwed something up with setting JSONItem values and nobody wants to admit it

Christian_Schmitz · May 4, 2021, 6:30pm

The JSON classes now complain if you pass in text without encoding.

In the MBS Xojo Plugins I decided long ago that a string without an encoding is checked for whether it is valid UTF-8, and otherwise it is treated as an 8-bit encoded string. This way we avoid a lot of trouble.
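
The same idea can be sketched in plain Xojo. This is not the plugin's actual code; the method name `GuessEncoding` and the choice of ISO Latin 1 as the 8-bit fallback are assumptions for illustration:

```xojo
// Put this in a module so that Extends works.
Public Function GuessEncoding(Extends s As String) As String
  // Leave strings that already have an encoding alone.
  If s.Encoding <> Nil Then Return s
  
  // No encoding: treat the bytes as UTF-8 if they form valid UTF-8.
  If Encodings.UTF8.IsValidData(s) Then
    Return s.DefineEncoding(Encodings.UTF8)
  End If
  
  // Otherwise fall back to an 8-bit encoding (assumed here: ISO Latin 1).
  Return s.DefineEncoding(Encodings.ISOLatin1)
End Function
```

A value could then be assigned as `json.Value("FirstName") = firstName.GuessEncoding`, where `json` and `firstName` are placeholder names.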

Tim_Parnell · May 4, 2021, 6:43pm

We’re forced to use the old toolbar style so that users can “open projects in the new version and not make any changes” yet anyone using JSONItem has to make thousands of unnecessary boilerplate changes everywhere in their code?

This is not by design, this is by mistake, and I’ll repeat: nobody wants to admit it.

Jason_Tait · May 4, 2021, 9:38pm

Thanks all. Some of the JSONItem changes appear to have been poorly thought through but, worse in my view, looking through the forum posts on this, it appears as though nobody is listening to feedback. So even though I’m doing a binaryStream.Read and passing in an encoding, I’ll now wrap every string in the structure with another DefineEncoding call when I use it. Which seems ridiculous.

Christian_Schmitz · May 5, 2021, 6:48am

Please only apply DefineEncoding if string.Encoding is Nil. Otherwise you may corrupt the string, e.g. by marking a UTF-16 string as UTF-8.

Jason_Tait · May 5, 2021, 7:12am

According to the docs that Jason King linked to above, a string in a structure never has an encoding. With Xojo 2021 r1/r1.1 I’m also getting JSONExceptions for encoding even when just trying to POST data received directly into JSON from web services, ie even when a structure isn’t involved. So I’ve rolled back (again) to Xojo 2020 r2.1 and I’ll just have to sit on this version for now. I’ve got three projects – Desktop, iOS and Web – and each one now is running on a separate Xojo version!

DerkJ · May 5, 2021, 8:22am

You can use something like this to convert easily to UTF-8:

Public Function UTF8(Extends InputString As String) As String
  If InputString.Encoding = Nil Then
    // Encoding is unknown to the application but known by the developer
    Return InputString.DefineEncoding(Encodings.UTF8)
  End If
  
  If InputString.Encoding <> Encodings.UTF8 Then
    // Encoding is not UTF8, so convert to UTF-8
    Return InputString.ConvertEncoding(Encodings.UTF8)
  End If
  
  // Encoding is already UTF-8
  Return InputString
End Function

Kem_Tekinay · May 5, 2021, 11:05am

Another idea is to subclass JSONItem and override its methods to define the encoding on incoming strings with nil encoding.
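
One way this might look, as a sketch only. The subclass name `SafeJSONItem` is made up, the overridden signature is assumed to match JSONItem's Value setter, and UTF-8 is assumed to be the right encoding for untagged strings:

```xojo
// Method on a hypothetical class SafeJSONItem whose Super is JSONItem.
Sub Value(name As String, Assigns v As Variant)
  If v.Type = Variant.TypeString Then
    Var s As String = v.StringValue
    // Only define an encoding when none is set, so tagged strings stay intact.
    If s.Encoding Is Nil Then
      s = s.DefineEncoding(Encodings.UTF8)
    End If
    Super.Value(name) = s
  Else
    Super.Value(name) = v
  End If
End Sub
```

Other entry points that accept strings, such as adding values to a JSON array, would need the same treatment.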

Chris_Halford · May 10, 2021, 9:11pm

This all kind of sucks.
Kem, I implemented something like your suggestion and it works fine. Thanks.