JSONItem encoding issue in 2021 releases

If you put it in a structure, the encoding is lost when reading it.
Don’t use structures, but use classes to store data.
Structures are only for declares.

OK got it. Time for a big refactor!

I’m guessing there’s no easy way to convert a bunch of structures (used to read binary data) to classes?

The reason I used structures way back when for reading this particular binary data is that I could match the field sizes of the old (probably VB6) data. For example, my structures use Short instead of Boolean, and Int32 for some integers (Double for others), so that each record has exactly the right length.

For binary data and file formats structures are still the right way to do it. You’ll need to define the encoding when getting data from the structure. See the string section on the Structure documentation: https://documentation.xojo.com/api/language/structure.html
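To illustrate the point about defining the encoding when you pull a string out of a structure, here is a minimal sketch. The structure name CustomerRecord, its field Name and the variable rec are made up for the example, and WindowsANSI is only an assumption about what old VB6-era data might contain; substitute whatever encoding your file format really uses.

```xojo
// Assumption: rec is an already populated instance of a hypothetical
// structure CustomerRecord with a field declared as: Name As String * 32
Var customerName As String = rec.Name

// Strings coming out of a structure carry no encoding, so label the bytes.
// WindowsANSI is a guess for legacy data, not something the format guarantees.
If customerName.Encoding = Nil Then
  customerName = customerName.DefineEncoding(Encodings.WindowsANSI)
End If

// Convert to UTF-8 before handing the value to JSONItem
Var json As New JSONItem
json.Value("name") = customerName.ConvertEncoding(Encodings.UTF8)
```

DefineEncoding only labels the existing bytes, while ConvertEncoding actually rewrites them, which is why both calls appear here.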

Xojo screwed something up with setting JSONItem values and nobody wants to admit it

The JSON classes now complain if you pass in text without encoding.

In the MBS Xojo Plugins I decided long ago that a string without an encoding is checked for valid UTF-8 and otherwise treated as an 8-bit encoded string. This way we avoid a lot of trouble.

We’re forced to use the old toolbar style so that users can “open projects in the new version and not make any changes” yet anyone using JSONItem has to make thousands of unnecessary boilerplate changes everywhere in their code?

This is not by design, this is by mistake, and I’ll repeat: nobody wants to admit it.

Thanks all. Some of the JSONItem changes appear to have been poorly thought through but, worse in my view, looking through the forum posts on this, it appears that nobody is listening to feedback. So even though I’m doing a binaryStream.Read and passing in an encoding, I’ll now have to wrap every string from the structure in another DefineEncoding call when I use it. Which seems ridiculous.

Please only apply DefineEncoding if the string’s Encoding is Nil. Otherwise you may destroy the existing encoding, e.g. by marking a UTF-16 string as UTF-8.

According to the docs that Jason King linked to above, a string in a structure never has an encoding. With Xojo 2021r1 and 2021r1.1 I found that I’m getting JSONExceptions about encoding even when just trying to POST data received directly into JSON from web services, i.e. even when no structure is involved. So I’ve rolled back (again) to Xojo 2020r2.1 and I’ll just have to sit on this version for now. I’ve got three projects (Desktop, iOS and Web) and each one is now running on a separate Xojo version!

You can use something like this to convert easily to UTF-8:

Public Function UTF8(Extends InputString As String) As String
  If InputString.Encoding = Nil Then
    // Encoding is unknown to the application but known by the developer
    Return InputString.DefineEncoding(Encodings.UTF8)
  End If
  
  If InputString.Encoding <> Encodings.UTF8 Then
    // Encoding is not UTF8, so convert to UTF-8
    Return InputString.ConvertEncoding(Encodings.UTF8)
  End If
  
  // Encoding is already UTF-8
  Return InputString
End Function

Another idea is to subclass JSONItem and override its methods to define the encoding on incoming strings with nil encoding.
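A rough sketch of what that subclass idea might look like. The class name UTF8JSONItem is hypothetical, and this assumes the Value setter can be overridden with the signature shown, which I have not verified against every Xojo release. It also only covers the keyed setter; the methods that add array elements would need the same treatment.

```xojo
// Method in a hypothetical class UTF8JSONItem whose Super is JSONItem.
// Assumption: this signature matches JSONItem's keyed Value setter.
Sub Value(key As String, Assigns v As Variant)
  If v.Type = Variant.TypeString Then
    Var s As String = v.StringValue
    // Only label strings that have no encoding at all; never relabel others
    If s.Encoding = Nil Then
      s = s.DefineEncoding(Encodings.UTF8)
    End If
    Super.Value(key) = s
  Else
    // Non-string values pass through unchanged
    Super.Value(key) = v
  End If
End Sub
```

The appeal is that existing call sites stay as they are and only the class used for the JSON objects changes.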

This all kind of sucks.
Kem, I implemented something like your suggestion and it works fine. Thanks.

I wouldn’t use a function; a Sub that takes the string ByRef reduces the cost to just the method call in the best case (when the encoding is already UTF-8).

Public Sub ensure_utf8_encoding (byref InputString As String)
  If InputString.Encoding = Nil Then
    // Encoding is unknown to the application but known by the developer
    InputString = InputString.DefineEncoding(Encodings.UTF8)
  End If
  
  If InputString.Encoding <> Encodings.UTF8 Then
    // Encoding is not UTF8, so convert to UTF-8
    InputString = InputString.ConvertEncoding(Encodings.UTF8)
  End If
  
  // Encoding is already UTF-8
  // do nothing
End Sub

The code I posted was originally from the forum. I believe it was optimized that way because ByRef was slower.

What I’m missing here is a call to the Encodings.UTF8.IsValidData function to check whether the given bytes are actually valid UTF-8.
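For what it’s worth, a variant of the extension method with that check added could look roughly like this. The name ToUTF8Checked is made up, and the ISO Latin-1 fallback for bytes that are not valid UTF-8 is purely an assumption; pick whatever 8-bit encoding your data is likely to be in.

```xojo
// Hypothetical extension method; it has to live in a module
Public Function ToUTF8Checked(Extends InputString As String) As String
  If InputString.Encoding = Nil Then
    If Encodings.UTF8.IsValidData(InputString) Then
      // The bytes form valid UTF-8, so just label them
      Return InputString.DefineEncoding(Encodings.UTF8)
    End If
    // Assumption: anything else is 8-bit text; ISO Latin-1 is only a placeholder
    Return InputString.DefineEncoding(Encodings.ISOLatin1).ConvertEncoding(Encodings.UTF8)
  End If
  
  If InputString.Encoding <> Encodings.UTF8 Then
    // Encoding is known but not UTF-8, so convert
    Return InputString.ConvertEncoding(Encodings.UTF8)
  End If
  
  // Encoding is already UTF-8
  Return InputString
End Function
```

This way a string with no encoding is never labeled as UTF-8 unless its bytes would actually decode as UTF-8.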

This solution is probably the cleanest, but still forces me to go add extra .UTF8() calls all over the place. Any chance Xojo will be adding something like this to the JSON classes so they can assume UTF8 if there is no encoding defined?

The problem with what you propose is that it would then fail silently somewhere else if the text wasn’t UTF8 compatible.

If there is no encoding defined, the VAST majority of the time it should have been UTF8. Leaving the JSON classes broken in this manner because the fix may cause a similar issue in a very small minority of cases (when the string is not UTF8 compatible) is … short-sighted.