JSONItem encoding issue in 2021 releases

I wouldn’t use a function; a method with passing the string by reference reduces the cost to the method call in the best case (the encoding is UTF8).

Public Sub ensure_utf8_encoding (byref InputString As String)
  If InputString.Encoding = Nil Then
    // Encoding is unknown to the application but known by the developer
    InputString = InputString.DefineEncoding(Encodings.UTF8)
  End If
  
  If InputString.Encoding <> Encodings.UTF8 Then
    // Encoding is not UTF8, so convert to UTF-8
    InputString = InputString.ConvertEncoding(Encodings.UTF8)
  End If
  
  // Encoding is already UTF-8
  // do nothing
End Sub

The code i posted was originally from the forum. I believe it was optimized since byref was slower.

I am missing a call to Encodings.UTF8.IsValidData function to check if given bytes are UTF-8.

This solution is probably the cleanest, but still forces me to go add extra .UTF8() calls all over the place. Any chance Xojo will be adding something like this to the JSON classes so they can assume UTF8 if there is no encoding defined?

1 Like

The problem with what you propose is that it would then fail silently somewhere else if the text wasn’t UTF8 compatible.

If there is no encoding defined, the VAST majority of the time it should have been UTF8. Leaving the JSON classes broken in this manner because the fix may cause a similar issue for a very small minority of cases (when the string is not UTF8 compatible) is … short sighted.

If no encoding is defined on a string, the MBS Plugin will treat it as MacRoman on macOS or Windows ANSI on Linux/Windows automatically as fallback. No exceptions needed.

1 Like

A reasonable compromise might be for JSONItem (and ParseJSON) to test for valid UTF-8 where the encoding is nil, and only raise the exception when it’s not.

5 Likes

I agree with Kem!

2 Likes

Our legacy code then is fragile, makes it hard to feel confident updating to 2021 since JSONitem is performing differently now than expected and can cause crashes if encoding is not defined and not assumed. My app crashes after compiling for any 2021 build as of right now. I can identify where, that’s not the issue. But now we have to define encoding every time we want to assign a string to a JSONitem from something like a RowSet or it crashes.

That said, on the plus side the new JSONitem is noticeably faster! I hope there is a way to solve this that doesn’t involve refactoring hundreds of JSONitem lines of code.

So now will someone explain if I need to change anything when switching my project from 2021r1.1 to the latest? If I am trapping JSON errors will they fail silently unless I change my code?

The function identified in this thread does resolve this encoding problem.

In my case with a MySQL table set with Table collation = UTF8 still fails when needing to use JSONItem. When I do the JSOItem ToString is always blank due to encoding. Using the return from a RowSet works in other needs such getting the integer or string value without a problem requiring a test or setting of encoding.

I agree with many on this thread the way JSON now works requires additional coding and that additional work while not extensive identifies a significant departure from previous JSONItem implementation that affects a majority for a minority need.

From a XOJO perspective how to resolve is a good question

i don’t see an issue in there, a xojo string by default is utf-8 but if it comes from the outside of xojo you always need to make sure it’s the expected encoding. If you never did, you created something that “may” get errors or encoding issues as you see now (since it’s now more strict).

In this case i think xojo is right, since may problems come from wrong usage of strings (unencoded or undefined encoding).

@Carl_Fitzsimmons
Check the encodings (in the debugger or so) of the string values of the keys, you may have a single one that has a wrong encoding ?

I always pass UTF8 strings on the web. And always get them with the “unknown” encoding on the other side, so a just set the entire JSON string as utf8 when receiving it and decode it after with no problems at all. For me, the MBS fallback would be an error, working sometimes, and sometimes failing if I forgot setting the correct encoding when receiving the data, so I just prefer an exception.

That’s why we use subclasses for all Controls, but also for quite some Framework Classes such as JSONItem…
Then you can check/define the Encoding in a single place, use conditional compilation for behavior changes between Xojo versions, …

2 Likes

Hi @Jürg_Otter , your comment to subclass this function is terrific. A lot of people could benefit from this reminder, which I often forget myself. The team at Xojo should ask you to throw together a blog post and example of how you use a Framework Subclass like this. It’s not something a lot of people know how to do, and it’s such a life saver when it comes to applying behavior across your whole project. I’d love to see how you tackle a JSONItem like this!

1 Like

Subclassing unnecessarily things is a bad practice and bad design, and must be avoided. Subclass things as you go, when needing it. As a large inheritance chain must be avoided by design, unnecessarily designing a large inheritance chain is the worst of the worlds.

I assume somewhere in there is some balanced advice, such as when it’s a great idea, and when it’s otherwise creating a huge tax on the system. Inheritance is a powerful tool in the right hands. I’m always eager to learn something new, or challenge what I thought I already knew. As always I appreciate the point counter-point dialog in these exchanges.

Good advice:
Make a good, previously studied design of something and descendants. Implement them and use them. In the future, needing something special from some of them, subclass them and use them.

Bad advice:
Subclass everything you see around and use them, if by any chance in the future you need to implement a change, modify the vacant and preallocated useless position in the inheritance chain with the new functionalities affecting even parts that currently don’t need them.

Another approach to have a more centralized “fix” is to write an own Extension. E.g.:

Public Sub ValueWithCheckedEncoding(Extends j As JSONItem, key As String, assigns Value As Variant)
  If (Value <> Nil) And (Value.Type = Variant.TypeString) Then
    Value = EncodingCheckedString(Value)
  End If
  j.Value(key) = Value
End Sub

(code above is just as an idea… the code for the Method EncodingCheckedString is not pasted there)

This requires to replace all calls from e.g. oJSON.Value("newKey1") = sValue to oJSON.ValueWithCheckedEncoding("newKey1") = sValue
At least AutoComplete will show that there is more than 1 way of “.Value” available.

The SubClass approach I previously suggested fits the “good advice” of Rick. You need some special handling (as of 2021r1). By creating a Subclass (e.g. JSONItemSC), you can add the Encoding-Check in the Subclass’ method .Value.
This requires to replace new JSONItem with new JSONItemSC, and maybe a couple more.

Another benefit of a SubClass for a “Core Functionality Class” is that you’d have the choice to switch it to use a Plugin Class internally, e.g. JSONMBS. An example is provided in the MBS Plugins: JSONItem clone.xojo_binary_project (see class JSONItem_MBS, which you could rename to JSONItemSC). Or of course write your very own implementation, such as the well-known JSONItem_MTC.

That allows your code using JSONItemSC not needing any changes when adding a fix or behavior change (which can be implemented using conditional compilation to kick in only for certain Xojo Versions, what makes it even more interesting for shared code between various projects that are being built with different Xojo Versions).

I totally agree with that… I wouldn’t want a “blank subclass with no code” just for the sake of having one.

1 Like