JSONItem replacement

Kem_Tekinay · August 24, 2014, 5:26am

Now version 2.5.

I realized that my class was not validating hex in a “\uNNNN” structure, so you could load something like ["\ujohn"] and it would accept it without complaining. No more.

Along the way, I also made loading such encoded characters faster by directly writing the UTF-8-encoded bytes rather than converting to a string first.
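As an illustration only (Python rather than Xojo, and not the code in the class), here is a minimal sketch of the two ideas above: checking that the four characters after "\u" are hex digits, and writing the UTF-8 bytes for the resulting code point straight into a byte buffer. The names parse_u_escape and append_utf8 are made up for this example, and combining surrogate pairs is left out.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_u_escape(text, pos):
    """Return the numeric value of the backslash-u escape starting at text[pos]."""
    digits = text[pos + 2:pos + 6]
    # Exactly four hex digits must follow the "\u" prefix.
    if text[pos:pos + 2] != "\\u" or len(digits) != 4 or not set(digits) <= HEX_DIGITS:
        raise ValueError("invalid \\u escape at position %d" % pos)
    return int(digits, 16)


def append_utf8(out, code_point):
    """Append the UTF-8 bytes for code_point directly to the bytearray out."""
    if code_point < 0x80:
        out.append(code_point)
    elif code_point < 0x800:
        out.append(0xC0 | (code_point >> 6))
        out.append(0x80 | (code_point & 0x3F))
    elif code_point < 0x10000:
        out.append(0xE0 | (code_point >> 12))
        out.append(0x80 | ((code_point >> 6) & 0x3F))
        out.append(0x80 | (code_point & 0x3F))
    else:
        out.append(0xF0 | (code_point >> 18))
        out.append(0x80 | ((code_point >> 12) & 0x3F))
        out.append(0x80 | ((code_point >> 6) & 0x3F))
        out.append(0x80 | (code_point & 0x3F))


buffer = bytearray()
append_utf8(buffer, parse_u_escape("\\u00e9", 0))  # buffer now holds the bytes for U+00E9
```

With this sketch, parse_u_escape("\\ujohn", 0) raises ValueError instead of silently accepting the input.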

@Greg O’Lone, I’ll file a FR, but FYI, the native class does not validate hex either.

Jeremy_Cowgar · August 24, 2014, 12:52pm

It seems this class is not only a TON faster but also much more proper than the native JSONItem. With no options, it generates exactly the same JSON as the native class; with other options, it generates stricter JSON. It is also a drop-in replacement for the native JSONItem (same API).

Any chance, @Greg O’Lone, that Xojo will simply pick this JSON class up for inclusion in Xojo instead of the one we have now? There would be no reason to fix the bugs discovered in the internal JSON. Just run the internal unit tests on this one, verify some critical output/input needed for the web framework and give it a go!

J_Andrew_Lipscomb · August 24, 2014, 2:48pm

While we’re at it, would it be useful to add the “not-quite” UTF-8 to the Encodings module? (For those unfamiliar, it’s called CESU-8; characters in the astral planes are represented by first converting the character to a surrogate pair (as in UTF-16), then encoding each surrogate in the manner of UTF-8.)
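For anyone who wants to see the CESU-8 description in concrete terms, here is a rough Python sketch (not Xojo, and not a proposal for the Encodings module API). The function name cesu8_encode is invented for this example.

```python
def cesu8_encode(text):
    """Encode text as CESU-8: astral characters become two 3-byte sequences."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Characters in the Basic Multilingual Plane are identical to UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split into a UTF-16 surrogate pair first.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            # Then encode each surrogate with the 3-byte UTF-8 pattern.
            for unit in (high, low):
                out.append(0xE0 | (unit >> 12))
                out.append(0x80 | ((unit >> 6) & 0x3F))
                out.append(0x80 | (unit & 0x3F))
    return bytes(out)


encoded = cesu8_encode("\U0001F600")  # six bytes instead of UTF-8's four
```

U+1F600 comes out as ED A0 BD ED B8 80, whereas real UTF-8 would give F0 9F 98 80.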

Greg_O_Lone · August 24, 2014, 5:46pm

Replacing one library with another is never simple. We use JSONItem extensively in Xojo Cloud and we would have an absolute disaster on our hands if there was a bug that Kem’s unit tests (or ours) didn’t catch.

I think the best answer we can give is, once this code settles down, we’ll see.

Kem_Tekinay · August 24, 2014, 6:05pm

Greg, if you have unit tests that you can send me, I’ll apply them against my class too.

Kem_Tekinay · August 24, 2014, 9:54pm

Now 2.6, and I think I’m done for a while.

When it encounters a multi-byte character, a character that needs to be encoded for JavaScript compatibility, or any character when told to encode all, this version will poke the character bytes directly rather than going through a string.

(UTF-8 byte-level manipulation is always fun.)

The only other change or addition I’d expect at this point is if/when I get additional unit tests to apply.

Kem_Tekinay · August 25, 2014, 2:32pm

I’ve included those unit tests, and they all pass. There was no change to the class code so I just replaced the release and did not update the version number.

Kem_Tekinay · August 25, 2014, 10:24pm

Famous last words…

Now at 2.7. This has better Strict checking of numbers (“2.e5” will not be allowed with Strict, for example) and an IsLoading method that will report if the object is currently loading. The latter is for threads because, now, if you try to load into the same object from a second thread, you will get an exception.

Most importantly, I dealt with aborted loads. Before, if you tried to load from a string with a bad value, it would load everything until that bad value. Now, the object will revert to the point where you started if the load is aborted.

Finally, if a string contains a control character or escaped random character ("\w"), it will be accepted UNLESS Strict = True.
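To make the aborted-load and IsLoading behavior concrete, here is a small Python sketch of one way to get it. This is an assumption about the general technique, not how JSONItem_MTC is implemented: the ItemList class, its load_lines method, and the line-per-value input format are all invented for the example.

```python
import json
import threading


class ItemList:
    """Hypothetical container that loads one JSON value per line."""

    def __init__(self):
        self.items = []
        self._lock = threading.Lock()

    def is_loading(self):
        # True while some thread is inside load_lines.
        return self._lock.locked()

    def load_lines(self, text):
        # A second concurrent load into the same object is refused.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("a load is already in progress")
        checkpoint = len(self.items)  # remember where this load started
        try:
            for line in text.splitlines():
                if line.strip():
                    self.items.append(json.loads(line))
        except ValueError:
            # Bad value: drop everything this load added, then re-raise.
            del self.items[checkpoint:]
            raise
        finally:
            self._lock.release()


data = ItemList()
data.load_lines("1\n2")
try:
    data.load_lines("3\n2.e5")  # the second line is not valid JSON
except ValueError:
    pass
# data.items is still [1, 2]; the 3 from the aborted load was rolled back.
```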

Greg_O_Lone · August 26, 2014, 3:08am

I’m confused. The spec defines number as:

int
int frac
int exp
int frac exp

Why wouldn’t 2.e5 be valid?

Tim_Hare · August 26, 2014, 3:13am

It looks malformed to me. Should be 2.0e5 or just 2e5.

Jeremy_Cowgar · August 26, 2014, 3:17am

2.e5 is invalid according to the spec. The parsing diagram says that if a decimal point is included, it must be followed by at least one digit.

Other languages (for example):

// JavaScript
> JSON.parse('{"wage":2.e5}')
SyntaxError: Unexpected token e
    at Object.parse (native)
    at repl:1:7
    at REPLServer.self.eval (repl.js:110:21)
    at Interface.<anonymous> (repl.js:239:12)
    at Interface.EventEmitter.emit (events.js:95:17)
    at Interface._onLine (readline.js:202:10)
    at Interface._line (readline.js:531:8)
    at Interface._ttyWrite (readline.js:760:14)
    at ReadStream.onkeypress (readline.js:99:10)
    at ReadStream.EventEmitter.emit (events.js:98:17)
> JSON.parse('{"wage":2.0e5}')
{ wage: 200000 }

// Python
>>> import json
>>> json.loads('{"wage":2.e5}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 10 (char 9)
>>> json.loads('{"wage":2.0e5}')
{u'wage': 200000.0}

// Ruby
irb(main):005:0> JSON.parse('{"wage":2.e5}')
JSON::ParserError: 757: unexpected token at '{"wage":2.e5}'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/json/common.rb:155:in `parse'
	from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/json/common.rb:155:in `parse'
	from (irb):5
	from /usr/bin/irb:12:in `<main>'
irb(main):006:0> JSON.parse('{"wage":2.0e5}')
=> {"wage"=>200000.0}

Kem_Tekinay · August 26, 2014, 3:17am

We (meaning with the help of @Jeremy Cowgar) tested against other implementations to be sure, and an integer-dot-exp comes back as an error. Should be int-exp-int or int-dot-int-exp-int.
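For reference, the number grammar being discussed can be written as a regular expression. This is a Python sketch of the JSON grammar, not the check used inside the class, and the name is_strict_json_number is made up.

```python
import re

# int, optional frac (a dot followed by one or more digits), optional exp.
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_strict_json_number(text):
    """Return True if text is a number according to the JSON grammar."""
    return JSON_NUMBER.fullmatch(text) is not None


results = {s: is_strict_json_number(s) for s in ("2.e5", "2.0e5", "2e5")}
```

Because the fractional part requires at least one digit after the dot, "2.e5" is rejected while "2.0e5" and "2e5" pass.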

Jeremy_Cowgar · August 26, 2014, 3:19am

Can’t edit. Meant to say, section 8 of spec says “It may have a . (U+002E) prefixed fractional part.” but the parsing diagram makes this more clear (to me at least).

Alwyn_Bester · August 26, 2014, 6:57am

Just an option (with the best of intention to help everyone, or to be corrected if needed) on duplicate keys, because I think this is a very important concept to consider before you start designing your awesome JSON data sets.

I work with a lot with JSON on three different platforms, Xojo, VB6 and JavaScript, to represent and transport very complex on/between completely different platforms. JSON is an exceptionally great specification to represent key:value pairs in xplat environments.

Having duplicate keys is a bad idea IMO, and a parser should not allow duplicate keys. If you need to use duplicate keys, then you really should look at reworking your data schema using arrays.

An example…

{
    "a" : "x",
    "a" : "y"
}

should rather represented stored as…

{
    "a" : ["x", "y"]
}

Even if some languages allow duplicate keys, not all languages do, and you will eventually bump your head in frustration when you start transmitting your data to other platforms if you have duplicate keys in your schema.

Would like to hear if you agree or disagree on this?
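As a concrete illustration of the cross-platform problem, Python's standard json module silently keeps the last value for a duplicated key unless you ask it to do otherwise. The sketch below uses the module's object_pairs_hook parameter to reject duplicates; the helper names reject_duplicates and loads_no_duplicates are invented for this example.

```python
import json


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key: %r" % key)
        result[key] = value
    return result


def loads_no_duplicates(text):
    # object_pairs_hook receives every object's members in document order.
    return json.loads(text, object_pairs_hook=reject_duplicates)


duplicated = '{"a": "x", "a": "y"}'
default_result = json.loads(duplicated)  # the default keeps only the last value
reworked = loads_no_duplicates('{"a": ["x", "y"]}')
```

Calling loads_no_duplicates(duplicated) raises ValueError, while the array form loads without complaint.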

Alwyn_Bester · August 26, 2014, 7:00am

Pardon all the typos… forum doesn’t allow me to edit… so here goes…

Just an opinion…

I work a lot with JSON on…, to represent and transport very complex DATA on/between completely different platforms.

Alwyn_Bester · August 26, 2014, 7:01am

should rather be stored as…

Kem_Tekinay · August 26, 2014, 7:21pm

Now 2.8.

I realized that the native class’s ToString is a computed property, not a method, and there is a method called Serialize(data as JSONItem) As String that handles the actual work. I’m not sure if the latter is supposed to be exposed, but whatever, I changed my class to match. ToString is now a computed property so it can be examined in the debugger, and Serialize can be called with a JSONItem_MTC if you prefer (but I recommend against it and will hide or change this if the native is changed).

No functional changes in this one.

Kem_Tekinay · August 21, 2015, 8:45pm

I just updated to v.2.81 and that fixes a bug in the Remove method, among other things. (Thanks marco-at-citec.)

Kem_Tekinay · December 13, 2015, 10:06pm

I just released v.2.9. This adds support for Text (something the native version does not support as I write this) and updates the tests for 64-bit.

Kem_Tekinay · May 1, 2016, 2:06pm

I just released v.3.0 of JSONItem_MTC, the drop-in replacement for the classic framework JSONItem.

This version does a few things. First, I did away with the pragmas to disable background tasks and checks. Why? For one, it didn’t make much difference in speed anyway. For another, it would tie things up when encoding large JSON.

Second, I made memory usage more efficient when turning the JSON into a string. Rather than using one ever-expanding MemoryBlock, it creates a series of MemoryBlocks to hold the output and joins them at the end. As such, it can serialize a 500 MB JSON without issue. (Larger strings will give Xojo fits, it seems.)
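The series-of-blocks idea can be sketched in Python as follows. This is only an illustration of the approach: Python's bytearray does not behave like a Xojo MemoryBlock, and the ChunkedBuffer class and its chunk_size parameter are invented here.

```python
class ChunkedBuffer:
    """Collect output in a series of fixed-size-ish chunks, joined at the end."""

    def __init__(self, chunk_size=1 << 20):
        self._chunk_size = chunk_size
        self._chunks = []          # completed chunks
        self._current = bytearray()  # chunk currently being filled

    def write(self, data):
        self._current += data
        # Once the current chunk is full, set it aside and start a new one
        # instead of growing a single buffer indefinitely.
        if len(self._current) >= self._chunk_size:
            self._chunks.append(bytes(self._current))
            self._current = bytearray()

    def getvalue(self):
        # Join every completed chunk plus whatever is left in the current one.
        return b"".join(self._chunks + [bytes(self._current)])


buf = ChunkedBuffer(chunk_size=4)
for piece in (b"[1", b", 2", b", 3", b"]"):
    buf.write(piece)
output = buf.getvalue()
```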

I also added a few more tests to the harness project.

If you’re using JSONItem, I encourage you to consider this instead. JSONItem has a few issues that have been solved by this class, as illustrated by the included unit tests. (See the README for some of the details.) And it’s faster to boot.

Edit: Oh yeah, I also added an Operator_Subscript so you can both assign and retrieve values from an Array-type JSON with a simple index. For example:

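// Create an array-type JSON from a string
dim j as new JSONItem_MTC( "[1, 2]" )

// Read the first element by index
dim i as integer = j( 0 )
// Assign a new value to the second element by index
j( 1 ) = true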