JSONItem replacement

Kem_Tekinay · August 15, 2014, 4:51am

@Christian Schmitz, I ran those speed tests. They are in a separate branch on the repo. As you can see, except for creation, your plugin is faster than either the native or my class.

The “but they don’t match” messages are because the native will encode some Unicode characters. I don’t know the reason why it picks those. The output from the MBS plugin matches the output from my class with EncodeUnicode set to false.

All times are in microseconds.

JSONItem Create: 13,032
JSONItem_MTC Create: 6,422
JSONMBS Create: 14,082
JSONItem.ToString: 240,627
JSONItem_MTC.ToString: 23,235
JSONMBS.ToString: 2,818
JSONItem.Load: 145,182
JSONItem_MTC.Load: 45,637
JSONMBS.Load: 1,862
JSONItem.Create (big): 24,989
JSONItem_MTC.Create (big): 32,765
JSONMBS.Create (big): 241,766
JSONItem.ToString (big): 4,557,413
JSONItem_MTC.ToString (big): 78,576
... but they don't match
JSONMBS.ToString (big): 4,094
... but it doesn't match JSONItem
JSONItem.Load (big): 322,309
JSONItem_MTC.Load (big): 183,231
JSONMBS.Load (big): 5,061

Kem_Tekinay · August 15, 2014, 4:57am

(For my European friends, remember that “13,032” means thirteen-thousand, thirty-two microseconds, not 13 and 32/1000 microseconds. In the “big” test, the native class ToString takes around four-and-a-half seconds.)
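To make the units concrete, here is a small sketch in Python (used only for illustration; the project itself is Xojo) that reads the comma-grouped figures as whole microseconds and converts them to seconds. The helper name `micros_to_seconds` is made up for this example.

```python
def micros_to_seconds(text: str) -> float:
    """Parse a figure such as '4,557,413' (commas group thousands) as microseconds."""
    return int(text.replace(",", "")) / 1_000_000


# Two figures taken from the first set of results above.
print(micros_to_seconds("13,032"))     # JSONItem Create
print(micros_to_seconds("4,557,413"))  # JSONItem.ToString (big)
```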

Brock_Nash · August 15, 2014, 5:07am

My serialization class uses JSON. It might be worth converting the library to using this JSON class instead.

Peter_Fargo · August 15, 2014, 8:53am

Thank you Kem. I’ll be converting a project later today to test.

Kem_Tekinay · August 16, 2014, 5:35pm

I’ve updated the project. My class now accepts any escaped character in a JSON string (which the RFC allows), uses pragmas for some additional speed, and handles encoding better when loading a JSON string: it figures out the correct encoding (ignoring whatever encoding is already set) and even strips any BOM that might be present.
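For readers curious how encoding detection of this kind can work, here is a rough sketch in Python (not the Xojo code from the project, whose actual approach is not shown in this thread). It assumes the heuristic described in RFC 4627: check for a BOM first, otherwise look at the pattern of zero bytes in the first four bytes, which works because a JSON text was expected to start with two ASCII characters. The function name `decode_json_bytes` is hypothetical.

```python
import codecs

# BOMs to look for. UTF-32 comes first so that the UTF-32LE BOM (FF FE 00 00)
# is not mistaken for the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_json_bytes(raw: bytes) -> str:
    """Guess the Unicode encoding of raw JSON bytes, strip any BOM, and decode."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name)

    # No BOM: assume the first two characters are ASCII (RFC 4627, section 3)
    # and use the positions of the zero bytes to tell the encodings apart.
    head = raw[:4]
    if len(head) == 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return raw.decode("utf-32-be")   # 00 00 00 xx
        if head[0] == 0 and head[2] == 0:
            return raw.decode("utf-16-be")   # 00 xx 00 xx
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return raw.decode("utf-32-le")   # xx 00 00 00
        if head[1] == 0 and head[3] == 0:
            return raw.decode("utf-16-le")   # xx 00 xx 00
    return raw.decode("utf-8")               # xx xx xx xx, or very short input
```

The zero-byte heuristic can guess wrong for texts that do not begin with two ASCII characters, so treat it as a best effort rather than a guarantee.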

Kem_Tekinay · August 17, 2014, 4:10am

I’m about to update again. This time I’ve made it so that all output is through a MemoryBlock and Ptr instead of an array. The latest numbers are improved, although MBS still blows it away for most operations.

In a compiled app:

JSONItem Create: 12,301
JSONItem_MTC Create: 6,481
JSONMBS Create: 14,421
JSONItem.ToString: 250,623
JSONItem_MTC.ToString: 17,857
JSONMBS.ToString: 3,131
JSONItem.Load: 144,455
JSONItem_MTC.Load: 46,604
JSONMBS.Load: 2,220
JSONItem.Create (big): 27,334
JSONItem_MTC.Create (big): 33,889
JSONMBS.Create (big): 241,615
JSONItem.ToString (big): 4,624,553
JSONItem_MTC.ToString (big): 45,137
... but they don't match
JSONMBS.ToString (big): 4,182
... but it doesn't match JSONItem
JSONItem.Load (big): 328,129
JSONItem_MTC.Load (big): 186,690
JSONMBS.Load (big): 4,642

All times are in microseconds, and there are no decimals.

Again, the “but it doesn’t match” message is because JSONItem encodes some Unicode characters while neither JSONItem_MTC nor JSONMBS does. The output is valid in all cases.

Norman_P · August 17, 2014, 4:15am

“But it doesn’t match” may also mean that your software breaks quietly IF you happen to rely on the behavior of the built-in JSONItem.

Test thoroughly if you switch.

Kem_Tekinay · August 17, 2014, 4:19am

Always good advice.

If I knew why JSONItem encoded those particular characters, I’d emulate that, but I can’t find a good reason in the RFC or other sources. It would help to hear the reasoning from some engineer.

In the meantime, you can choose to encode all Unicode characters (code point > 127) by setting EncodeUnicode to true before calling ToString.
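The same choice exists in other JSON libraries. As an analogy only, Python's standard `json` module has an `ensure_ascii` flag that plays a role similar to EncodeUnicode: both outputs are valid JSON and decode to the same value, they just differ byte for byte, which is what the “doesn’t match” messages are reporting.

```python
import json

data = ["caf\u00e9"]  # contains one character above code point 127

escaped = json.dumps(data, ensure_ascii=True)    # non-ASCII written as \uXXXX
literal = json.dumps(data, ensure_ascii=False)   # non-ASCII written as-is

print(escaped)
print(literal)

# The two strings differ, but they decode to the same data.
print(escaped == literal)
print(json.loads(escaped) == json.loads(literal))
```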

Norman_P · August 17, 2014, 4:29am

From what I can see

// known entities
\ -> \\
" -> \"
chr(8) -> \b
chr(12) -> \f
chr(10) -> \n
chr(13) -> \r
chr(9) -> \t

// ASCII control characters and Unicode chars that JavaScript can’t handle
// since we use this with web edition code it HAS to work in JavaScript too
chr 0-31
chr 127
&h00ad
&h0600 - &h0604
&h070f
&h17b4 - &h17b5
&h200c - &h200f
&h2028 - &h202f
&h2060 - &h206f
&hfeff
&hfff0 - &hffff

and optionally / turns into \/
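Put together, the list above can be sketched as a small escaping routine. This is Python for illustration only, not the native class's code or Kem's; the names `SHORT_ESCAPES`, `ESCAPED_RANGES`, and `escape_json_string` are made up here, and the lowercase hex digits are an arbitrary choice since the list does not say which case the native class uses.

```python
# Characters with a two-character short form.
SHORT_ESCAPES = {
    "\\": "\\\\", '"': '\\"', "\b": "\\b", "\f": "\\f",
    "\n": "\\n", "\r": "\\r", "\t": "\\t",
}

# Inclusive code point ranges from the list that are written as \uXXXX.
ESCAPED_RANGES = [
    (0x0000, 0x001F), (0x007F, 0x007F), (0x00AD, 0x00AD),
    (0x0600, 0x0604), (0x070F, 0x070F), (0x17B4, 0x17B5),
    (0x200C, 0x200F), (0x2028, 0x202F), (0x2060, 0x206F),
    (0xFEFF, 0xFEFF), (0xFFF0, 0xFFFF),
]


def escape_json_string(text: str, escape_slash: bool = False) -> str:
    """Return text as a quoted JSON string, escaping per the list above."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ch == "/" and escape_slash:
            out.append("\\/")
        elif any(lo <= cp <= hi for lo, hi in ESCAPED_RANGES):
            out.append("\\u%04x" % cp)
        else:
            out.append(ch)  # everything else is passed through unchanged
    return '"' + "".join(out) + '"'
```

For example, `escape_json_string("a\u2028b")` escapes the line separator, while an accented letter such as `é` passes through untouched.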

Kem_Tekinay · August 17, 2014, 4:33am

Awesome, I’ll make that change. Curious, where did you get that list?

Kem_Tekinay · August 17, 2014, 5:23am

That change has been made and posted. It slows the class down a bit, but there is not much to be done about that. New numbers:

JSONItem Create: 12,740
JSONItem_MTC Create: 6,101
JSONMBS Create: 14,254
JSONItem.ToString: 240,259
JSONItem_MTC.ToString: 17,004
JSONMBS.ToString: 3,074
JSONItem.Load: 143,632
JSONItem_MTC.Load: 46,863
JSONMBS.Load: 2,079
JSONItem.Create (big): 24,267
JSONItem_MTC.Create (big): 31,055
JSONMBS.Create (big): 243,879
JSONItem.ToString (big): 4,585,982
JSONItem_MTC.ToString (big): 53,308
JSONMBS.ToString (big): 4,069
... but it doesn't match JSONItem
... but it doesn't match JSONItem_MTC
JSONItem.Load (big): 346,289
JSONItem_MTC.Load (big): 184,682
JSONMBS.Load (big): 4,930

Kem_Tekinay · August 17, 2014, 5:26am

BTW, I added a unit test to confirm that all characters from &h0000 through &hFFFF are represented the same as the native class. Now I just have to figure out the right way to handle encoding and reading of characters > &hFFFF when EncodeUnicode is true.
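For reference, JSON's `\u` escape only holds four hex digits, so the usual way to write a character above &hFFFF is as a UTF-16 surrogate pair. A minimal sketch of that arithmetic in Python (illustration only; the function name `to_json_escape` is hypothetical, and it does not guard against being handed a lone surrogate value):

```python
def to_json_escape(code_point: int) -> str:
    """Write one code point as a JSON \\u escape, using a surrogate pair if needed."""
    if code_point <= 0xFFFF:
        return "\\u%04X" % code_point
    offset = code_point - 0x10000          # 20 bits to split across two units
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return "\\u%04X\\u%04X" % (high, low)


print(to_json_escape(0x00E9))   # a character that fits in one escape
print(to_json_escape(0x1D11E))  # MUSICAL SYMBOL G CLEF, needs a pair
```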

Greg_O_Lone · August 17, 2014, 12:12pm

Trial and error.

ChristopheDV · August 17, 2014, 12:19pm

+1

Kem_Tekinay · August 17, 2014, 1:26pm

The link again.

https://github.com/ktekinay/JSONItem_MTC

Kem_Tekinay · August 21, 2014, 6:21am

I just updated the project to v.2.1. Loading from a JSON string is now quite a bit faster, and I included some other optimizations too.

The latest test numbers:

JSONItem Create: 6,812
JSONItem_MTC Create: 5,482
JSONMBS Create: 12,514
JSONItem.ToString: 209,170
JSONItem_MTC.ToString: 13,155
JSONMBS.ToString: 2,817
JSONItem.Load: 115,984
JSONItem_MTC.Load: 11,637
JSONMBS.Load: 2,190
JSONItem.Create (big): 38,368
JSONItem_MTC.Create (big): 36,818
JSONMBS.Create (big): 209,415
JSONItem.ToString (big): 4,127,084
JSONItem_MTC.ToString (big): 164,012
JSONMBS.ToString (big): 3,504
... but it doesn't match JSONItem
... but it doesn't match JSONItem_MTC
JSONItem.Load (big): 690,745
JSONItem_MTC.Load (big): 104,617
JSONMBS.Load (big): 4,284

Note: These numbers cannot be compared to earlier numbers as I changed the test to make it a bit more complex.

scott_boss · August 21, 2014, 1:37pm

[quote=120905:@Jeremy Cowgar]In Python and Ruby, the last man wins. i.e.

{ "a": "123", "a": "456", "a": "789" }

When you access “a”, you’ll get 789.[/quote]

This is true of many languages, especially the “scripting”-based ones.
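This is easy to confirm with Python's standard `json` module, shown here purely as an illustration of the "last one wins" behavior. The `object_pairs_hook` argument is one way to see every pair if the duplicates matter to you.

```python
import json

doc = '{ "a": "123", "a": "456", "a": "789" }'

# With the default dict result, later duplicates overwrite earlier ones.
value = json.loads(doc)["a"]
print(value)

# To inspect every key/value pair, including duplicates, keep them as a list.
pairs = json.loads(doc, object_pairs_hook=list)
print(pairs)
```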

Greg_O_Lone · August 21, 2014, 2:59pm

Being that it’s JavaScript Object Notation, I’d bet this was the original intent.

Kem_Tekinay · August 23, 2014, 3:23am

I just updated the project to v.2.3. In the latest, doubles with the values INF or NAN will be output that way regardless of the format string. This is a break with both the native class and the JSON RFC, which does not allow those values, so I introduced a Strict property.

By default, it is set to False, but when set to True, the class conforms strictly to the RFC. This means values like “TRUE”, “+1”, and “inf” will be rejected, and doubles that are either INF or NAN will raise an exception rather than being output. With Strict turned on, you can use JSONItem_MTC as a JSON validator, and the string output should be valid with all other JSON implementations.
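As a point of comparison, here is how a similar strict mode can be approximated with Python's standard `json` module. This is an analogy, not a description of how JSONItem_MTC implements Strict; the helper names `strict_loads`, `strict_dumps`, and `is_valid_json` are invented for the sketch. Python's parser accepts `NaN` and `Infinity` by default, so the sketch turns that off in both directions.

```python
import json


def _reject_constant(name):
    # Called by json.loads for 'NaN', 'Infinity' and '-Infinity'.
    raise ValueError("not valid JSON: " + name)


def strict_loads(text: str):
    """Parse JSON, refusing the non-standard NaN/Infinity constants."""
    return json.loads(text, parse_constant=_reject_constant)


def strict_dumps(value) -> str:
    """Serialize to JSON, raising ValueError for NaN or infinite floats."""
    return json.dumps(value, allow_nan=False)


def is_valid_json(text: str) -> bool:
    """Use the strict parser as a simple validator."""
    try:
        strict_loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


for candidate in ["[true, 1]", "[TRUE]", "[+1]", "[inf]", "[NaN]"]:
    print(candidate, is_valid_json(candidate))
```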

Kem_Tekinay · August 23, 2014, 5:35am

Now updated to 2.4. The class will now properly decode and (if EncodeType.All) encode characters whose code point > &hFFFF.

@Greg O’Lone, I wanted to call your attention to this one as the native class does not handle this properly. I included a unit test within the project to demonstrate, and will find or start a FR soon.

As a test, use the JSON string:

["\uD834\uDD1E"]

That should decode to &u1D11E, but instead gives an invalid string.
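The expected result can be checked by hand or against another implementation. The sketch below uses Python as a reference decoder (illustration only; `combine_surrogates` is a made-up helper). Note that the doubled backslashes are only there because of Python's string literal syntax; the JSON text being parsed contains single backslashes.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD834, 0xDD1E)
print(hex(code_point))

# The JSON text here is ["\uD834\uDD1E"]; a conforming decoder should
# return a single character rather than two lone surrogates.
decoded = json.loads('["\\uD834\\uDD1E"]')[0]
print(len(decoded), hex(ord(decoded)))
```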