Xojo JSON default handling of Reals could be better

  1. 2 weeks ago

    Karen A

    Jul 5 Pre-Release Testers
    Edited 2 weeks ago

    I deal with a lot of numerical (real) data whose values can have very different magnitudes and numbers of significant figures.

    I wanted to see how the Xojo JSON options would handle them.
    Bottom line: Given that JSON String size often matters, and Xojo is a strongly typed language, it should be better.

    First let's take a look at how Xojo.Data.GenerateJSON does.

    For the first test I used:
    D as Double = 1.0/3.0 And got this:
    {"Xojo.JSON Double":0.33333333333333331483}

    Next I tried:
    D as Double = 1e100*(1.0/3.0) And got:
    {"Xojo.JSON Double":3.3333333333333332245e+99}
    A few extra insignificant digits (should only be 16 or 17) but not too bad, and it did switch to scientific notation when it needed to

    But then I went to singles:
    S as Single = 1.0/3.0 And that gave:
    {"Xojo.JSON Single":0.3333333432674407959}

    S as Single = 1e35*(1.0/3.0) Gave:
    {"Xojo.JSON Single":3.3333333046695906274e+34}

    For singles there are WAY too many digits so potentially a lot of wasted space!!!!
    A single only has about 8 digits of meaningful significance!!! As I said, Xojo is strongly typed so it knows what type of real number it is and how many significant digits make sense!!!!!

    And unlike JSONItem, as far as I can see you cannot specify a numeric format... which can be important in situations where you don't need all these significant figures!

    In any case intelligent defaults are important to be able to handle the general case efficiently!!!!!

    Speaking of JSONItem, it is a much worse offender (though you can define the numeric format to use - but as I said smart defaults are important).

    D as Double = 1.0/3.0 Gave:
    {"JI_DefaultDouble":0.333333333333333}
    Not too bad, but it is 1 or 2 significant figures TOO SHORT. That can matter in some things!

    D as Double = 1e100*(1.0/3.0) gave:
    {"JI_DefaultDouble":3333333333333333224453896013722304246165110619355184909726539264904319486405759542029132894851563520.0}
    Now we are into the ridiculous in terms of digits...
    It should have switched to scientific notation and used the correct number of sig figs for a double!
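
    To make the size difference concrete, here is a small sketch in Python (not Xojo, purely for illustration) that writes the same double in fixed-point notation and in general notation with 17 significant digits. It assumes Python's IEEE 754 doubles hold the same value as a Xojo Double, which is not verified here.

```python
# Same expression as the test above: a double near 3.33e+99.
d = 1e100 * (1.0 / 3.0)

# Fixed-point notation spells out every digit of the binary value,
# much like the JSONItem default output shown above.
fixed = format(d, ".1f")

# General notation with 17 significant digits switches to an exponent.
general = format(d, ".17g")

print(len(fixed), len(general))

# Both strings parse back to exactly the same double.
print(float(fixed) == d, float(general) == d)
```

    The long form carries no extra information; it is only a longer spelling of the same 64-bit value.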

    Same thing for singles:
    S as Single = 1e35*(1.0/3.0) Gave:
    {"JI_DefaultSingle":33333333046695906273818073259573248.0}
    One heck of a lot more than 8 sig figs!!!!!

    Yes, we can set our own format, but when one is writing what is supposed to be general-use code, most won't even think about that!

    JSONItem_MTC behaves the same way but can be modified as we have the source code.

    - Karen

  2. Christian S

    Jul 5 Pre-Release Testers, Xojo Pro, XDC Speakers Germany

    Did you try MBS Xojo Util Plugin and JSONMBS class?

  3. last week

    Karen A

    Jul 6 Pre-Release Testers

    @ChristianSchmitz Did you try MBS Xojo Util Plugin and JSONMBS class?

    No. As I said in the other thread, since I don't have a license I can't test performance in a compiled app, so I did not see the point.

    - karen

  4. Karen A

    Jul 6 Pre-Release Testers
    Edited last week

    On significant figures and JSON:

    For a double from Wikipedia:
    https://en.wikipedia.org/wiki/Double-precision_floating-point_format

    The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^−53 ≈ 1.11 × 10^−16). If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation, and then converted back to a decimal string with the same number of digits, the final result should match the original string. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.[1]

    So when converting a double to JSON there is no point in having more than 17 significant figures, maybe 18 just in case (and there should not be fewer than 17 unless the user specifies it!)... Having more just makes the JSON payload larger.
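
    The round-trip rule quoted from Wikipedia can be checked with a short Python sketch (Python rather than Xojo, for illustration only). The helper name round_trips is made up for this example.

```python
def round_trips(value: float, sig_figs: int) -> bool:
    """True if formatting with sig_figs significant digits and parsing back returns value."""
    return float(format(value, f".{sig_figs}g")) == value

d = 1.0 / 3.0

print(format(d, ".17g"))   # 17 significant digits always identify a double
print(round_trips(d, 17))
print(round_trips(d, 15))  # 15 digits are too few for this particular value
print(repr(d))             # Python's repr picks the shortest string that round-trips
```

    A generator that emits the shortest round-tripping string, as Python's repr does, never loses information and never needs more than 17 digits.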

    For Singles from Wikipedia:
    https://en.wikipedia.org/wiki/Floating-point_arithmetic

    Single precision, usually used to represent the "float" type in the C language family (though this is not guaranteed). This is a binary format that occupies 32 bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal digits).

    Since it says "about" I would say it should be 8 significant figures unless the user specifies a format using less...

    If an application needs to transfer a lot of numerical data this can matter for payload size.

  5. Kem T

    Jul 6 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    Where that's a concern, perhaps convert it to a formatted string instead?

  6. Karen A

    Jul 6 Pre-Release Testers

    @Kem T Where that's a concern, perhaps convert it to a formatted string instead?

    Kem,

    That loses the meta info on datatype that JSON is supposed to have.
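
    A small Python sketch (not Xojo, for illustration) of what "loses the meta info" means on the receiving side: the same digits arrive with a different type depending on whether they were sent as a JSON number or as a JSON string.

```python
import json

as_number = json.loads('{"value": 0.33333}')
as_string = json.loads('{"value": "0.33333"}')

# The number arrives typed as a float; the formatted string arrives as text,
# and the receiver has to know that it should be converted.
print(type(as_number["value"]).__name__, type(as_string["value"]).__name__)
```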

    In any case we should not have to worry about whether REAL numbers are being represented efficiently when using built-in stuff to generate JSON... which is usually used over XML for compactness...

    Also, having to know about the specific use case is an issue for writing efficient general-use code... The built-in stuff should just do this optimally by default...

    BTW this is an easy fix for JSONItem_MTC, and if Xojo does adopt your code I hope they optimize that aspect for the internal version.

    Not so for Xojo.Data.GenerateJSON, but there, while doubles have an extra 2 or 3 digits, the issue is really only significant for singles (well, unless you WANT to use fewer sig figs, as there is no way to set it).

    But, besides using the painful Xojo.framework, Xojo.Data.GenerateJSON is deprecated!

    The sig fig stuff is the kind of thing that can matter for scientific and engineering applications but not be much of a concern for other types.

    Can you tell that I am a detail-oriented person! ;)

    BTW for my use I think I will use Xojo.Data to generate the JSON, but on Windows at least, based on Thom's testing, I will probably use JSONItem_MTC for deserializing to get maximum performance without plugins.

    - Karen

  7. Kem T

    Jul 6 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    FYI (and I had forgotten this), JSONItem_MTC already has a DecimalFormat property that will let you set the number of significant decimals, and defaults to "-0.0##############". For very large numbers you can set the DecimalFormat to scientific notation if you prefer, but either way, changing the source is not needed.

  8. Karen A

    Jul 6 Pre-Release Testers
    Edited last week

    @Kem T FYI (and I had forgotten this), JSONItem_MTC already has a DecimalFormat property that will let you set the number of significant decimals, and defaults to "-0.0##############". For very large numbers you can set the DecimalFormat to scientific notation if you prefer, but either way, changing the source is not needed.

    I know I am being very picky here and I don't mean this personally, but I disagree strongly... It should behave in that respect more like Xojo.Data.GenerateJSON (except that does not deal with singles as singles either).

    The optimal DEFAULT behavior should depend on whether it is a single or a double and be the # of significant figures that ensures no information is lost (unless the user specifies fewer digits), but ALSO not more digits than that, as size matters for JSON and those extra digits contribute nothing but unneeded extra bytes to send.

    Do you really disagree with that in principle?

    As I see it, optimal behavior for the general case is by definition NOT case specific and should not require a 'special' format. That setting is for case-specific usage. With JSONItem_MTC that is not critical as we have the source and can make it work optimally, and right now it makes more sense to use JSONItem_MTC than JSONItem. But if Xojo Inc adopts the code it really should fix that.

    In any case if I use JSONItem_MTC for serialization I will make sure it works that way at least for myself (if Xojo Inc includes it, the speed likely would not be any different in compiled apps than using your code).

    I agree that most of the time, for most people it does not matter much... but why not have it work the best it can when it's not hard to do that?

    As I said I am a very detail-oriented person, and being a scientist I deal with real numbers a lot!

    -Karen

  9. James S

    Jul 6 Pre-Release Testers, Xojo Pro

    There is a difference between making a report about how something should work and offering suggestions about how to work around it. We all need to make stuff that works and that means workarounds to get it going. That should not stop us from making bug reports when we have had to do so. We can’t all go to our managers or COs and tell them, well sorry, it just doesn’t work :) You have to make it work, so the discussion of how else to make it happen is welcome. But workarounds don’t solve the underlying problem that we shouldn’t have to do that. Those still need to be reported!

  10. Dave S

    Jul 6 San Diego, California USA
    Edited last week

    @Karen A and Xojo is a strongly typed language it should be better.

    But Xojo is NOT a "strongly typed" language... it is rather weakly typed, at least in my opinion.

    To me, if Xojo were strongly typed you would not be allowed to say

    Dim dbl As Double
    Dim int As Integer
    int = dbl
    dbl = int

    without casting.

    Swift *IS* strongly typed...

    var dbl: Double = 0
    var int: Int = 0
    int = Int(dbl)
    dbl = Double(int)

    Strong and weak typing
    Programming languages are often colloquially classified as strongly typed or weakly typed (also loosely typed) to refer to certain aspects of type safety. In 1974, Liskov and Zilles defined a strongly-typed language as one in which "whenever an object is passed from a calling function to a called function, its type must be compatible with the type declared in the called function."[5] In 1977, Jackson wrote, "In a strongly typed language each data area will have a distinct type and each process will state its communication requirements in terms of these types."[6] In contrast, a weakly typed language may produce unpredictable results or may perform implicit type conversion.[7]

  11. Kem T

    Jul 6 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    I favor flexibility, and my code offers that. That was my only point.

  12. Karen A

    Jul 6 Pre-Release Testers
    Edited last week

    @Kem T I favor flexibility, and my code offers that. That was my only point.

    Kem,

    On that we are in violent agreement! ;)

    My point was only about what JSONItem should do when the format is not set. Changing that requires changing the source code.

    While what it does by default is better, IMO it's unfortunate that Xojo.Data.GenerateJSON does not allow setting the format as well.

    - karen

  13. Greg O

    Jul 6 Xojo Inc
    Edited last week

    @Karen A I know I am being very picky here and I don't mean this personally, but I disagree strongly... It should behave in that respect more like Xojo.Data.GenerateJSON (except that does not deal with singles as singles either).

    The optimal DEFAULT behavior should depend on whether it is a single or a double and be the # of significant figures that ensures no information is lost (unless the user specifies fewer digits), but ALSO not more digits than that, as size matters for JSON and those extra digits contribute nothing but unneeded extra bytes to send.

    Part of the problem here is that the JSON spec isn't written that way. All it says for a number type is "integer fraction exponent".

    https://www.crockford.com/mckeeman.html
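
    The "integer fraction exponent" rule can be written as a regular expression. The Python sketch below (not Xojo, for illustration) follows the JSON number grammar; the names JSON_NUMBER and is_json_number are invented for this example.

```python
import re

# JSON number: optional minus, integer part, optional fraction, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def is_json_number(text: str) -> bool:
    """True if text is a syntactically valid JSON number."""
    return JSON_NUMBER.fullmatch(text) is not None

for candidate in ["0.33333", "3.3333e+99", "3.3333333333333332245e+99", "1e5", ".5", "1.", "01"]:
    print(candidate, is_json_number(candidate))
```

    The grammar puts no limit on the number of digits, so both a 5-digit and a 20-digit spelling are valid JSON numbers.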

    Trying to force this kind of restriction wouldn't help you when exchanging data with another system and may actually hurt you if the other system always uses doubles. Remember, JSON is for data transfer, not data presentation.

    As was mentioned before, if you require a specific number of decimal places, your best bet is to use a String and format it the way you want it to be.

  14. Karen A

    Jul 6 Pre-Release Testers
    Edited last week

    @Greg OLone Part of the problem here is that the JSON spec isn't written that way. All it says is for a number type is "integer fraction exponent".

    I'm not sure what you mean by that...

    Are you saying we should not have the option to set a format, or are you saying there is an issue with what I was saying about sig figs?

    If it's the former I agree that a format string gives TOO much flexibility, but that does not mean we don't need ANY option to affect how it is formatted...

    I was not more specific earlier because JSONItem used a format string... While I knew that could be problematic I did not want to go too far into the weeds... Well maybe it is weed time! ;)

    I think we still should be able to specify the number of significant figures to use, and that internally the JSON generator should ensure that, given the number of sig figs, the number is written in the most compact standard form possible... switching to scientific notation when, for that number of significant figures, that would be the smallest form...
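
    One way such a generator could pick the most compact form is sketched below in Python (not Xojo, for illustration). The function compact_number and its sig_figs parameter are hypothetical names, and this is only one possible approach under the assumption that the value is finite.

```python
def compact_number(value: float, sig_figs: int = 17) -> str:
    """Shorter of two JSON-valid spellings of value rounded to sig_figs digits.

    Assumes value is finite; NaN and infinity have no JSON representation.
    """
    # 'g' trims trailing zeros and picks fixed or exponent form by magnitude.
    general = format(value, f".{sig_figs}g")
    # Exponent form with trailing zeros, '+' sign and zero padding removed.
    mantissa, exponent = format(value, f".{sig_figs - 1}e").split("e")
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")
    scientific = f"{mantissa}e{int(exponent)}"
    return min(general, scientific, key=len)

for x in (0.5, 1234.5678, 1e100 * (1.0 / 3.0)):
    print(compact_number(x, 5), compact_number(x))
```

    With the default of 17 digits nothing is lost for a double; passing a smaller sig_figs trades precision for payload size on purpose.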

    My main argument is that the DEFAULT # of sig figs should be about 17 for a double and 8 for a single... We should not be able to go beyond what the datatype can do, but an option to set it to use fewer sig figs makes a lot of sense and should not cause an issue for other systems... and it is NOT about presentation

    I will get to my explanation for that last...

    Trying to force this kind of restriction wouldn't help you when exchanging data with another system and may actually hurt you if the other system always uses doubles.

    What restriction are you talking about? I'm confused.

    The JSON spec does not require a specific number of significant figures, so any # of sig figs that makes sense for numbers in the range of a valid double should not be an issue on any system. Also, given that, using fewer sig figs for a single than for a double would never be an issue... just as assigning a single variable's value to a double variable can never be an issue (though the other way around can be).
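
    The single/double asymmetry can be seen in a few lines of Python (not Xojo, for illustration). Python has no single type, so the hypothetical helper to_single uses the struct module to round a double to the nearest single.

```python
import struct

def to_single(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

d = 1.0 / 3.0
s = to_single(d)          # double -> single: rounds to 24 significant bits

print(s == d)             # narrowing changed the value
print(to_single(s) == s)  # the single, now held in a double, is unchanged
```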

    Remember, JSON is for data transfer, not data presentation.

    That statement makes me think we are not talking about the same thing. There is a real disconnect here... I think you may be missing my point, and I yours.

    SIG figs are NOT about presentation at all, they are about numerical/mathematical/physical significance...

    You know that, for example, for a single written with more than 8 significant figures the additional digits are meaningless because 8 sig figs is the most the binary representation can represent. Having more won't cause an overflow or an underflow, but including them has no benefits and in fact is a negative for payload size. For a double that is about 17...

    Are we on the same page so far?

    JSON is valued for its compactness. Even though the spec itself is not that detailed, why include more significant figures than the numeric datatype can represent? So why include more than about 17 sig figs for a double and 8 for a single? How can following that cause an issue for any system? Including more just bloats the payload!

    As I said Xojo.Data.GenerateJSON is pretty reasonable for a double (could use to be a digit or 2 shorter) but has way too many sig figs for a single.

    JSONItem is potentially a REALLY bad actor here... as I showed it can output HUGE numbers of digits way beyond those needed to represent 17 sig figs (over 100!), as it does not switch to scientific notation!!!

    As was mentioned before, if you require a specific number of decimal places, your best bet is to use a String and format it the way you want it to be

    Talk about a non standard way of dealing with other systems!!!!! ;)

    Let me explain, with a hypothetical example, why we would still want to represent a real with fewer significant figures than the type is capable of representing, but still be able to have it typed as a number for other systems.

    I have instrumentation that can report over 100 data points per second (not unusual in a laboratory) and experiments can go for long times... The software used reports each point as a double, but the physical reality is that anything more than 4 or 5 sig figs (at best!) is just meaningless noise because the sensor doing the measurement cannot measure more accurately...

    Let's say I need to transfer that data to a server using JSON, to another standard software package that takes JSON input. That software only takes data as numbers and not strings.

    I could just send the full 19 or 20 sig figs that Xojo.Data.GenerateJSON produces, but that would be a huge amount of text clogging the network for no reason... I might want to set the JSON to write the numbers using 5 significant figures, which would reduce the payload size by over 8 MEGABYTES for 2 hours' worth of data!!!!
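
    A rough check of that estimate, as a Python sketch (not Xojo, for illustration). It assumes one value per data point and one byte per digit, and takes the low end of the figures given above.

```python
points_per_second = 100   # "over 100 data points per second"
seconds = 2 * 60 * 60     # two hours
points = points_per_second * seconds

digits_sent = 19          # low end of "19 or 20 sig figs"
digits_needed = 5

# One byte per ASCII digit; signs, exponents and separators are ignored.
bytes_saved = points * (digits_sent - digits_needed)

print(points, bytes_saved / 1_000_000, "MB")
```

    Under these assumptions the saving is about 10 MB, consistent with "over 8 megabytes".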

    Does that make sense?

    - Karen

  15. Greg O

    Jul 6 Xojo Inc

    I do see where you are coming from and what you want from this, but JSON is JavaScript Object Notation. JavaScript didn’t have concepts like Integers, Singles or Doubles when JSON was created.

    @Karen A JSON is valued for its compactness. Even though the spec itself is not that detailed, why include more significant figures than the numeric datatype can represent? So why include more than about 17 sig figs for a double and 8 for a single? How can following that cause an issue for any system? Including more just bloats the payload!

    Just out of curiosity, what would you say to someone who wanted a number that approaches the largest (or smallest) number a single can represent?
    ±3.40282346638528859811704183484516925e+38
    8 significant digits may be enough for your purposes, but not necessarily for everyone.

  16. Karen A

    Jul 6 Pre-Release Testers
    Edited last week

    @Greg OLone I do see where you are coming from and what you want from this, but JSON is JavaScript Object Notation. JavaScript didn’t have concepts like Integers, Singles or Doubles when JSON was created.

    I know that, but Xojo does, and taking them into account when creating them in Xojo would not cause interoperability issues as the numbers would still be valid JSON numbers, and it would help keep payload size down...

    Can you tell me what the downside would be? As I said, the JSON spec does not require a specific number of significant figures, so compliant parsers should not have an issue!

    Just out of curiosity, what would you say to someone who wanted a number that approaches the largest (or smallest) number a single can represent?
    ±3.40282346638528859811704183484516925e+38
    8 significant digits may be enough for your purposes, but not necessarily for everyone.

    Hmmm, I would say I don't think a single CAN be meaningfully that precise... That number of sig figs did not come from MY purposes, it came from this:
    For Singles from Wikipedia:
    https://en.wikipedia.org/wiki/Floating-point_arithmetic

    Single precision, usually used to represent the "float" type in the C language family (though this is not guaranteed). This is a binary format that occupies 32 bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal digits).

    2^24 = 16777216
    which is 8 significant figures.

    This Wikipedia page says:

    https://en.wikipedia.org/wiki/Single-precision_floating-point_format

    an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2^−23) × 2^127 ≈ 3.4028235 × 10^38
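
    As a check on the largest single specifically, the Python sketch below (not Xojo, for illustration) builds the value from the quoted formula, writes it with 9 significant digits, and reads it back. The helper to_single is a made-up name that rounds a double to the nearest single via the struct module.

```python
import struct

def to_single(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Largest finite single, from the formula quoted above.
max_single = (2 - 2**-23) * 2.0**127

text = format(max_single, ".9g")
print(text)

# Parsing the 9-digit text and rounding to single gives the same value back.
print(to_single(float(text)) == max_single)
```

    The 35-digit spelling of this value and the 9-digit one identify the same single.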

    Edited to add: But then I just found this:

    If an IEEE 754 single-precision number is converted to a decimal string with at least 9 significant digits, and then converted back to single-precision representation, the final result must match the original number

    So I guess it really should be 9 sig figs...
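
    The 9-digit rule for singles can be spot-checked in Python (not Xojo, for illustration). Python floats are doubles, so the hypothetical helpers to_single and single_round_trips use the struct module to emulate a single. The sample range is arbitrary and a spot check is not a proof.

```python
import struct

def to_single(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def single_round_trips(single: float, sig_figs: int) -> bool:
    """Format with sig_figs digits, parse, round back to single, and compare."""
    return to_single(float(format(single, f".{sig_figs}g"))) == single

s = to_single(1.0 / 3.0)
print(format(s, ".9g"), single_round_trips(s, 9))

# Spot check: 1000 consecutive singles starting at 8.0, taken by bit pattern.
start = struct.unpack("<I", struct.pack("<f", 8.0))[0]
samples = [struct.unpack("<f", struct.pack("<I", start + i))[0] for i in range(1000)]
print(all(single_round_trips(x, 9) for x in samples))
```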

    Is that wrong or am I misunderstanding it? (I did not have the patience to slog through the math on the page!)
    Googling I also got results from StackOverflow like this answer:

    https://stackoverflow.com/questions/13542944/how-many-significant-digits-have-floats-and-doubles-in-java

    Float: 32 bits (4 bytes) where 23 bits are used for the mantissa (about 7 decimal digits). 8 bits are used for the exponent, so a float can “move” the decimal point to the right or to the left using those 8 bits. Doing so avoids storing lots of zeros in the mantissa as in 0.0000003 (3 × 10^−7) or 3000000 (3 × 10^6). There is 1 bit used as the sign bit.

    double: 64 bits (8 bytes) where 52 bits are used for the mantissa (about 16 decimal digits). 11 bits are used for the exponent and 1 bit is the sign bit.

    Which is consistent with Wikipedia... that is why I specified those limits.

    Also see this:

    https://en.wikipedia.org/wiki/IEEE_754-1985

    Again, am I misunderstanding this? (I very well could be)

    As for having the option to set it to fewer sig figs... maybe computer science types need to get out of the ivory tower more! ;)

    - karen

  17. Christian S

    Jul 6 Pre-Release Testers, Xojo Pro, XDC Speakers Germany

    With JSONMBS , you get some auto formatting of numbers:

    e.g.

    Dim d As New Dictionary
    
    Dim d1 As Double = 1/3
    Dim d2 As Double = 1e100*(1.0/3.0)
    
    d.Value("test1") = d1
    d.Value("test2") = d2
    
    Dim j As JSONMBS = JSONMBS.Convert(d)
    Dim s As String = j.toString
    Break

    gives:

    {
    "test1": 0.333333,
    "test2": 3.333333e+99
    }

    so the plugin rounds doubles to 6 digits and may use scientific notation.
    Please also note that when we parse and output JSON, we keep whatever number value.

  18. Karen A

    Jul 7 Pre-Release Testers

    @ChristianSchmitz so the plugin rounds doubles to 6 digits and may use scientific notation.

    You may want to change that....

    IMO that is wrong for the default behavior... it means that the transported data has lost information... that should never happen UNLESS the user specifies it..
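
    The information loss is easy to demonstrate with a Python sketch (not Xojo and not the plugin itself, for illustration only): round 1/3 to 6 significant digits, parse it back, and compare with the original double.

```python
d = 1.0 / 3.0

six_digits = format(d, ".6g")  # 6 significant digits, as in the output above
received = float(six_digits)

print(six_digits, received == d, abs(received - d))
```

    The receiver gets a different double, off by roughly 3 parts in 10 million. That is fine when chosen deliberately, but it is a loss when it happens by default.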

    Also the option to specify the # of significant figures should always be there. How many significant figures make sense to use/send does vary by specific situation (see my example above) and can make significant differences in the size of the payload to be transferred...

    But the default (which should be able to be changed) should be not to lose any information IMO.

    BTW if I am wrong about singles, the exact # was not my main point. It's that by default the MINIMUM # of sig figs needed to not lose information should be the DEFAULT (though I don't see how it can be the same as for a double given fewer bits for the mantissa).

    In any case there should be a way to specify that fewer significant figures be used to allow optimal data transfer for specific situations, as minimizing unnecessary net data transfer can be important. Not having that ability for reals is my primary issue with Xojo.Data.GenerateJSON (outside of the framework of course! ;) )...

    As for JSONItem, although it has a mechanism that can be used to specify sig figs, depending on the exact value, it can be REALLY poor with its default real number JSON output, using MANY MANY extra digits.

    Please also note that when we parse and output JSON, we keep whatever number value.

    Christian, if I understand what you mean here, I don't get why that would matter... of course the server has access to the original number... but the JSON string is what is sent to the client and that is all it has available.

    Anyway at this point no one is likely reading this anymore, I have likely fallen out of my tree and broken my arm, and annoyed most here (not my intention - I just want Xojo to be the best and most useful it can be)... so I will give it a rest.

    I am sorry if I annoyed anyone (and I'm sure I have).

    - Karen

  19. Christian S

    Jul 7 Pre-Release Testers, Xojo Pro, XDC Speakers Germany

    It's even better. If you don't care, you get a good default.

    But you can use JSONMBS class.
    NewNumberNode can take a double or a string. And the string can be formatted with whatever format you need using str() function.

    We also have NewInt64Node and NewUInt64Node to use integers and avoid double rounding for big integers.
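
    Why big integers need their own path can be shown in Python (not Xojo and not JSONMBS, for illustration): a double has a 53-bit significand, so an integer that needs 54 bits changes when it passes through one.

```python
import json

big = 2**53 + 1  # 9007199254740993, needs 54 bits

# Converted to a double and back, the odd value is rounded to a neighbour.
print(int(float(big)) == big)

# Kept as an integer, the value survives a JSON round trip in Python.
print(json.loads(json.dumps(big)) == big)
```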

  20. Christian S

    Jul 7 Pre-Release Testers, Xojo Pro, XDC Speakers Germany

    Here is an example for 9.3pr5 where I tuned JSONMBS class a bit:

    Dim d As New Dictionary
    Dim n As Double = 1/3
    
    d.Value("number") = n // default double handling
    
    // custom formatted number
    Dim j As JSONMBS = JSONMBS.NewNumberNode(Str(n, "-0.0000000000"))
    
    d.Value("customNumber") = j
    d.Value("int64") = 1000000000000000000
    d.Value("uint64") = 8000000000000000000
    
    Dim a() As String
    a.Append "Hello"
    a.Append "World"
    
    d.Value("array") = a
    d.Value("dictionary") = New Dictionary("a":1, "b":2, "c":3)
    
    Dim r As JSONMBS = JSONMBS.Convert(d)
    MsgBox r.toString
