Serializing a Recordset: JSON Vs Memoryblock

  1. 3 months ago

    Karen A

    Jul 3 Pre-Release Testers
    Edited 3 months ago

    I posted some of this yesterday and was mostly focused on the memoryblock based class i was working on but I wanted to to see how it compared to using JSON both in terms of speed and size of the resulting string... I got thinking about this about a month back when Thom posted about his JSON testing...

    I thought a MemoryBlock binary based solution in pure Xojo code might be able to be faster but to know if that was the case, I needed code to produce the JSON

    Before yesterday I had never used JSON so I grabbed the RecordSet serialization method from Luna-Express to test against my code using both JSONItem Item and the JSONItem_MTC class, and then wrote a simple new framework method... And my memoryblock class compared reasonably to Xojo.Data.GenerateJSON and blew the others away in my metrics...

    But the I realized that the structure of the JSON was naive and not comparable to what my class did... So I looked at the language reference and re-wrote them so they were more similar in content and structure with what my class did... It would require a little more coding on the client, but it should not be much more.

    I will post the structured JSONItem And Xojo.Data.Generate code at the end... As I have not gotten to parsing yet (and my Memoryblock solution might be faster there) so I would appreciate it if those who are knowledgeable could take a look at the structured JSON code (new and old frame work) dnd let me know if they see any issues with it.

    The test data was from an Sqlite Database view. The recordset had 37 fields - a mixture of strings and numbers with one field NULL in every record... It had 20,000 records to convert... (the DB is from an app I wrote at work to analyze data that helped me get a raise!) There were no binary fields but i want to support them as well.. if there are enough of those I suspect the memoryblock class would produce the smallest strings.

    So here are the results:

        Method                 StringSize (MB)        Time Sec         GZipedSize (MB)
    MemoryBlock                    7.18                 1.85                1.44
                                                    Naive JSON Code
    JSONItem                      14.95               124.67                 1.55
    JSONItem_MTC                  14.95                12.42                 1.55
    Xojo.Data                     14.95                 1.67                 1.58
                                                Structured JSON Code
    JSONItem                       7.24                28.96                 1.33
    JSONItem_MTC                   7.24                 6.30                 1.33
    Xojo.Data                      7.46                 0.91                 1.40

    With the structured JSON, the size advantage of the memoryblock class almost completely disappeared . Speed wise it still beats the JSONtem and JSONitem_MTC classes by a large amount ... But now the Xojo.Data.GenerateJSON is twice as fast as the memoryblock Class! I may keep both the memoryblock and a JSON method and let the client app request the format that works best for a given situation.

    BTW having to deal with the xojo.framework reminded me why I disliked it so much! ;)

    Hopefully JSONItem's speed will be improved so it matches that of Xojo.Data.GenerateJSON!

    JSONItem and JSONItem_MTC code:

    Dim Startms as Double = Microseconds
    
    'Create JSON for the different sections
    
    Dim SetData As New JSONItem_MTC ' Holds
    
    Dim FieldNameArray As New JSONItem_MTC
    Dim FieldTypeArray As New JSONItem_MTC
    
    Dim RecordArray As New JSONItem_MTC
    
    ' Convert field information to JSON and assign to SetData
    Dim ubFields as Integer = RS.FieldCount -1, i as Integer
    
    Dim TypeList(), theType as Integer
    Redim TypeList(ubFields)
    
    for i = 0 to ubFields
      Dim FieldNameObj As New JSONItem_MTC
      
      FieldNameObj.Value(RS.IdxField(i+1).Name) = i
      FieldNameArray.Append FieldNameObj
      
      thetype = RS.ColumnType(i)
      TypeList(i) = theType
      FieldTypeArray.Append thetype
    Next
    
    SetData.Value("FieldNames") = FieldNameArray
    SetData.Value("FieldTypes") = FieldTypeArray
    
    FieldNameArray = NIL
    FieldtypeArray = NIL
    
    ' Process the records
    While Not RS.EOF
      Dim theRecord As New JSONItem_MTC
      
      For i  = 0 To ubFields
        Select Case TypeList(i)
        Case 14,15,16 ' Binary types
          
          Dim theValue As Variant = RS.IdxField(i+1).Value
          if theValue.IsNull Then
            theRecord.Append theValue
          Else
            theRecord.Append Base64Encode(theValue)
          End if
          
          ' may need special treatment for other data types
          
        Else
          theRecord.Append RS.IdxField(i+1).Value
        End Select
      Next
      
      RecordArray.Append theRecord
      RS.MoveNext
    Wend
    
    SetData.Value("Records") = RecordArray
    
    'Convert to String
    
    Dim StrVal As String = SetData.ToString
    
    'Report time and size information
    
    Dim Duration as Double = Microseconds - Startms
    Dim Size As UInt32 =StrVal.LenB
    Dim ZipedSz As UInt32 = GZip(StrVal).LenB
    Dim ZDuration as Double = Microseconds - Startms
    // Close the recordset.
    If Close Then
      RS.Close
    End If
    
    Dim C as New Clipboard
    C.text = Str(Size) + Chr(9)+ Str(Duration) + Chr(9) + Str(ZipedSz)+ Chr(9) + Str(ZDuration)
    C.Close

    Here is the Xojo.Data.GenerateJSON code:

    Dim Startms as Double = Microseconds
    Dim SetData As New Xojo.Core.Dictionary
    
    Dim FieldNameDic As New Xojo.Core.Dictionary
    Dim TypeList() As Integer
    
    Dim RecordArray() As Auto
    
    ' Convert field information to JSON and assign to SetData
    Dim ubFields as Integer = RS.FieldCount -1, i as Integer
    
    Redim TypeList(ubFields)
    
    
    for i = 0 to ubFields
      
      FieldNameDic.Value(RS.IdxField(i+1).Name) = i
      TypeList(i) = RS.ColumnType(i)
    Next
    
    SetData.Value("FieldNames") = FieldNameDic
    SetData.Value("FieldTypes") = TypeList
    
    FieldNameDic = NIL
    
    While Not RS.EOF
      Dim theRecord() As Auto
      
      For i  = 0 To ubFields
        Select Case TypeList(i)
        Case 14,15,16 ' Binary types
          
          Dim theValue As Variant = RS.IdxField(i+1).Value
          if theValue.IsNull Then
            theRecord.Append NIL
          Else
            theRecord.Append Base64Encode(theValue)
          End if
          
          ' may need special treatment for other data types
          
        Else
          theRecord.Append RS.IdxField(i+1).Value
        End Select
      Next
      RecordArray.Append theRecord
      RS.MoveNext
    Wend
    
    SetData.Value("Records") = RecordArray
    
    Dim StrVal As String = Xojo.Data.GenerateJSON(SetData)
    Dim Duration as Double = Microseconds - Startms
    Dim Size As UInt32 =StrVal.LenB
    Dim ZipedSz As UInt32 = GZip(StrVal).LenB
    
    // Close the recordset.
    If Close Then
      RS.Close
    End If
    
    Dim C as New Clipboard
    C.text = Str(Size) + Chr(9)+ Str(Duration) + Chr(9) + Str(ZipedSz)
    C.Close
     
  2. Karen A

    Jul 3 Pre-Release Testers
    Edited 3 months ago

    The data in the previous post was obtained running in the IDE on MacOS 10.13.6

    I decided to compile for Mac and Win 10 (64 bit aggressive ) to see if it made any difference. The Windows 10 test was run on an actual PC.

    Bottom line:

    For recordset serialization (the other end is not done yet), when compiled the Memoryblock class is about on par with the structured JSON code using xojo.Data.GenerateJSON, at least for this data set, with these fields, and this huge number of records.

    As Thom found, unless speed is not important, stay away from JSONItem!

    Without a plugin, if you don't want to deal with the Xojo.framework, Kem's JSONItem_MTC is the way to go for JSON Serialization.

    Right now the fastest JSON serialization without a plugin is Xojo.Data.GenerateJSON... but I intensely dislike a number of things about that framework!

    When I get done with the MemoryBlock method (background project) I may make it available... BTW it can take other inputs besides RecordSets.

    Mac Compiled results:

        Method                 StringSize (MB)        Time Sec         GZipedSize (MB)
    MemoryBlock                    7.18                 0.74                 1.44
                                                    Naive JSON Code
    JSONItem                      14.95               125.43                 1.55
    JSONItem_MTC                  14.95                 4.54                 1.55
    Xojo.Data                     14.95                 1.54                 1.58
                                                Structured JSON Code
    JSONItem                       7.24                28.23                 1.33
    JSONItem_MTC                   7.24                 2.11                 1.33
    Xojo.Data                      7.46                 0.65                 1.40

    Win 10 Compiled results:

        Method                 StringSize (MB)        Time Sec         GZipedSize (MB)
    MemoryBlock                    7.18                0.75                  1.44
                                                    Naive JSON Code
    JSONItem                      14.95              137.15                  1.55
    JSONItem_MTC                  14.95                4.15                  1.55
    Xojo.Data                     14.95                2.19                  1.58
                                                Structured JSON Code
    JSONItem                       7.24               28.06                  1.33
    JSONItem_MTC                   7.24                2.14                  1.33
    Xojo.Data                      7.46                0.84                  1.40
  3. Norman P

    Jul 3 Pre-Release Testers, Xojo Pro great-white-software.com/blog

    Now if you could take the JSON and turn it back into a recordet
    There's an interface but I dont think anything in the framework implements it nor is there any documentation
    But that would let you take this data, serialize it, send it to where ever, then deserialize it back into a recordset that actually behaved like a recordset from any db plugin

  4. Karen A

    Jul 3 Pre-Release Testers
    Edited 3 months ago

    @Norman P Now if you could take the JSON and turn it back into a recordet
    There's an interface but I dont think anything in the framework implements it nor is there any documentation
    But that would let you take this data, serialize it, send it to where ever, then deserialize it back into a recordset that actually behaved like a recordset from any db plugin

    Plan to do that at least for reading data... its pretty straight forwardly doable with JSON or my Memoryblock class under the hood and i can even have any type of cursor movement... i have a rough outline of the design in my head already.

    Not sure if I will support modifying the data set yet. That is not so straight forward for multitable selects and views... SQL can do some pretty complicated stuff when creating a recordset and I am not an expert!

    BTW Xojo recently deprecated those interfaces... right after you made that feedback request... i think you reminded someone they existed! ;)

    - karen

  5. Norman P

    Jul 3 Pre-Release Testers, Xojo Pro great-white-software.com/blog

    yeah I know about the deprecation
    bad decision IMHO since they ARE useful if they would just actually adopt them in the various plugins and use them internally

    users could write lots of things that behave like databases recordsets etc (like my csv parser which could present a recordset interface)

    being able to pickle a recordset and send it to another remote app/process is something you see people doing via json / xml and then having to invent something on the other end so they can deal with what should be a "recordset"

    being able to pickle a recordset & unpickle that data into a recordset would be nice ......

  6. Karen A

    Jul 3 Pre-Release Testers

    @Norman P users could write lots of things that behave like databases recordsets etc (like my csv parser which could present a recordset interface)

    I hope to essentially have a "standard way" for me of doing that when I am done.

    As i said I'm writing the serialization part so that the data can be entered in multiple ways.

    I already have it set up with AddFieldDefintions, AddFieldData and SaveRecord methods... so you can add any record oriented data in code for any source

    I am also thinking about being able to pass in an Array of Objects who's public properties (unless marked otherwise with an attribute) get saved as the records... I have not used introspection, but it should be doable that way, but i don' know about speed with that.

    All that would make a useful tool chest i think... ( If only the day job did no get in the way! ;) )

    - karen

  7. I often use JSON "recordsets" with my JSON-RPC servers and I format them like this:

    {
        "field_names": [ ],
        "field_types": [ ],
        "field_sizes": [ ],
        "records": [ [ ] ],
        "record_count": 0,
        "field_count": 0
    }

    All programming languages I use can either turn JSON directly into a native object or have a good set of JSON commands (I used the Einhugur plugins with Xojo) that make creating a class / module with MoveFirst, MoveNext, EOF, etc... functions for the data very easy.

  8. Christian S

    Jul 4 Pre-Release Testers, Xojo Pro, XDC Speakers, Third Party Store Germany

    Have you tried JOSNMBS class in MBS Xojo Util Plugin ?
    We recently got methods to convert between dictionary/array to JSON data.
    It may be quicker than built-in classes.

  9. Karen A

    Jul 4 Pre-Release Testers
    Edited 3 months ago

    @Steve W I often use JSON "recordsets" with my JSON-RPC servers and I format them like this:

    {
        "field_names": [ ],
        "field_types": [ ],
        "field_sizes": [ ],
        "records": [ [ ] ],
        "record_count": 0,
        "field_count": 0
    }

    What field_Types do you use? The ones reported by the RecordSet for the fields?

    Why include field_sizes? is it to support editing fixed size Database text types (analogous to strings in Xojo structures)? If so how do you get them? (i am trying to code only using info that is in the Recordset supplied by Xojo)

    The datatypes the RecordSet class reports does not include that information (it would be nice if i did!). They are more generic and don't include max text lengths as defined in the DB Structure for fields.

    BTW do you just assume text encoding are always UTF8? Not all DBs use that and the Xojo plugins don't deal with different encoding AFAIK.

    All programming languages I use can either turn JSON directly into a native object or have a good set of JSON commands (I used the Einhugur plugins with Xojo) that make creating a class / module with MoveFirst, MoveNext, EOF, etc... functions for the data very easy.

    The old or new Einhugur JSON plugin? (I have an Einhugur license - maybe I will try them)

    For this project (as i was considering releasing some of the classes when done) I wanted to avoid plugins AND the Xojo.Framework... and the built in JSONItem Class is dog slow!!!

    I can see how using something like JSONItem under the hood would make it a lot easier to make the dataset editable on the client than my MemoryBlock solution... it kind of comes built-in.

    - Karen

  10. Kem T

    Jul 4 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    There was talk at XDC of replacing it with my code under the hood. We'll see if anything comes of that.

  11. Karen A

    Jul 4 Pre-Release Testers
    Edited 3 months ago

    Kem,

    JSON speed (and RAM use) tends to be very important in a lot of applications - and particularly so for an Xojo coded server.

    While your class is a HUGE improvement over the existing JSONItem, it is still slow compared to Xojo.Data.GenerateJSON.

    I suspect that it may not be written in pure Xojo Code under the hood. It is about the same speed as my binary Memoryblock class but does a lot more.

    While dog fooding in general is important and good for a lot of things, it needs to be recognized that Xojo as a language has its strengths and weaknesses inherent in it's underlying design.

    I think the weaknesses for some things comes from overhead for safety and/or immutable strings as well as other things I am not aware of.

    When under the hood features are being designed, they should be designed considering if the language weakneses are significant for the feature.

    If so the feature should implemented is a way that helps overcome the weaknesses (speed and/or RAM use) even if that means NOT coding them in Xojo (or adding something to the Xojo language that helps overcome it -and NOT ONLY for internal use!)...

    That philosophy could make it practical to use Xojo in more situations.

    I think JSON falls into that category. They might be better off using fast open source C code in an internal plugin for example.

    BTW in terms of adding something to Xojo that helps overcome some of the weaknesses, I added a feature request to allow "Freezing" a Memoryblock (making it immutable after we "fill" it) so that it could be passed to things like sockets and other framework methods without being copied under the hood to a string ...That could help boost performance for a lot of situations IMO.

    OK, enough ranting for now! ;)
    - karen

  12. Norman P

    Jul 4 Pre-Release Testers, Xojo Pro great-white-software.com/blog

    Xojo.Data is likely written in C++

  13. Karen A

    Jul 4 Pre-Release Testers

    @Norman P Xojo.Data is likely written in C++

    That makes sense as it does not get a big boost when compiled aggressively vs in the debugger.

    - karen

  14. @Karen A What field_Types do you use? The ones reported by the RecordSet for the fields?

    That's database specific - I use Pypy for the servers so the field data will be whatever's available from cursor.description. When I've used SQLite I've had to resort to "PRAGMA table_info()" for this though because cursor.description doesn't return anything useful.

    @Karen Atkocius Why include field_sizes?

    It can be useful (where available) for things such as sizing column widths in a listbox on the client side.

    @Karen Atkocius BTW do you just assume text encoding are always UTF8?

    I always use UTF8 for everything.

    @Karen Atkocius The old or new Einhugur JSON plugin? (I have an Einhugur license - maybe I will try them)

    I guess it's the old one as I haven't used Xojo for any recent projects.

  15. Kem T

    Jul 4 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    Karen, I agree with all your points, but I think we agree that it would still be a big improvement without too much effort. Rewriting in C would be better, of course, but would also demand more time, and so may never happen.

  16. Karen A

    Jul 4 Pre-Release Testers
    Edited 3 months ago

    @Kem T Karen, I agree with all your points, but I think we agree that it would still be a big improvement without too much effort. Rewriting in C would be better, of course, but would also demand more time, and so may never happen.

    I 100% agree that your code is way better than the JSONItem we have now . Though as you generously opened sourced it, we do (or can) already have and use your code ....

    it would be better if they really optimized it rather using a bandaid and then forgetting about it again... so much time wasted on the Xojo.framework <sigh>

    - Karen

  17. Karen A

    Jul 4 Pre-Release Testers
    Edited 3 months ago

    @Steve W That's database specific - I use Pypy for the server

    Thanks Steve

    I did not realize you were using Xojo only on the client.

    - Karen

  18. Karen A

    Jul 4 Pre-Release Testers
    Edited 3 months ago

    I added the old and new Einhugur JSON plugins to my recordset serialization testing (Did not try MBS as I don't have a current license, and this project does not warrant buying one)

    Here are the results from compiled apps:

    	          Time (sec)		 Size (MB)	
    	        Mac	Win	       Initial	Ziped
    Memory block	0.73	0.51		 7.18	1.44
    JSONItem       26.95   21.90		 7.24	1.33
    JSONItem_MTC	2.10	1.69		 7.24	1.33
    Xojo.Data	0.63	0.66		 7.46	1.40
    Einhugur Old	0.82	0.74		 6.42	1.24
    Einhugur New	4.85	3.89		16.61	1.56

    The old Einhugur plugin performs very well but has some issues... apparently it does not have unicode support, and I could not find a way of setting an item to null - which is important for dealing with recordsets.

    The new one, beside having unicode support, is a lot nicer to code with... right now though it looks like there is a bug which prevents it from putting out compact JSON strings - which explains the large initial string size.... but it looks like it is slower than Kem's Xojo code solution!

    My Memoryblock based class performs well ... but outputs binary strings in a non standard format. On the client side it would not require any "deserialization" per say... Accessing non string fields come down to just doing a BinaryStream read (strings need justa little more processing ) directly against the string sent, so very light weight on the client.

    But that simplicity comes at the price of flexibility for things other than simply reading the values. i think there are use cases where it would be best (particularly if is there is a lot of BLOB type data and or the client does not have a lot of resources), but JSON would be more generally useful.

    As much as I dislike the Xojo.framework, from these options, for this use, it looks like that would be best (BTW it looks like TEXT autoconverts to string on assignment now ... I did not think that used to be the case!)... but as it will be encapsulated in a class and output a string, once I finish the class i won't have to deal with the Xojo.framework at all.

    So my serialization class will have the option of either outputting JSON (via Xojo.Data under the hood) or a binary string from my memoryblock based class .

    The deserialization class (when done) will be able to handle either.... and the JSON part will be very easy to write.

    - Karen

  19. Douglas H

    Jul 5 Pre-Release Testers, Xojo Pro

    @Karen A I added the old and new Einhugur JSON plugins to my recordset serialization testing (Did not try MBS as I don't have a current license, and this project does not warrant buying one)

    Note that you can try in the IDE without a license, so you could at least comparative speeds tests within the IDE for the various methods (which is what you started doing before compiled tests).

    And compiled, I am not sure you would see more than a nag screen -- but I can't say for sure, as I have a license...

    Alternatively, if you can provide a download link to a zip with the source project and resources like database used, I'm sure someone would be happy to compile and report the speeds.

  20. Karen A

    Jul 5 Pre-Release Testers
    Edited 3 months ago

    @Douglas H Note that you can try in the IDE without a license, so you could at least comparative speeds tests within the IDE for the various methods (which is what you started doing before compiled tests).

    I knew that, but once I saw how the relative values changed when compiled, I realized that comparing compiled performance is all that real mattered.

    In terms of others doing the testing, the database I am using contains proprietary information, so I could not share it. That makes an apple to apples comparison not be possible... (BTW there is a LOT more data in it than I am am using for this testing despite using 20K records!)

    Using MBS might be a good option for serialization but likely not for deserialization... A while back Thom posted on his blog about the various JSON options available for Xojo. On the serialization side for the ways we both tested, although his use case is very different from mine, my results pretty much parallel his.

    I have not gotten to deserialization yet so I have no data to compare to his, but initially Thom was very impressed by the perforce MBS option, but when he tried to use it in production , he found the benchmark he was using did not tell the whole story:

    https://thezaz.com/blog/the_json_saga_continues

    I had decided that the JSONMBS class was worth the hassle given just how fast it is. I wrote a nice GenerateJSON function that could turn a whole bunch of stuff into a JSON string, including a class interface so any object could be become serializable.
    
    The problem is in preparing the JSONMBS objects. All performance gained by using the MBS classes is completely lost by doing so. With the 10MB file, actually producing the content from scratch takes just as much time as letting Kem Tekinay's M_JSON module do it. The reverse is true for parsing too. Converting JSONMBS back into normal Xojo objects is about 4x slower than Xojo.Data, and about on-par with M_JSON.

    Given that. and also having no way to really test it, I decided not to include it.

    - karen

  21. Newer ›

or Sign Up to reply!