Serializing a Recordset: JSON Vs Memoryblock

I posted some of this yesterday and was mostly focused on the memoryblock based class i was working on but I wanted to to see how it compared to using JSON both in terms of speed and size of the resulting string… I got thinking about this about a month back when Thom posted about his JSON testing…

I thought a MemoryBlock binary based solution in pure Xojo code might be able to be faster but to know if that was the case, I needed code to produce the JSON

Before yesterday I had never used JSON so I grabbed the RecordSet serialization method from Luna-Express to test against my code using both JSONItem Item and the JSONItem_MTC class, and then wrote a simple new framework method… And my memoryblock class compared reasonably to Xojo.Data.GenerateJSON and blew the others away in my metrics…

But the I realized that the structure of the JSON was naive and not comparable to what my class did… So I looked at the language reference and re-wrote them so they were more similar in content and structure with what my class did… It would require a little more coding on the client, but it should not be much more.

I will post the structured JSONItem And Xojo.Data.Generate code at the end… As I have not gotten to parsing yet (and my Memoryblock solution might be faster there) so I would appreciate it if those who are knowledgeable could take a look at the structured JSON code (new and old frame work) dnd let me know if they see any issues with it.

The test data was from an Sqlite Database view. The recordset had 37 fields - a mixture of strings and numbers with one field NULL in every record… It had 20,000 records to convert… (the DB is from an app I wrote at work to analyze data that helped me get a raise!) There were no binary fields but i want to support them as well… if there are enough of those I suspect the memoryblock class would produce the smallest strings.

So here are the results:

Method StringSize (MB) Time Sec GZipedSize (MB) MemoryBlock 7.18 1.85 1.44 Naive JSON Code JSONItem 14.95 124.67 1.55 JSONItem_MTC 14.95 12.42 1.55 Xojo.Data 14.95 1.67 1.58 Structured JSON Code JSONItem 7.24 28.96 1.33 JSONItem_MTC 7.24 6.30 1.33 Xojo.Data 7.46 0.91 1.40

With the structured JSON, the size advantage of the memoryblock class almost completely disappeared . Speed wise it still beats the JSONtem and JSONitem_MTC classes by a large amount … But now the Xojo.Data.GenerateJSON is twice as fast as the memoryblock Class! I may keep both the memoryblock and a JSON method and let the client app request the format that works best for a given situation.

BTW having to deal with the xojo.framework reminded me why I disliked it so much! :wink:

Hopefully JSONItem’s speed will be improved so it matches that of Xojo.Data.GenerateJSON!

JSONItem and JSONItem_MTC code:

[code]Dim Startms as Double = Microseconds

'Create JSON for the different sections

Dim SetData As New JSONItem_MTC ’ Holds

Dim FieldNameArray As New JSONItem_MTC
Dim FieldTypeArray As New JSONItem_MTC

Dim RecordArray As New JSONItem_MTC

’ Convert field information to JSON and assign to SetData
Dim ubFields as Integer = RS.FieldCount -1, i as Integer

Dim TypeList(), theType as Integer
Redim TypeList(ubFields)

for i = 0 to ubFields
Dim FieldNameObj As New JSONItem_MTC

FieldNameObj.Value(RS.IdxField(i+1).Name) = i
FieldNameArray.Append FieldNameObj

thetype = RS.ColumnType(i)
TypeList(i) = theType
FieldTypeArray.Append thetype
Next

SetData.Value(“FieldNames”) = FieldNameArray
SetData.Value(“FieldTypes”) = FieldTypeArray

FieldNameArray = NIL
FieldtypeArray = NIL

’ Process the records
While Not RS.EOF
Dim theRecord As New JSONItem_MTC

For i = 0 To ubFields
Select Case TypeList(i)
Case 14,15,16 ’ Binary types

  Dim theValue As Variant = RS.IdxField(i+1).Value
  if theValue.IsNull Then
    theRecord.Append theValue
  Else
    theRecord.Append Base64Encode(theValue)
  End if
  
  ' may need special treatment for other data types
  
Else
  theRecord.Append RS.IdxField(i+1).Value
End Select

Next

RecordArray.Append theRecord
RS.MoveNext
Wend

SetData.Value(“Records”) = RecordArray

'Convert to String

Dim StrVal As String = SetData.ToString

'Report time and size information

Dim Duration as Double = Microseconds - Startms
Dim Size As UInt32 =StrVal.LenB
Dim ZipedSz As UInt32 = GZip(StrVal).LenB
Dim ZDuration as Double = Microseconds - Startms
// Close the recordset.
If Close Then
RS.Close
End If

Dim C as New Clipboard
C.text = Str(Size) + Chr(9)+ Str(Duration) + Chr(9) + Str(ZipedSz)+ Chr(9) + Str(ZDuration)
C.Close[/code]

Here is the Xojo.Data.GenerateJSON code:

Dim Startms as Double = Microseconds
Dim SetData As New Xojo.Core.Dictionary

Dim FieldNameDic As New Xojo.Core.Dictionary
Dim TypeList() As Integer

Dim RecordArray() As Auto

' Convert field information to JSON and assign to SetData
Dim ubFields as Integer = RS.FieldCount -1, i as Integer

Redim TypeList(ubFields)


for i = 0 to ubFields
  
  FieldNameDic.Value(RS.IdxField(i+1).Name) = i
  TypeList(i) = RS.ColumnType(i)
Next

SetData.Value("FieldNames") = FieldNameDic
SetData.Value("FieldTypes") = TypeList

FieldNameDic = NIL

While Not RS.EOF
  Dim theRecord() As Auto
  
  For i  = 0 To ubFields
    Select Case TypeList(i)
    Case 14,15,16 ' Binary types
      
      Dim theValue As Variant = RS.IdxField(i+1).Value
      if theValue.IsNull Then
        theRecord.Append NIL
      Else
        theRecord.Append Base64Encode(theValue)
      End if
      
      ' may need special treatment for other data types
      
    Else
      theRecord.Append RS.IdxField(i+1).Value
    End Select
  Next
  RecordArray.Append theRecord
  RS.MoveNext
Wend

SetData.Value("Records") = RecordArray

Dim StrVal As String = Xojo.Data.GenerateJSON(SetData)
Dim Duration as Double = Microseconds - Startms
Dim Size As UInt32 =StrVal.LenB
Dim ZipedSz As UInt32 = GZip(StrVal).LenB

// Close the recordset.
If Close Then
  RS.Close
End If

Dim C as New Clipboard
C.text = Str(Size) + Chr(9)+ Str(Duration) + Chr(9) + Str(ZipedSz)
C.Close

The data in the previous post was obtained running in the IDE on MacOS 10.13.6

I decided to compile for Mac and Win 10 (64 bit aggressive ) to see if it made any difference. The Windows 10 test was run on an actual PC.

Bottom line:

For recordset serialization (the other end is not done yet), when compiled the Memoryblock class is about on par with the structured JSON code using xojo.Data.GenerateJSON, at least for this data set, with these fields, and this huge number of records.

As Thom found, unless speed is not important, stay away from JSONItem!

Without a plugin, if you don’t want to deal with the Xojo.framework, Kem’s JSONItem_MTC is the way to go for JSON Serialization.

Right now the fastest JSON serialization without a plugin is Xojo.Data.GenerateJSON… but I intensely dislike a number of things about that framework!

When I get done with the MemoryBlock method (background project) I may make it available… BTW it can take other inputs besides RecordSets.

Mac Compiled results:

Method StringSize (MB) Time Sec GZipedSize (MB) MemoryBlock 7.18 0.74 1.44 Naive JSON Code JSONItem 14.95 125.43 1.55 JSONItem_MTC 14.95 4.54 1.55 Xojo.Data 14.95 1.54 1.58 Structured JSON Code JSONItem 7.24 28.23 1.33 JSONItem_MTC 7.24 2.11 1.33 Xojo.Data 7.46 0.65 1.40

Win 10 Compiled results:

Method StringSize (MB) Time Sec GZipedSize (MB) MemoryBlock 7.18 0.75 1.44 Naive JSON Code JSONItem 14.95 137.15 1.55 JSONItem_MTC 14.95 4.15 1.55 Xojo.Data 14.95 2.19 1.58 Structured JSON Code JSONItem 7.24 28.06 1.33 JSONItem_MTC 7.24 2.14 1.33 Xojo.Data 7.46 0.84 1.40

Now if you could take the JSON and turn it back into a recordet
There’s an interface but I dont think anything in the framework implements it nor is there any documentation
But that would let you take this data, serialize it, send it to where ever, then deserialize it back into a recordset that actually behaved like a recordset from any db plugin

[quote=443959:@Norman Palardy]Now if you could take the JSON and turn it back into a recordet
There’s an interface but I dont think anything in the framework implements it nor is there any documentation
But that would let you take this data, serialize it, send it to where ever, then deserialize it back into a recordset that actually behaved like a recordset from any db plugin[/quote]

Plan to do that at least for reading data… its pretty straight forwardly doable with JSON or my Memoryblock class under the hood and i can even have any type of cursor movement… i have a rough outline of the design in my head already.

Not sure if I will support modifying the data set yet. That is not so straight forward for multitable selects and views… SQL can do some pretty complicated stuff when creating a recordset and I am not an expert!

BTW Xojo recently deprecated those interfaces… right after you made that feedback request… i think you reminded someone they existed! :wink:

  • karen

yeah I know about the deprecation
bad decision IMHO since they ARE useful if they would just actually adopt them in the various plugins and use them internally

users could write lots of things that behave like databases recordsets etc (like my csv parser which could present a recordset interface)

being able to pickle a recordset and send it to another remote app/process is something you see people doing via json / xml and then having to invent something on the other end so they can deal with what should be a “recordset”

being able to pickle a recordset & unpickle that data into a recordset would be nice …

I hope to essentially have a “standard way” for me of doing that when I am done.

As i said I’m writing the serialization part so that the data can be entered in multiple ways.

I already have it set up with AddFieldDefintions, AddFieldData and SaveRecord methods… so you can add any record oriented data in code for any source

I am also thinking about being able to pass in an Array of Objects who’s public properties (unless marked otherwise with an attribute) get saved as the records… I have not used introspection, but it should be doable that way, but i don’ know about speed with that.

All that would make a useful tool chest i think… ( If only the day job did no get in the way! :wink: )

  • karen

I often use JSON “recordsets” with my JSON-RPC servers and I format them like this:

{
    "field_names": [ ],
    "field_types": [ ],
    "field_sizes": [ ],
    "records": [ [ ] ],
    "record_count": 0,
    "field_count": 0
}

All programming languages I use can either turn JSON directly into a native object or have a good set of JSON commands (I used the Einhugur plugins with Xojo) that make creating a class / module with MoveFirst, MoveNext, EOF, etc… functions for the data very easy.

Have you tried JOSNMBS class in MBS Xojo Util Plugin?
We recently got methods to convert between dictionary/array to JSON data.
It may be quicker than built-in classes.

[quote=443997:@Steve Wilson]I often use JSON “recordsets” with my JSON-RPC servers and I format them like this:

{
    "field_names": [ ],
    "field_types": [ ],
    "field_sizes": [ ],
    "records": [ [ ] ],
    "record_count": 0,
    "field_count": 0
}

[/quote]

What field_Types do you use? The ones reported by the RecordSet for the fields?

Why include field_sizes? is it to support editing fixed size Database text types (analogous to strings in Xojo structures)? If so how do you get them? (i am trying to code only using info that is in the Recordset supplied by Xojo)

The datatypes the RecordSet class reports does not include that information (it would be nice if i did!). They are more generic and don’t include max text lengths as defined in the DB Structure for fields.

BTW do you just assume text encoding are always UTF8? Not all DBs use that and the Xojo plugins don’t deal with different encoding AFAIK.

The old or new Einhugur JSON plugin? (I have an Einhugur license - maybe I will try them)

For this project (as i was considering releasing some of the classes when done) I wanted to avoid plugins AND the Xojo.Framework… and the built in JSONItem Class is dog slow!!!

I can see how using something like JSONItem under the hood would make it a lot easier to make the dataset editable on the client than my MemoryBlock solution… it kind of comes built-in.

  • Karen

There was talk at XDC of replacing it with my code under the hood. We’ll see if anything comes of that.

Kem,

JSON speed (and RAM use) tends to be very important in a lot of applications - and particularly so for an Xojo coded server.

While your class is a HUGE improvement over the existing JSONItem, it is still slow compared to Xojo.Data.GenerateJSON.

I suspect that it may not be written in pure Xojo Code under the hood. It is about the same speed as my binary Memoryblock class but does a lot more.

While dog fooding in general is important and good for a lot of things, it needs to be recognized that Xojo as a language has its strengths and weaknesses inherent in it’s underlying design.

I think the weaknesses for some things comes from overhead for safety and/or immutable strings as well as other things I am not aware of.

When under the hood features are being designed, they should be designed considering if the language weakneses are significant for the feature.

If so the feature should implemented is a way that helps overcome the weaknesses (speed and/or RAM use) even if that means NOT coding them in Xojo (or adding something to the Xojo language that helps overcome it -and NOT ONLY for internal use!)…

That philosophy could make it practical to use Xojo in more situations.

I think JSON falls into that category. They might be better off using fast open source C code in an internal plugin for example.

BTW in terms of adding something to Xojo that helps overcome some of the weaknesses, I added a feature request to allow “Freezing” a Memoryblock (making it immutable after we “fill” it) so that it could be passed to things like sockets and other framework methods without being copied under the hood to a string …That could help boost performance for a lot of situations IMO.

OK, enough ranting for now! :wink:

  • karen

Xojo.Data is likely written in C++

That makes sense as it does not get a big boost when compiled aggressively vs in the debugger.

  • karen

That’s database specific - I use Pypy for the servers so the field data will be whatever’s available from cursor.description. When I’ve used SQLite I’ve had to resort to “PRAGMA table_info()” for this though because cursor.description doesn’t return anything useful.

It can be useful (where available) for things such as sizing column widths in a listbox on the client side.

I always use UTF8 for everything.

I guess it’s the old one as I haven’t used Xojo for any recent projects.

Karen, I agree with all your points, but I think we agree that it would still be a big improvement without too much effort. Rewriting in C would be better, of course, but would also demand more time, and so may never happen.

I 100% agree that your code is way better than the JSONItem we have now . Though as you generously opened sourced it, we do (or can) already have and use your code …

it would be better if they really optimized it rather using a bandaid and then forgetting about it again… so much time wasted on the Xojo.framework

  • Karen

Thanks Steve

I did not realize you were using Xojo only on the client.

  • Karen

I added the old and new Einhugur JSON plugins to my recordset serialization testing (Did not try MBS as I don’t have a current license, and this project does not warrant buying one)

Here are the results from compiled apps:

Time (sec) Size (MB) Mac Win Initial Ziped Memory block 0.73 0.51 7.18 1.44 JSONItem 26.95 21.90 7.24 1.33 JSONItem_MTC 2.10 1.69 7.24 1.33 Xojo.Data 0.63 0.66 7.46 1.40 Einhugur Old 0.82 0.74 6.42 1.24 Einhugur New 4.85 3.89 16.61 1.56

The old Einhugur plugin performs very well but has some issues… apparently it does not have unicode support, and I could not find a way of setting an item to null - which is important for dealing with recordsets.

The new one, beside having unicode support, is a lot nicer to code with… right now though it looks like there is a bug which prevents it from putting out compact JSON strings - which explains the large initial string size… but it looks like it is slower than Kem’s Xojo code solution!

My Memoryblock based class performs well … but outputs binary strings in a non standard format. On the client side it would not require any “deserialization” per say… Accessing non string fields come down to just doing a BinaryStream read (strings need justa little more processing ) directly against the string sent, so very light weight on the client.

But that simplicity comes at the price of flexibility for things other than simply reading the values. i think there are use cases where it would be best (particularly if is there is a lot of BLOB type data and or the client does not have a lot of resources), but JSON would be more generally useful.

As much as I dislike the Xojo.framework, from these options, for this use, it looks like that would be best (BTW it looks like TEXT autoconverts to string on assignment now … I did not think that used to be the case!)… but as it will be encapsulated in a class and output a string, once I finish the class i won’t have to deal with the Xojo.framework at all.

So my serialization class will have the option of either outputting JSON (via Xojo.Data under the hood) or a binary string from my memoryblock based class .

The deserialization class (when done) will be able to handle either… and the JSON part will be very easy to write.

  • Karen

Note that you can try in the IDE without a license, so you could at least comparative speeds tests within the IDE for the various methods (which is what you started doing before compiled tests).

And compiled, I am not sure you would see more than a nag screen – but I can’t say for sure, as I have a license…

Alternatively, if you can provide a download link to a zip with the source project and resources like database used, I’m sure someone would be happy to compile and report the speeds.

[quote=444284:@Douglas Handy]Note that you can try in the IDE without a license, so you could at least comparative speeds tests within the IDE for the various methods (which is what you started doing before compiled tests).
[/quote]

I knew that, but once I saw how the relative values changed when compiled, I realized that comparing compiled performance is all that real mattered.

In terms of others doing the testing, the database I am using contains proprietary information, so I could not share it. That makes an apple to apples comparison not be possible… (BTW there is a LOT more data in it than I am am using for this testing despite using 20K records!)

Using MBS might be a good option for serialization but likely not for deserialization… A while back Thom posted on his blog about the various JSON options available for Xojo. On the serialization side for the ways we both tested, although his use case is very different from mine, my results pretty much parallel his.

I have not gotten to deserialization yet so I have no data to compare to his, but initially Thom was very impressed by the perforce MBS option, but when he tried to use it in production , he found the benchmark he was using did not tell the whole story:

https://thezaz.com/blog/the_json_saga_continues

[code]I had decided that the JSONMBS class was worth the hassle given just how fast it is. I wrote a nice GenerateJSON function that could turn a whole bunch of stuff into a JSON string, including a class interface so any object could be become serializable.

The problem is in preparing the JSONMBS objects. All performance gained by using the MBS classes is completely lost by doing so. With the 10MB file, actually producing the content from scratch takes just as much time as letting Kem Tekinay’s M_JSON module do it. The reverse is true for parsing too. Converting JSONMBS back into normal Xojo objects is about 4x slower than Xojo.Data, and about on-par with M_JSON.[/code]

Given that. and also having no way to really test it, I decided not to include it.

  • karen