Structs versus collection

I started with a class.
It has about 6 properties, of which 4 are floats… Single is good enough.

Some of my customers have reached the point where they have 100,000 objects of this class.
These need to be serialised and cloned while the app runs.
It's WAY too slow.

I'm considering refactoring them as an 'array' of structs in a MemoryBlock.
That way I can address each by index, and clone the whole array by copying the MemoryBlock.
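
For concreteness, a minimal sketch of the layout I have in mind (record size and field offsets are hypothetical):

// 4 Singles + 2 Int32s = 24 bytes per record (hypothetical layout)
const kRecordSize = 24
dim count as integer = 100000
dim mb as new MemoryBlock( count * kRecordSize )

// Read the first Single of record i by computing its byte offset
dim i as integer = 42
dim x as single = mb.SingleValue( i * kRecordSize )

// Clone every record in one shot by copying the whole block
dim clone as new MemoryBlock( mb.Size )
clone.StringValue( 0, mb.Size ) = mb.StringValue( 0, mb.Size )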

But I seem to recall that structs act strangely these days… byte boundaries??

Does anyone have any advice/caveats before I go wholesale refactoring?

I just did a quick test: cloning an object with 6 properties via Introspection 100k times and adding each to an array takes 2 seconds here (1,919,399 microseconds, to be precise). Is that what you meant by "too slow", or is something else going on?

Just to show my work…

// Class1
Public Sub Constructor(copyFrom As Class1)
  // Call the default constructor first
  Constructor
  
  // Copy every property from the source instance via Introspection
  dim ti as Introspection.TypeInfo = GetTypeInfo( Class1 )
  
  for each prop as Introspection.PropertyInfo in ti.GetProperties
    prop.Value( self ) = prop.Value( copyFrom )
  next
End Sub

// Timing code
dim master as new Class1
dim copies() as Class1

for i as integer = 1 to 100000
  copies.Append new Class1( master )
next

Interesting.
WAY slower than that.
Maybe something else is going on, but to be honest, as you know, even 2 seconds gets some people annoyed.
I'll do some digging (and try that code!.. thanks…)

You know, sometimes you 'solve' the problem by throwing up a progress bar and doing it 'slower', but showing the user that something is going on. As long as the app doesn't 'freeze' or show the spinning beachball of death, all is good with the user.

And it is amazing how much “speed” is psychological.
Many years ago, a friend of mine was lamenting the fact that his boss wanted him to speed up a particular program he had written, and there really wasn't any way to do it. So I asked him, "So what does it do?" "Well, sadly it's 'Top Secret', so I can't tell you" [and yes, it was TS, as he worked for a DoD contractor, as did I, just a different one]. So I said… "Without specifics, what does the user do/see?"

  • Menu screen appears
  • User Selects option
  • screen clears
  • process runs
  • user is presented with results

Note: this was in the '80s (so it was all terminal output)

I told him… I bet you can "speed" it up by making a one-line change:

  • Menu screen appears
  • User Selects option
  • process runs
  • screen clears
  • user is presented with results

His boss was ecstatic… the customer was happy, my friend actually got a bonus, and I got a "thanks".

But it was all in the mind of the user… it still took the same amount of time… it is just that they weren’t staring at a blank screen, instead the menu remained until the “process” completed…

I get claustrophobic when I can’t see how a process is coming along. It gives me the urge to pull the plug or press the reset button.

[quote=394140:@Jeff Tullin]But I seem to recall that structs act strangely these days… byte boundaries??

Does anyone have any advice/caveats before I go wholesale refactoring?[/quote]
Xojo advises against using Structures as a data type in your application; they suggest you only use them to work with APIs.

Personally I would look at your serialization code.

It turns out one of the major slowdowns is adding a new instance of the class to a collection of the class.

Here is the what and the why… does anyone have a suggestion for a better method?

The class represents a line, with end points stored as Doubles.
The line has other attributes, which are Integers.

I have a collection of these objects, and the key for each is derived from the position.

E.g. the key is:

k = "B" + format(startx, "#.#") + format(starty, "#.#") + format(endx, "#.#") + format(endy, "#.#")

If a new one is created, I can 'quickly' find out whether there is another line at the same position by looking the item up in the collection using that key:

try
  existingitem = mycollection.Item(k)
catch
end try

If it exists, I can just change its attributes, or I can delete the existing one and create a new one.

This works well for small numbers.
But when the collection holds more than a few thousand items, it is this search that causes the slowdown.

Removing it gets me back to a reasonable speed (one 'small number' test went from 95 ticks down to 2 ticks with no other changes),
but then I might easily end up with two lines at the same placement, which for my purposes is a bad thing.

How best to avoid this?

[quote]Notes
The Dictionary class provides all the functionality of the Collection class and offers several advantages: With the Collection class, the time taken to locate an item is a function of the number of items in the Collection because the search is sequential. A Dictionary uses a hash table, making the time (relatively) independent of the number of items. It is designed for high-speed lookups.[/quote]

Dictionaries.
Looking back over the last 5 years, I see I've posted now and then about why a Dictionary isn't a drop-in replacement for a Collection, and shied away.
Dictionaries may be the solution. Major refactoring, but in a different way.
You can't iterate using 'normal' Xojo code.
Instead I need to iterate over the keys and get the values by key.

The 'new framework' Dictionary allows iteration over the values, but I'm not going there…
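
For reference, a minimal sketch of the key-based iteration I mean (classic framework, assuming a Dictionary d that is already populated):

for each k as variant in d.Keys
  dim item as variant = d.Value( k )
  // work with item here
next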

I’d start with an in memory SQLite database. It is likely to be better optimized than anything you will roll yourself.
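
A minimal sketch of setting one up (table and index names are hypothetical; with no DatabaseFile assigned, Connect creates the database in memory):

dim db as new SQLiteDatabase
if db.Connect then
  // No DatabaseFile was set, so this database lives entirely in memory
  db.SQLExecute( "CREATE TABLE lines (id INTEGER PRIMARY KEY, startx DOUBLE, starty DOUBLE, endx DOUBLE, endy DOUBLE)" )
  // An index on the endpoints keeps position lookups fast
  db.SQLExecute( "CREATE INDEX idx_lines_pos ON lines (startx, starty, endx, endy)" )
end if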

I didn’t think anyone used Collections anymore. :slight_smile:

James' suggestion is a good one, as you can index the points and query against them directly, but you would still need some other structure like a Dictionary to hold the objects themselves. Your SQLite db could generate a new SN for each new record, which you could use as the Dictionary key for storing your objects. You'd have to keep them in sync if you delete a point.

Using a Dictionary is going to dramatically speed up your code as-is. Note that the fastest way to iterate is to grab all the keys and values in one shot into arrays like this:

dim keys() as variant = d.Keys
dim values() as variant = d.Values
for index as integer = 0 to keys.Ubound
  dim key as variant = keys( index )
  dim value as variant = values( index )
  // do stuff
next

Dictionary also has HasKey and Lookup which are both faster than catching an exception.
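
For example (a sketch, assuming a Dictionary d and a String key k):

if d.HasKey( k ) then
  // the key exists; no exception machinery involved
end if

// Lookup returns the supplied default (here nil) instead of raising KeyNotFoundException
dim item as variant = d.Lookup( k, nil )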

If there is still a performance issue: converting your class to a string value is slower than creating some form of hash from the values and using that as a key. In your case, I'm not sure how you would create a unique hash from the four points. I guess it would depend on what the max and min of those points could be.

Another idea: create a tree using Dictionaries. The top level key is startx, the next level is endx, the next is starty, then endy to finally give you the class. You could code around this to keep your refactoring at a minimum.
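
A minimal sketch of that tree (tree, theLine, and the coordinate variables are hypothetical names):

dim tree as new Dictionary
dim theLine as new LineClass
dim startX, endX, startY, endY as double

// Walk (and build as needed) the levels keyed by startX, endX, startY
dim node as Dictionary = tree
for each coord as double in Array( startX, endX, startY )
  dim child as Dictionary = node.Lookup( coord, nil )
  if child is nil then
    child = new Dictionary
    node.Value( coord ) = child
  end if
  node = child
next

// The leaf level maps endY to the object itself
node.Value( endY ) = theLine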

BTW, your key assumes no negative values, is that right?

I just did another test. Using a string key on a Dictionary to add 100,000 objects without duplicates took a little over a second. If I use a MemoryBlock to create a hash, it takes around 250-300 ms.

Public Shared Function ToString(startX As Double, endX As Double, startY As Double, endY As Double) as String
  'const kSep as string = ","
  'const kFormat as string = "-#.0"
  '
  'return _
  'format( startX, kFormat ) + kSep + _
  'format( endX, kFormat ) + kSep + _
  'format( startY, kFormat ) + kSep + _
  'format( endY, kFormat )
  
  dim mb as new MemoryBlock( 16 )
  // Scale each coordinate to one decimal place and store it as an Int32
  mb.Int32Value( 0 ) = startX * 10.0
  mb.Int32Value( 4 ) = endX * 10.0
  mb.Int32Value( 8 ) = startY * 10.0
  mb.Int32Value( 12 ) = endY * 10.0
  
  // The MemoryBlock converts implicitly to String, yielding a compact 16-byte key
  return mb
End Function

There might be some additional overhead to make sure that endX/Y > startX/Y unless you’ve already ensured that elsewhere.

My testing code:

#if not DebugBuild
  #pragma BackgroundTasks False
  #pragma NilObjectChecking False
  #pragma StackOverflowChecking False
  #pragma BoundsChecking False
#endif

dim r as new Random
dim d as new Dictionary

dim msg as string
dim sw as new Stopwatch_MTC

for i as integer = 1 to 100000
  dim startX as double = r.InRange( -100, 100 ) / 10.0
  dim endX as double = r.InRange( 50, 70 ) / 10.0 + startX
  dim startY as double = r.InRange( -100, 100 ) / 10.0
  dim endY as double = r.InRange( 50, 70 ) / 10.0 + startY
  
  sw.Start
  dim key as string = LineClass.ToString( startX, endX, startY, endY )
  dim l as LineClass = d.Lookup( key, nil )
  if l is nil then
    l = new LineClass
    l.StartX = startX
    l.EndX = endX
    l.StartY = startY
    l.EndY = endY
    d.Value( key ) = l
  end if
  sw.Stop
next
msg = format( sw.ElapsedMicroseconds, "#," ) + " microsecs"
AddToResult msg
AddToResult "Objects in Dict: " + str( d.Count )

Legacy code. :slight_smile:

I’m currently working my way through the code.
It will take a day or so to refactor enough to be able to run it, which is why I have steered away from doing this for so long.
It's often only when you come to do this kind of thing that you see how many places use similar code!

An in-memory SQLite database is great because you can run a very fast query to determine whether start or end points match a point, or even do the check within a tolerance.

This app I made with Xojo used an in-memory database to store all the points. On every MouseMove I query the db to find out if my point aligns with any of the other points. If it is within tolerance, it snaps to align with the point and draws the green line. You can see a tiny bit of mouse lag, but it is relatively snappy.

That's cool!

Did a quick test: adding 100,000 records to an in-memory db took about 425 ms (twice as long without a transaction, and an additional 840 ms when using random numbers). Performing the following query returned a count of about 1,000 items and took 10 ms.

SELECT count(*) FROM lines WHERE startx = 99 AND starty > 50
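
For reference, a sketch of the insert loop under test (assuming db is the connected in-memory database and the lines table from above):

db.SQLExecute( "BEGIN TRANSACTION" )

// A prepared statement avoids re-parsing the INSERT 100,000 times
dim ps as SQLitePreparedStatement = _
  SQLitePreparedStatement( db.Prepare( "INSERT INTO lines (startx, starty, endx, endy) VALUES (?, ?, ?, ?)" ) )
for col as integer = 0 to 3
  ps.BindType( col, SQLitePreparedStatement.SQLITE_DOUBLE )
next

for i as integer = 1 to 100000
  ps.SQLExecute( Rnd * 100.0, Rnd * 100.0, Rnd * 100.0, Rnd * 100.0 )
next

db.SQLExecute( "COMMIT" )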

This is all great stuff.
I'm glad I asked the question.
I'm trying a few techniques to see which works best for me / has the least impact on the code.
Thank you all!

I'm really curious now why C# is so much faster. Creating a list of 100,000 items with an int ID and 4 random doubles between 0 and 100 takes about 20 ms. Querying the list using LINQ takes 1 ms.

HOW are those speeds even possible?

Here is the LINQ query:

lines.Where(l => l.StartX == 99 && l.StartY > 50).Count();