In Memory Dataset

I have a csv file that is 500mb and I wrote a Xojo program to load the records from that file into an object collection for in-memory searching of the data. The Object is pretty simple. Each Object has a collection of fields (string array) along with about 4 properties that tell me something about the data in the record. The Field Collection which is a property in the object is an array of 122 elements. I then create a collection of the objects so that I can have the dataset in memory. Once loaded the user interface allows for choosing a “field” to search and a search criteria to search for in that field. The records that match the search criteria are loaded into a listbox or in this case an xListBox. The application works pretty well if the text file is about 100mb, which represents about 3 weeks of data we are reviewing. If we grow the file up to the 500MB (5 months of data), the program starts loading the data into memory, but reaches a point where it just closes. No error message.

Is there a memory limitation for how much data can be loaded into memory using xojo arrays? Or is there a bounded limited for how many elements can be in a xojo array?

ProcessRunning = true
  RowCnt = 1
  dim rowFromFile As String
  while not ts.EOF
    RowCnt = RowCnt + 1
    rowFromFile = ts.ReadLine
    dim c As new CallRecord
    c.LoadCells(rowFromFile)
    CallRecords.Append(c)
    // for debugging - Limits the number of rows
    // if rowCnt > 4000 then exit
  wend
  ts.Close
  ProcessRunning = false

The above code segment loads the data into the collection of objects. ts is a textstream that is instantiated on the main window. CallRecord is my object that has the collection of fields and CallRecords is a collection of CallRecord. This runs in a thread so that I can keep the user interface updated of the progress. With a 90MB file, I’m getting 192,500 records / Object instances successfully. When loading the 500+mb file, the program gets to about 700,000 object instances (records) and then just closes. If there is a limitation of how many instances or how much memory I can use with Xojo objects, I’d like to know that I can accommodate. I get the same closing issue on Mac OSX, Windows 10 and Linux Ubuntu 14.04LTS 64bit.

It sounds like you are running into the 32-bit limit for memory access. Does the app exhibit the same issues when compiled in 64-bit?

I wonder though why you’re not using a SQLite database to do this work? Converting the file to SQLite should be trivial, and SQLite will handle all of the searching for you quickly if properly indexed. You can convert individual records to objects as needed.

The application is running as 64bit on my Mac, so yes it does exhibit the same behavior.

I thought about doing the SQLite load or even using my Valentina database, but I wasn’t looking for anything persistent. I just wanted to load the log file, open it, search for some data, export that data and then be done. But if I continue to experience the same behavior or we can’t work with the log files in smaller chunks, I might have to go down that path. It slows the process and adds a little complexity to the problem, but that may be the cost of doing this in Xojo.

This is something that Python with the Pandas library would excel at, but I wanted the program cross platform and as easy as handing a person a memory stick and running the program against the data file. The Python complexity comes at the deployment level. Always a Cost.

Thanks for the post.

If you leave the database file nil, it creates an in-memory database.

Tim,

Thank you!

That’s a cool idea, I didn’t know you could do that with SQLite. I would still have to do a Create Table for all the field values and then do an insert into that table with all the data from file correct? There isn’t a tricky method of throwing the array at the table it assuming the insert statement is there? [thinking out loud] - I’d have to programmatically create the table and store off the field names so that I can programmatically create the insert statements.

I’ll play with this

yes. create tables.
I would put database into temp file and use a big cache size.
Just to make sure you can limit memory usage.

According to an “in memory database”:

  • Is it for sure all the memory will be released after closing the connection and setting the object to nil ?

the memory is freed.