Pull data into SQLite in-memory

We have a project with an extremely data-intensive process. We have methods to pull the necessary data from PostgreSQL into an in-memory SQLite DB through our API, and methods to manipulate the data as needed, but we’ve come to the point where we need to push all changes and new records back to the primary database.

We intentionally do not create a direct connection to PostgreSQL; everything goes through an API I wrote. The tables and columns in SQLite match PostgreSQL exactly, and this is a relatively quick process, if that makes a difference.

Is there a “proper” or “best” way to pull this off? My first thought is to loop through the entirety of each table in SQLite and upsert the data back to PostgreSQL through our API.

Well, you only need to find the records that have changed, right? If you have LastModified timestamps in your tables this should be relatively easy, since you can note the load time when you create your in-memory database. When you’re ready to sync back up, you just grab the records that have changed since then.
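
For example, if you captured a snapshot timestamp when you loaded the in-memory database, the changed rows are just a query away. This is only a rough, untested sketch; mSQLiteDB and the orders table are placeholders, and it assumes every table carries a LastModified column:

[code]
' Rough sketch: find only the rows that changed since the snapshot.
' mSQLiteDB and the "orders" table are placeholders; assumes a LastModified column.
Dim syncStart As String = "2018-10-01 12:00:00" ' captured when the in-memory DB was loaded

Dim rs As RecordSet = mSQLiteDB.SQLSelect( _
  "SELECT * FROM orders WHERE LastModified > '" + syncStart + "'")

If mSQLiteDB.Error Then
  MsgBox("SQLite error: " + mSQLiteDB.ErrorMessage)
  Return
End If

While Not rs.EOF
  ' Every row here changed since the snapshot; queue it for the upsert.
  rs.MoveNext
Wend
rs.Close
[/code]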

Short of that, yeah, you’ll probably have to iterate through your records. That will be tricky if you have Foreign Key data (i.e. parent-child records), in which case you’ll have to worry about insert order. If you have no new or deleted records this is no big deal.

For speed I often use two variables for my SQL statements: a String to build each single UPDATE command, and a String array to hold multiple commands.

As I cycle through each record I build the single SQL String, but never send it to the PostgreSQL database on its own. Instead I add it to the String array (making sure it ends with a semicolon). After a batch of SQL commands (e.g. 100), I send these commands to the PostgreSQL database using Join(myArray, EndOfLine) and then ReDim the array.

This may make the updates faster.
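
In code, that batching might look roughly like this. It’s an untested sketch only: pgDB is a direct PostgreSQLDatabase connection, rs is the recordset being cycled through, and real code should escape or parameterise the values.

[code]
' Untested sketch of the batching approach described above.
Dim sqlBatch() As String
Dim sql As String

While Not rs.EOF
  ' Build one UPDATE per record and hold it in the array instead of sending it.
  sql = "UPDATE customers SET name = '" + rs.Field("name").StringValue _
    + "' WHERE id = " + Str(rs.Field("id").IntegerValue) + ";"
  sqlBatch.Append(sql)

  ' Every 100 statements, send the whole batch in one call and reset the array.
  If sqlBatch.Ubound >= 99 Then
    pgDB.SQLExecute(Join(sqlBatch, EndOfLine))
    ReDim sqlBatch(-1)
  End If

  rs.MoveNext
Wend

' Flush whatever is left over.
If sqlBatch.Ubound >= 0 Then
  pgDB.SQLExecute(Join(sqlBatch, EndOfLine))
  ReDim sqlBatch(-1)
End If
[/code]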

You could do the same thing with a transaction. I had some statistics at one point, but putting the inserts/updates inside a transaction was much faster than doing them individually. So it’s a similar approach.
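
Something like this, assuming statements() already holds the individual SQL commands from the loop above and pgDB is a direct connection (sketch only, untested):

[code]
' Untested sketch: wrap the whole batch in one transaction.
Dim failed As Boolean

pgDB.SQLExecute("BEGIN TRANSACTION")

For Each sql As String In statements
  pgDB.SQLExecute(sql)
  If pgDB.Error Then
    failed = True
    Exit For
  End If
Next

If failed Then
  pgDB.Rollback
Else
  pgDB.Commit ' one commit for the whole batch is where the speed comes from
End If
[/code]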

[quote=402395:@David Cox]For speed I often use two variables for my SQL statements: a String to build each single UPDATE command, and a String array to hold multiple commands.

As I cycle through each record I build the single SQL String, but never send it to the PostgreSQL database on its own. Instead I add it to the String array (making sure it ends with a semicolon). After a batch of SQL commands (e.g. 100), I send these commands to the PostgreSQL database using Join(myArray, EndOfLine) and then ReDim the array.[/quote]

This would work if we made a direct connection to the database, but given that it’s through an API, it’s a little more complicated.

I’d put money on a transaction being much faster. PostgreSQL is wicked fast when you use all of its performance-enhancing features.

The solution we’ve gone with is actually a modular script that grabs the table names from SQLite and then the column names for each table. We then pull a recordset and loop through it, building our API call column by column for each record. Once a table has finished, we send it to the API for a massive upsert. I’ve done some preliminary testing, and even upserting 100 records (which, for now, is on par with what we will be doing) is wicked fast. So we’ve now compressed the amount of network traffic down to the number of tables in SQLite.

The cool part is that since this is modular we can reuse it any time we need it, and I would be happy to share a generic version once I’ve worked through the kinks.
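
For anyone curious, the general shape is something like this. It’s only a rough, untested sketch: mMemDB is the in-memory SQLiteDatabase, SendToAPI is a stand-in for our actual API call, and JSON as the payload format is just an assumption for illustration.

[code]
' Untested sketch of the modular per-table upsert builder.
Dim tables As RecordSet = mMemDB.SQLSelect( _
  "SELECT name FROM sqlite_master WHERE type = 'table'")

While Not tables.EOF
  Dim tableName As String = tables.Field("name").StringValue
  Dim rows As RecordSet = mMemDB.SQLSelect("SELECT * FROM " + tableName)

  Dim payload As New JSONItem ' one array of records per table

  While Not rows.EOF
    Dim record As New JSONItem
    ' Walk the columns by index so the same code works for any table.
    For i As Integer = 1 To rows.FieldCount
      record.Value(rows.IdxField(i).Name) = rows.IdxField(i).StringValue
    Next
    payload.Append(record)
    rows.MoveNext
  Wend
  rows.Close

  ' One API round trip per table, carrying every record for that table.
  SendToAPI(tableName, payload.ToString)

  tables.MoveNext
Wend
tables.Close
[/code]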