Iterating RowSet and Storing DatabaseRow in In-memory SQLite Database

That is good to know that the plain text INSERTs made no difference. I would have assumed them to be slightly faster.

The large text fields may be slowing the loads, depending on how much text they contain. If you don’t need them, maybe don’t include them in the load.

My question around FKs and Indexes was 100% about the target table. For large table loads (think data warehouse loads) sometimes we will disable the FKs and Indexes to speed up the loads and then enable them post load.

I will continue to give this some thought, but right now I can’t think of any other things to try.

Thanks for your help. It’s just a little frustrating I can’t get to the bottom of this!

I’ve sent you a private message.

So you loop through a result set to update another result set?

You can do this directly in SQL, which may be many times faster.

In short, what are you trying to do: from which table into which table?

I did wonder about this, too. If both databases are using the same type of database, you could attach the results database (as res) to the source one, and do something like:

insert into res.mytable select * from main.mytable;

which would copy the whole table. But as the source and destination are different database engines (SQLite, MySQL, etc.), I’d be surprised if one could do that.
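For the same-engine case, the ATTACH approach above can be sketched in Python’s built-in sqlite3 module (table and schema names here are invented for illustration):

```python
import sqlite3

# Source database with a sample table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
src.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])

# Attach a second database under the schema name "res".
src.execute("ATTACH DATABASE ':memory:' AS res")
src.execute("CREATE TABLE res.mytable (id INTEGER, name TEXT)")

# One statement copies the whole table -- no row-by-row loop needed.
src.execute("INSERT INTO res.mytable SELECT * FROM main.mytable")
print(src.execute("SELECT COUNT(*) FROM res.mytable").fetchone()[0])
```

As noted, this only works when both databases are SQLite; it can’t bridge SQL Server and SQLite.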

I am running the original query against the remote SQL Server database. Then I am looping the resulting RowSet and trying to cache the data in memory by storing in an in-memory SQLite database.

It looks like inserting into the SQLite database is the slow part. I was reading somewhere online that inserting into SQLite can be very slow. Some people suggest changing certain properties using PRAGMA, but I am not sure what this actually does and whether it is a good idea in my case.

If I should be using PRAGMA and it would improve performance, could you please suggest which settings I should use and why?
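For reference, the PRAGMAs most often suggested for bulk-insert speed look like this in Python’s sqlite3 (a sketch, not a recommendation for this specific case; note that for a purely in-memory database most of these have little effect, since there is no journal file or disk sync to avoid):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA synchronous = OFF")      # don't wait for disk flushes
db.execute("PRAGMA journal_mode = MEMORY")  # keep the rollback journal in RAM
db.execute("PRAGMA temp_store = MEMORY")    # temp tables/indices in RAM
db.execute("PRAGMA cache_size = -64000")    # negative value = size in KiB (~64 MB)
```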

I was more pointing at a temp db that has less data and may be cached. In one query you should be able to transfer your data to a temp db.

I am working on a solution to periodically transfer the original query result data to another database on the server that is indexed, so it works as a cache for quick access, and to query that temp db from the application instead.
However, where the most up-to-date data is needed and is critical, I need to query the original MS SQL Server on demand from the application and cache the result in memory, hence why I am trying to copy it into an in-memory SQLite db.

Have a look at this project that uses a prepared statement to insert the data into the in-memory database. I’m seeing 6.5s for 15000 records with 29 fields on an i5 processor.
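The linked project itself isn’t shown here, but the prepared-statement idea it uses translates to the following Python sqlite3 sketch: one compiled INSERT, all rows bound inside a single transaction (table and column names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cache (a INTEGER, b TEXT)")

# 15,000 synthetic rows, mirroring the record count mentioned above.
rows = [(i, "value-%d" % i) for i in range(15000)]

# "with db" wraps the inserts in one BEGIN/COMMIT; executemany reuses
# the same prepared statement for every row.
with db:
    db.executemany("INSERT INTO cache (a, b) VALUES (?, ?)", rows)
```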

A better option may be to install an Express Edition engine locally and link it to the production server. You can then use the Select Into options mentioned above to create a local snapshot of your data for analysis.

Do you need to run SQL queries on the cached data? Or only simple filtering/selects?

If the latter:
I tested a library loosely inspired by pandas (Python) that internally stores data in arrays (one array per column) and got something like 15 columns / 60K rows inserted in 1 second (Mac M1). It does not have a SQL interface. Would this help?
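The library itself isn’t named, but the “one array per column” idea can be sketched in a few lines of plain Python (all names here are invented; this is the storage concept, not the actual library):

```python
# Columnar store: a dict of lists, one list per column, no SQL engine.
table = {"region": [], "qty": []}

def insert(row):
    # Append each field to its column's array.
    for col, value in row.items():
        table[col].append(value)

for i in range(60000):
    insert({"region": "EU" if i % 2 else "US", "qty": i})

# Distinct values of one column without SQL:
print(sorted(set(table["region"])))  # ['EU', 'US']
```

Appends to in-memory lists avoid all of SQLite’s transaction and page-cache machinery, which is why this style of storage can load rows so quickly.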

Thanks for sharing. I tried this method and the performance is unchanged unfortunately. It’s still taking around 172 seconds to write the 1333 records to the SQLite DB.

Ideally I would like to run SQL queries against the cached data to get distinct values from multiple column combinations, plus other select queries. SQLite made the most sense to me: in-memory SQLite databases are supposed to be very fast to query, and it avoids writing lots of loops and lines of code to build the query results manually.

I’d be keen to try the library inspired by pandas. Is it a built-in Xojo class? And would I be able to select distinct values from multiple columns easily without having to write loops etc that may slow it down?

I have now tested different database parameters using PRAGMA before inserting the database rows into the SQLite database.
Based on what I have been reading online about improving SQLite insert speeds, I have experimented with the following parameters with different values. Although others online have reported that by modifying these values, huge performance gains can be realised, I haven’t seen any significant change.

  • cache_size

  • synchronous

  • journal_mode

  • locking_mode

  • count_changes

  • temp_store

I am getting about 8-10 inserts per second (woefully slow) and cannot seem to increase it.
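For what it’s worth, a per-row rate that low is the classic symptom of each INSERT committing in its own implicit transaction; batching all rows into one explicit transaction is usually a far bigger win than any PRAGMA. A sketch in Python sqlite3 (schema invented; row count matches the 1333 records mentioned above):

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode so we can
# manage the transaction boundaries ourselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE t (id INTEGER, payload TEXT)")

db.execute("BEGIN")  # one explicit transaction for the whole batch
for i in range(1333):
    db.execute("INSERT INTO t VALUES (?, ?)", (i, "row-%d" % i))
db.execute("COMMIT")
```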

This issue has now been solved. The problem was caused by the DatabaseName property of the MS SQL Server db not being specified before connecting, such that after a query was made and stored in a RowSet variable, the application would hang, massively slowing down the time to write the data from the RowSet to the in-memory SQLite database.
After specifying the DatabaseName property of the database, the time to write to the SQLite db dropped from around 170 seconds to 0.6 seconds!

Here is the post where I identified the issue when I focussed on just retrieving the data from the database (and not writing to the SQLite database).

Crashing upon closing application following database query - Targets / Windows - Xojo Programming Forum

I suspect this is a bug and so I will raise the issue.


That is great news! Congrats on sorting that out.