Prepared statement in bulk insert

Hi all,

This is my first post on this forum. I’m wondering what is the best way to do a bulk insert using prepared statements. If I understand correctly, with API 2.0 it is no longer necessary to set the type of each parameter etc, as this is handled by the db.ExecuteSQL method.

When reading a text file and inserting 104,000 records using the ‘old’ prepared statement method (as in API 1.0), it takes around 3,000 ms to complete. Using db.ExecuteSQL takes 5,000 ms. I assume the latter is slower because the sql statement needs to be recreated each time?

Am I doing something wrong, or should I just stick with prepared statements?

Many thanks for your help!

//Using prepared statement:

Dim ps As SQLitePreparedStatement = db.Prepare("INSERT INTO Countryspecies (CspID, SpID, CountryID) VALUES (?, ?, ?)")
  
  db.ExecuteSQL("BEGIN TRANSACTION")
  
  Dim tis As TextInputStream = TextInputStream.Open(f) // Open as a text stream
  
  While Not tis.EOF
    
    Dim line As String = tis.ReadLine
    Dim tab As String = ChrB(9)
    Dim CspID, SpID, CountryID as Integer
    
    CspID = val(NthField(line, tab, 1))
    SpID = val(NthField(line, tab, 2))
    CountryID = val(NthField(line, tab, 3))
    
    ps.BindType(0, SQLitePreparedStatement.SQLITE_INTEGER)
    ps.BindType(1, SQLitePreparedStatement.SQLITE_INTEGER)
    ps.BindType(2, SQLitePreparedStatement.SQLITE_INTEGER)
    
    ps.Bind (0, CspID)
    ps.Bind (1, SpID)
    ps.Bind (2, CountryID)
        
    ps.SQLExecute()
    
  Wend
  
  Try
    db.CommitTransaction
  Catch error As DatabaseException
    MsgBox("Error: " + error.Message)
    db.RollbackTransaction
  End Try
  tis.Close




//The new method:

  
  sqlstr = "INSERT INTO Countryspecies (CspID, SpID, CountryID) VALUES (?, ?, ?)"
  db.ExecuteSQL("BEGIN TRANSACTION")
  
  Dim tis As TextInputStream = TextInputStream.Open(f) // Open as a text stream
  
  While Not tis.EOF
    
    Dim line As String = tis.ReadLine
    Dim tab As String = ChrB(9)
    Dim CspID, SpID, CountryID as Integer
    
    CspID = val(NthField(line, tab, 1))
    SpID = val(NthField(line, tab, 2))
    CountryID = val(NthField(line, tab, 3))
    
    Dim values(2) As Variant
    values(0) = CspID
    values(1) = SpID
    values(2) = CountryID
    
    db.ExecuteSQL(sqlstr, values)
    
  Wend
  
  Try
    db.CommitTransaction
  Catch error As DatabaseException
    MsgBox("Error: " + error.Message)
    db.RollbackTransaction
  End Try
  tis.Close

If you’re inserting large numbers of rows inside a transaction, I’d say it is better to prepare once (separately) and bind/execute for each row, then commit, as in your first method, which you can still do using API2 methods.

In my own use case, I rarely need to do that, and I have removed prepared statement usage from my app and so have the convenience of the API2 methods with the security of the internally prepared statement. In my case there’s no speed penalty, but yoiu are seeing one as in your second method you re-prepare each time.

SQLite supports importing csv files via the .import command. You may want to try that, see https://www.sqlite.org/cli.html#csv_import

Nothing wrong - you might even stick with the old usage as its not gone
Its exactly how you should use prepared statements - prepare once and use lots

can you test if this stripped-down is faster?

[code]//The new method:

sqlstr = “INSERT INTO Countryspecies (CspID, SpID, CountryID) VALUES (?, ?, ?)”
db.ExecuteSQL(“BEGIN TRANSACTION”)

Dim tis As TextInputStream = TextInputStream.Open(f) // Open as a text stream

Dim line As String 
Dim tab As String = ChrB(9)

While Not tis.EOF

line = tis.ReadLine
db.ExecuteSQL(sqlstr, val(NthField(line, tab, 1)), val(NthField(line, tab, 2)), val(NthField(line, tab, 3)))

Wend

Try
db.CommitTransaction
Catch error As DatabaseException
MsgBox("Error: " + error.Message)
db.RollbackTransaction
End Try
tis.Close[/code]

actually the first usage could likely be even faster
create and bind the vars once then just reuse them over & over

   Dim ps As SQLitePreparedStatement = db.Prepare("INSERT INTO Countryspecies (CspID, SpID, CountryID) VALUES (?, ?, ?)")
    Dim CspID, SpID, CountryID as Integer
    ps.BindType(0, SQLitePreparedStatement.SQLITE_INTEGER)
    ps.BindType(1, SQLitePreparedStatement.SQLITE_INTEGER)
    ps.BindType(2, SQLitePreparedStatement.SQLITE_INTEGER)
    
    ps.Bind (0, CspID)
    ps.Bind (1, SpID)
    ps.Bind (2, CountryID)
  
  db.ExecuteSQL("BEGIN TRANSACTION")
  
  Dim tis As TextInputStream = TextInputStream.Open(f) // Open as a text stream
  
  While Not tis.EOF
    
    Dim line As String = tis.ReadLine
    Dim tab As String = ChrB(9)
    
    CspID = val(NthField(line, tab, 1))
    SpID = val(NthField(line, tab, 2))
    CountryID = val(NthField(line, tab, 3))
    
        
    ps.SQLExecute()
    
  Wend
  
  Try
    db.CommitTransaction
  Catch error As DatabaseException
    MsgBox("Error: " + error.Message)
    db.RollbackTransaction
  End Try
  tis.Close

Also move the tab declaration above the While loop, and use Split rather than NthField.

But I doubt anything in code is going to be faster than this, if you can do it:

My point isnt so much “how do I get a csv into sqlite” but “api 2 doesnt obviously support this prepared statement usage pattern” and yet this is a perfect reason to use one. Importing the csv to sqlite is probably the fastest way to get the data in.

some specific db’s might require re-preparing every time (something I consider their bug as its antithetical to at least one reason why you would use a prepared statement)

but, those that dont, you still have no obvious means of avoiding API 2’s “reprepare every time” mode which is just slower

As it stands, the docs sort of warn you off using prepared statements saying that generally they should not be needed, which is generally true. But this is a good example of where they are very useful.

Also: Thanks Norman for pointing out that even the bind need only be done once. I don’t think the doc makes that clear either. So, one is binding a variable not a value.

I put in a feature request to add something to the API usage style to make this form of usage NOT re-prepare all the time as its a serious performance hit
<https://xojo.com/issue/60697>

some db’s require you to reprepapre all the time
I consider that the DB vendors bug / issue

Thank you for all your suggestions, much appreciated.

@Markus, I tried your code and this was slightly faster (a difference of 140 - 150 ms).

@Norman, when I place both the ps.BindType and ps.Bind outside of the While loop, the result is an empty table. But when I keep the ps.Bind inside the loop it works fine. There is no speed difference however (still 3000 ms).

I quickly looked into Maximilian’s suggestion of using an .import command, but haven’t figured out yet how I can make that work in Xojo. That will need some more time.

Reading a text file in larger parts like an MB or so is much faster than using readline.

Bah - sqlite does this so wrong :slight_smile:
Yes for SQLITE you need to recreate & rebind on every iteration
One of the few DB’s this is true for

The .import command belongs to the sqlite3 CLI, rather than the library, so to use it you’d need to spawn a shell.

Great discussion, guys!
So, I’m trying to learn from this. Do I understand correctly that @Erik Klop’s OP using API 1.0 with Prepared Statement turns out to be the best approach, with possibly two alternatives:

  1. Use SQLite Import to directly import the CSV file
  2. Read the file in larger chunks (can someone show an example of this?)

[quote=493081:@Jim Underwood]So, I’m trying to learn from this. Do I understand correctly that @Erik Klop’s OP using API 1.0 with Prepared Statement turns out to be the best approach, with possibly two alternatives:

  1. Use SQLite Import to directly import the CSV file
  2. Read the file in larger chunks (can someone show an example of this?)[/quote]

If you’re doing (1), then you may as well not write a Xojo program at all. Just do it from the command line in a Terminal window.

After a lot of trial and error I finally managed to get it working through a Shell command. And it’s extremely fast: 140 ms for inserting 104,000 records!

Dim sh as Shell = new Shell
sh.Mode = 0
sh.TimeOut = -1

If Not sh.IsRunning then
  sh.Execute "sh"
End if

dim dbfile as FolderItem
dbfile = SpecialFolder.ApplicationData.Child("Colibri").Child("Colibri2021.db")

dim cmd as string = "sqlite3 " + dbfile.ShellPath + " "
cmd = cmd + """.mode tabs""" + " "
cmd = cmd + """.import " + f.ShellPath + " Countryspecies """

sh.Execute cmd

Nice.

FYI, you don’t need that first Execute. A Shell in Synchronous mode (Mode = 0) will execute a command and exit while your Xojo app waits. In your code, it is starting an instance of sh (within a shell), exiting, then running your command. IsRunning will always be False.

If this command took longer, you would use Asynchronous mode (Mode = 1) and implement the Completed event to let the user know it was done.

Also, I love stories like this. I’ll bet you had to check three times to make sure it actually did anything because you couldn’t believe it could be that fast.

Thanks Kem, this is very informative! And yes, the speed is equivalent to nearly 750,000 inserts per second - unbelievable!