SQL INSERT vs. InsertRecord

Just wondering if there is a trade-off in terms of speed when using an SQL INSERT INTO command vs. a recordset with InsertRecord?

We have some customers who do a huge amount of importing from Excel into our application, and we are currently using a recordset with InsertRecord. We could convert it to an SQL INSERT INTO command, but it would take a huge amount of code because the INSERT INTO would require a variable for each field, and there is a variable number of fields based on the customer’s selections. It could be as many as 200 fields or more.

If you’re inserting a large number of records, you can improve speed by joining all the INSERT statements together and executing them as one transaction. Of course, you need to think about SQL injection attacks when generating your own SQL text.
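The batching could be as simple as this rough, untested sketch (SQLite syntax; the table, column, rowValues array, and db object are all placeholders, and the quote-doubling is the bare minimum of escaping):

// Batch many INSERTs into a single SQLExecute call inside one transaction.
// Table and column names here are hypothetical; every value should be escaped.
Dim statements() As String
For i As Integer = 0 To rowValues.Ubound
  // double single quotes so a value can't break out of its string literal
  Dim safeValue As String = ReplaceAll(rowValues(i), "'", "''")
  statements.Append("INSERT INTO import_table (col1) VALUES ('" + safeValue + "');")
Next

db.SQLExecute("BEGIN TRANSACTION")
db.SQLExecute(Join(statements, " "))   // one round trip instead of thousands
If db.Error Then
  db.Rollback
Else
  db.Commit
End If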

Whether or not the speed gain is worth the effort for you depends entirely on the situation.

John, are you performing the imports in a contained transaction? That can speed things up. Also look at how your database is indexed. Too many indexes, indexes on things that are not really needed, etc… all slow things down.

It may also be faster to drop the indexes, do your massive inserts, then index again.
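In SQLite terms that’s just something like this (the index name and definition are made up):

// Sketch: drop a heavy index before the bulk load, rebuild it once afterwards.
// "idx_import_name" and its definition are placeholders for the real indexes.
db.SQLExecute("DROP INDEX IF EXISTS idx_import_name")

// ... run the bulk inserts here ...

db.SQLExecute("CREATE INDEX idx_import_name ON import_table(name)")
If db.Error Then
  MsgBox(db.ErrorMessage)
End If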

No, I can assure you that indexes are not the hang-up.

It appears that Office automation is just not very speedy, especially when there are hundreds of columns of data that you need to import and, in addition, some of that data needs to be massaged before being added to the db.

Unfortunately, because the data from one spreadsheet could get added to no fewer than 3 tables and as many as 12 or more, we’re talking about a number of INSERT statements that would have to be constructed for each row of the spreadsheet.

I currently have this being done in some very concise code but it’s being done with a recordset instead of insert statements. The reason it can be done with very little code is because included in our application is a data dictionary (sort of) that can be used to get field names, field types, and the data contained in each column of the spreadsheet. So the new record is built in a for … next statement instead of listing each field in the tables.

It’s very efficient, and the only thing I’m trying to determine is whether there would be any significant speed increase if this were done with an “insert into” statement as opposed to an InsertRecord. If there is no obvious time savings, I wouldn’t even try to rewrite the code.

I know how hard it is to predict what the time savings may be, but I just thought someone might be able to say that an InsertRecord is 10 times slower than a comparable “insert into” statement. If that were the case, it might be worth it; if it’s only slightly faster, I’d just leave well enough (fully debugged code) alone.

What about transactions? Are you running the import inside a transaction?

I am only doing one commit after everything is imported so I don’t think this will buy me any time.

We have done comparison tests with converting the Excel file to CSV and it does import considerably faster, so I’m pretty convinced that importing from Excel is just not very fast.
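For reference, the CSV route is basically just a line-at-a-time loop, something like this rough sketch (the naive Split doesn’t handle quoted fields with embedded commas):

// Read the CSV one line per spreadsheet row and split it into cells.
Dim f As FolderItem = GetOpenFolderItem("")   // however the file gets chosen
Dim tis As TextInputStream = TextInputStream.Open(f)
Dim cells() As String
While Not tis.EOF
  cells = Split(tis.ReadLine, ",")
  // map cells() to fields via the data dictionary and insert, as usual
Wend
tis.Close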

If you are pulling individual cells from Excel, you are much better off pulling rows or columns at a time. It’s much faster.

Kem, I’m not sure I follow how you would pull a row at a time from an Excel spreadsheet instead of reading each individual cell.

It’s pretty easy to do that with a CSV file, but how do you accomplish the same thing with an Excel spreadsheet?

I’ve only done this through AppleScript on a Mac, but I’ve found it translates pretty well. Something like this would return an array:

tell application "Microsoft Excel"
	tell active sheet
		value of columns 1 thru 5 of row 1
	end tell
end tell

Unfortunately, all the importing is being done on Windows. I know I could read a range of cells, but that might create more work since I’d then have to parse data of varying lengths. AppleScript is not an option in this case.

To be clear, I wasn’t suggesting that. I was saying that this is how I do it in AppleScript, and it translates well to doing it through the Xojo classes.

If you pull one cell at a time, there is a lot of time spent in communicating with the app. The request is sent to Excel which has to process it, then return an answer. For a sheet that’s just 10 X 10 cells, that’s 100 roundtrips, and each one is costly in terms of time. If you get it a row at a time instead, you’ll speed it up by a factor of 10. If you get it a sheet at a time, you’ll eliminate that bottleneck almost entirely.

Experiment to see. I’d bet you’ll be as shocked as I was.

[quote=88715:@John Fatte]It’s very efficient, and the only thing I’m trying to determine is whether there would be any significant speed increase if this were done with an “insert into” statement as opposed to an InsertRecord. If there is no obvious time savings, I wouldn’t even try to rewrite the code.

I know how hard it is to predict what the time savings may be, but I just thought someone might be able to say that an InsertRecord is 10 times slower than a comparable “insert into” statement. If that were the case, it might be worth it; if it’s only slightly faster, I’d just leave well enough (fully debugged code) alone.[/quote]

I’m kicking myself for not keeping my time test notes from a recent project…

Using PostgreSQL: if you are inserting a lot of records and you can collect them into a single call to SQLExecute, you can save a lot of time over the Xojo framework objects. It’s not a framework problem per se; it’s just that sending everything at once is more efficient. It’s also not simply a matter of using a single transaction block, because if you do that with the framework objects, or with INSERT strings sent individually (i.e. in a loop), you do not see the speed gains.

I’m not sure about the other database engines or plugins.

Whether or not it would be worth it to you… profile your current code and see if the database inserts are even a significant percentage of the time involved. If they are taking a long time (double-digit seconds or minutes), then create a test table and try inserting n rows using the Xojo objects in a loop, then using a single call to SQLExecute. If n is comparable to what you’re doing now, you should get a reasonable idea of the potential speed gain.
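The shape of that test is roughly this (untested sketch; the speed_test table, the db object, and the row count are placeholders, and error checking is left out):

// Compare one InsertRecord per row against a single batched INSERT, timed
// with Microseconds. Assumes db is an already-connected PostgreSQLDatabase.
Const n = 10000
Dim startTime As Double = Microseconds

// 1) framework objects, one InsertRecord per row, inside a single transaction
db.SQLExecute("BEGIN TRANSACTION")
For i As Integer = 1 To n
  Dim rec As New DatabaseRecord
  rec.IntegerColumn("id") = i
  rec.Column("name") = "row " + Str(i)
  db.InsertRecord("speed_test", rec)
Next
db.Commit
Dim loopTime As Double = Microseconds - startTime

// 2) every row batched into one INSERT ... VALUES list, sent in one SQLExecute
Dim values() As String
For i As Integer = 1 To n
  values.Append("(" + Str(i) + ", 'row " + Str(i) + "')")
Next
startTime = Microseconds
db.SQLExecute("BEGIN TRANSACTION")
db.SQLExecute("INSERT INTO speed_test (id, name) VALUES " + Join(values, ", "))
db.Commit
Dim batchTime As Double = Microseconds - startTime

MsgBox("Loop: " + Format(loopTime / 1000000, "-0.00") + " s, batch: " _
  + Format(batchTime / 1000000, "-0.00") + " s")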

[quote=88726:@John Fatte]Kem, I’m not sure I follow how you would pull a row at a time from an Excel spreadsheet instead of reading each individual cell.

It’s pretty easy to do that with a CSV file, but how do you accomplish the same thing with an Excel spreadsheet?[/quote]

Why not use ODBC to read the cells from the Excel tables?
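Something along these lines might work (untested; the driver name depends on what is installed, and the sheet name and file path are just placeholders):

// DSN-less ODBC connection straight to the workbook; each worksheet can be
// queried like a table. The driver name and path below are only examples.
Dim db As New ODBCDatabase
db.DataSource = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
  + "DBQ=C:\Imports\customer_data.xlsx"
If db.Connect Then
  Dim rs As RecordSet = db.SQLSelect("SELECT * FROM [Sheet1$]")
  While Not rs.EOF
    // each rs row is one spreadsheet row; read cells with rs.Field("columnname")
    rs.MoveNext
  Wend
Else
  MsgBox(db.ErrorMessage)
End If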

For Excel files, I have an XL plugin which can read and write both old and new Excel file formats without Excel being installed.

Thank you for the replies. You’ve got me thinking, but perhaps not in the way you might think.

The first question that needs to be answered is where the time is being consumed during the import. It can only be in one of two places:

  1. Reading the cell data from Excel

  2. Writing the data to the SQLite tables

So I first want to find out just where time can be saved.

As I stated before, this import is data driven by a table that has all the fields from all the tables that can be imported. It also has the data types for each field so it can correctly add the data to the SQLite tables. While there are literally hundreds of fields that the user can elect to import, they pick and choose the ones that they want and then structure their Excel file accordingly. Our application will generate an Excel template file with field names and types in the first few rows to assist them in populating their spreadsheet. Here’s a snippet of the code that populates the record before the InsertRecord is executed.

While Not last_row_in_Excel

  Dim rec As New DatabaseRecord

  For x As Integer = 1 To last_column_in_Excel
    cValue = Excel.Range(cell).Value
    cFieldname = get_from_data_dictionary    // field name for this column
    cFieldtype = get_from_data_dictionary    // field type for this column

    Select Case cFieldtype

    Case "TEXT"
      rec.Column(cFieldname) = cValue

    Case "DOUBLE"
      rec.DoubleColumn(cFieldname) = CDbl(cValue)

    Case "INTEGER"
      rec.IntegerColumn(cFieldname) = cValue.Val

    Case "LOGICAL"
      rec.BooleanColumn(cFieldname) = (cValue = "T")

    End Select

    cell = Next_Cell()
  Next

  db.InsertRecord(cTablename, rec)    // table name also comes from the data dictionary

  // skip to the next record in data_dictionary that was selected to import

Wend

There’s more to it than the code above, to handle other data (such as dates) that can be imported, but this gives you an idea of how little code is involved in importing the data right now because of the data dictionary. Export works more or less the same way; that is, it is driven off the data dictionary.

So I think the first step is to put some timing code in place to find out where the time is being spent… that is unless someone can see something that would make this more efficient.
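Something as simple as this should tell me where the time goes (rough sketch, using the same placeholder names as the snippet above):

// Accumulate the time spent in each of the two phases across the whole import
Dim readTime, writeTime, t As Double

t = Microseconds
cValue = Excel.Range(cell).Value          // phase 1: pulling the cell from Excel
readTime = readTime + (Microseconds - t)

t = Microseconds
db.InsertRecord(cTablename, rec)          // phase 2: writing the row to SQLite
writeTime = writeTime + (Microseconds - t)

// after the import:
MsgBox("Excel reads: " + Format(readTime / 1000000, "-0.00") + " s, " _
  + "SQLite writes: " + Format(writeTime / 1000000, "-0.00") + " s")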

Building the “insert into” SQL statement would be pretty easy except for the fact that I would have to save the cell values to variables so they could be included in the statement.

With regard to doing ONE “insert into” statement, I don’t really see how this would be possible without first adding the data to an in-memory table and then inserting from it. Keep in mind that it’s NOT just one table. There are a number of tables that are updated from the Excel spreadsheet. The data dictionary tells you which table to add the data to, so the code above would be run multiple times; each time a new table is reached in the data dictionary, it has to read only the cells from the spreadsheet that belong in that table.
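If I did go that route, the loop above would presumably just collect names and values into arrays instead of setting record columns. Roughly (untested sketch; escaping and the other data types would need real handling, and cTablename is whichever table the dictionary points at):

// Collect the column names and literal values for one row, then build one INSERT
Dim fieldNames() As String
Dim fieldValues() As String

For x As Integer = 1 To last_column_in_Excel
  // cValue, cFieldname, cFieldtype come from Excel and the data dictionary, as above
  fieldNames.Append(cFieldname)

  Select Case cFieldtype
  Case "TEXT"
    fieldValues.Append("'" + ReplaceAll(cValue, "'", "''") + "'")
  Case "DOUBLE", "INTEGER"
    fieldValues.Append(cValue)
  Case Else
    // LOGICAL, dates, etc. would need their own conversion here
    fieldValues.Append("'" + cValue + "'")
  End Select

  cell = Next_Cell()
Next

Dim sql As String = "INSERT INTO " + cTablename + " (" + Join(fieldNames, ", ") _
  + ") VALUES (" + Join(fieldValues, ", ") + ")"
// collect these statements per table, then send them in one SQLExecute inside a transaction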

Your plan is a good one.

I have a Stopwatch class that may help you, just to make the timing code cleaner. I’ve made it available here.

Thanks Kem. I downloaded your class and will put it to good use. I’ll post the timings I get in this thread after I get done. Stay tuned.

Have you used the profiler on your code?