I have a quick question: I have a scenario where I will need to update a variety of records quickly and I am thinking that running the executeSQL() method … over and … over again, might be inefficient. Is there a more kosher method for handling a larger volume of records that need to be either added or updated in a table? What about a RowSet? Would that work?
So what do you want to update? A RowSet is a single row. If you want to update individual Rows then you can easily iterate all the values in the RowSet. There’s commands for doing things universally to the table itself and there could be some SQLCommands for changing all the values for a particular column. They just escape me at the moment.
I suggest going to W3Schools and take a look at the SQLCommands.
There you can see examples. Like the SQLUpdate command lets you set the values for all records for specific columns:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Without really knowing what you want to do it is hard to prescribe something specific.
I am helping out (volunteering) with a traveling charity that provides free medical care for the poor and elderly. It’s a 100% volunteer project and they do an absolutely wonderful thing for the communities in my state they serve.
One area that they needed help with is keeping track of how many free lunches they give out at the end of the event. In some areas, they give out a lot of lunches, whereas other areas, only a few. Due to this, they need to keep a historical record of how many lunches were given out at a particular city (in the past), which helps with planning future events.
In the past, they have kept track of the meal tickets by purchasing generic ticket stubs – and then they have to manually count them (by hand) to see how many lunches they gave out to the community. I figured that this could be improved by moving to to the digital realm.
I am writing software for them (it’s free) that will give each guest a free lunch ticket when they come in through the door (via a receipt printer). Each ticket has an unique id on it that is scannable (via a barcode scanner).
When a guest comes to request their free lunch, they exchange their ticket for a meal – and the staff will scan the barcode to keep a record of that meal being served. The barcode scanners are in inventory mode, so they keep a list of all meal tickets issued. Once mealtime is finished, they download all of the scanned meal ticket IDs and need to update the database table. The meal ticket ID matches the primary key code, so it’s not a hard match and a boolean field in the row that says whether the ticket has been used (true / false) or not. This makes it easy to count how many tickets were given out and how many were claimed for free lunches.
The question is this: There there a way to mass update the table records at once, or would I have to manually update each record via an UPDATE statement? For example, if 2,000 lunches were given out, that would be 2000 UPDATE statements being run. I am just trying to figure out what would be the best way to update the database table.
Hope this makes sense what I’m trying to accomplish! Thank you much for your ideas / suggestions.
make a string, containing all the sql updates statements separated with “;”
concat them around 1000 at a time (you may adjust this number by trial)
then execute one sql sentence with this big string
it will work in an instant. even on a remote database.
UPDATE meal_tickets SET used=true WHERE barcode_id=12345;
UPDATE meal_tickets SET used=true WHERE barcode_id=12348;
...
UPDATE meal_tickets SET used=true WHERE barcode_id=25428;
I was GOING to suggest:
As the tickets are recorded, INSERT a new record into a temp table.
At any point, you might have up to 2000 rows in there.
When you like, UPDATE the main table WHERE the ID is IN (select ID from TicketList)
But even as I wrote that, it felt more sensible to UPDATE the main table every time a ticket is registered.
Yes, 2000 hits
But if the db goes down/ power is lost, then the updates have happened.
There is no practical performance hit doing it one by one… the updates will be orders of magnitude faster than anyone can do the scanning
would be updates in a transaction faster or running more optimized?
begin transaction
update
update
…
update
commit transaction
on error roll back and error message
Wouldn’t it be more reasonable and better organized for later reporting, to just record the count? If it were me, I would want to end up with a record containing Date, Event Name, Number of Meals. I don’t see much value in the individual ticket records.
One reason is that they would like to see what percentage of the audience will claim the tickets per hour. For example, are people who arrive for medical service at 11 AM more likely to claim the meal ticket (at noon) than someone who arrived early (like 7 AM) for medical care?
This way, they can see the percentages for each hour claimed, such as:
7 AM, 10% of the visitors claimed their meal tickets.
8 AM, 15% of visitors claimed their meal tickets.
9 AM, 35% claimed…
10 AM, 60% claimed
11 AM, 90% claimed
It gives them insights as to the likelihood of meal tickets to be claimed.
i would add a field scanned default 0 and scanned = scanned +1 for the update.
that means serving counter had scanned twice or someone had copy the barcode ticket (its maybe possible, i don’t know).
You are right… NEVER EVER do a chaining rows of updates (same for inserts and deletes) on the same table as some propagates here:
UPDATE…
UPDATE…
UPDATE…
I’ve seen that behaviour many times as this is the typical OO thinking and it was always causing troubles.
The proper solution was posted from @Greg_O_Lone as this is a bulk operation even tough the “IN” compare is not the fastest comparisation method but most obvious in your case. If you need to do this a lot I’d recommend to put an index on the related column (if it’s not the primary key):
…
WHERE related_column IN (…, …, …) -> (unique) Index on related_column
Find out how many “IN” values the DB can take. If a set of meal batch (from scanner) exceeds that one do it then in batches to make sure the IN list is not exceeded.
They make more sense to me, too.
As Greg says, a TRANSACTION is used when you don’t want ‘half a job’.
Say ‘a whole job’ is take money from Dad, give money to Son
If the ‘Give money to’ bit fails, you want them both to unroll, because money wasn’t taken from Dad.
But when you have 2000 ticket validations, each one is independant.
Update each one as you go.
If someone switches the machine off after 1500, 1500 have been done.
Whereas if you were ‘hanging on until 4pm and had a list of 1500 ready to do’, a power cut would lose all the work, perhaps in a way that would be impossible to redo
(which tickets have you seen today? Did you stick them in a box?)
i would use a trasaction with error handling at commit also because the workflow
that require to move or delete the offline scanned barcode list file at success.
a lastmodified date time is also useful in the table.
usually any special optimization make the process more prone to failure.
i would only think about optimization if execution time is too long. (usability)
One big point was already mentioned by Jeff Tullin. F.e. Oracle Database (and many others) are way faster (and also optimized for this) on bulk operations as it takes internal steps doing every update by itself. So a bulk update of 1mio records is faster than doing 1mio update operations.
Imagine if an update fails in between by whatever reason. You just could restart the same SQL and it does also execute the already processed updates (waste of time/resources). In the case here you simply could restart it or could add “AND free_lunch = ‘N’” but there are more complex case where f.e. the evaluation of the update takes several minutes just to get the records to be updated. If you do this now by single row updates it would take forever as it needs overall more time to execute the same evaluation for every single update operation than for all in a bulk.
I’m a SQL Server programmer. The way I’d do it:
Scanned IDs are inserted into a table: ScannedIDs. That has just one field or two: the ID and a timestamp.
Then join the two tables and run the update.
Update MyDataTable
Set Used = true
from
MyDataTable join ScannedIDs
on MyDataTable.TickeID = ScannedIDs.TicketID
If all the tickets were used, you wouldn’t have to count the tickets, you could just look at the first serial number on the ticket handed out and the remaining serial number on the roll of tickets and subtract - because the numbers are sequential.
BUT - the handed out ticket might not be used. So your scanning gives you a list of used tickets. If you want to relate them back, it seems your thinking is limited to one database. That limits your options.
My database work was with Panorama - a database for Mac. In my role, I found people seldom thought of an “accessory” temporary file. If I had to do something like you described. I’d read the scanned records into one file, then I’d do some seek from that file (less records than the “main” file) so shorter integration, that selected the matches and issued a group update to the field on the selected matches. Because the commands I’d use are specific to Panroama, they are useless to provide here. But the concept might be useful. Hence my post. I think the previous post that mention “join” is it the same spirit. I’m just use to a different command language.