MySQL MaxID returning stale value after new insert

William_Reynolds · April 8, 2022, 11:30pm

I’m trying to get the MAX value on a table, then increment it (and use it), insert the new higher value, and then the next time the function runs it’s supposed to do the same thing - creating a serialized list of ID numbers, a new one each time. I’ve been using:

SelectSQL(“SELECT MAX(sensor_manuf_id) AS MaxID FROM sensor_manuf_ids”)

…but it seems like the dbase may be returning a MAX value that is from a prior record, and not the last one inserted.

I’m using DB.CommitTransaction before the function ends, and before MAX value request is sent again, but I still periodically get a duplicate (ignores latest insert and gives me value of last).

Are there tips or precautions I should be following when it comes to working with MaxID? Thanks!

David_Cox · April 9, 2022, 11:52am

Shouldn’t you INSERT do the work for you:

INSERT INTO sensor_manuf_ids (…, sensor_manuf_id) VALUES (…, MAX(sensor_manuf_id) + 1);

William_Reynolds · April 9, 2022, 11:18pm

Ideally, but there are times when MAX is queried for other aspects of the operation, and the value returned is lagging behind inserts done by other connections into the system. Their inserts aren’t committed quickly enough for the dbase to reflect accurately (is what I’m suspecting). When throttling connections to the dbase the problem never occurs, it’s only when multiple back-to-back inserts are happening that a stale value is returned.

Rick_Araujo · April 9, 2022, 11:53pm

It’s hard to understand the process and data flow design only by this conversation BUT things like MAX(ID) wherever it is used looks like an usual conceptual bad data flow design only possible for standalone systems (one user all the time). I have one old legacy system that does it, and that’s exactly by this kind of design that it needs an entire redesign for a multi-user version.

Rick_Araujo · April 10, 2022, 12:14am

When you get multiple START TRANSACTION concurrently, let’s say 3 of them, and all they make a SELECT MAX(ID), all the 3 will get the same value, then all them will do inserts and each COMMIT will increase the IDs (for AUTO_INCREMENT IDs), SO you get for example, 3 IDs 1000 read, and wrote records 1001, 1002 and 1003 (probably with some field saying 1000 and that wasn’t the intention).

William_Reynolds · April 10, 2022, 12:53am

That makes perfect sense, of course. In this case it’s a single centralized app that is the broker, and it handles and queues all requests to/from the dbase and thru the logic. This singular broker can handle modest traffic, however when the requests come too frequently then that’s where I’m suspecting the MAX value isn’t keeping up. This is obviously a design flaw on my part, but let me to want to better understand if this is a dbase limitation, or just bad coding/design on my part.

Ivan_Tellez · April 10, 2022, 1:23am

Not really, that is a timing problem, You could have a duplicated value even if you have only 2 users doing a single request each at the same time.

Since you are not using the database engine features, it is the coding part.

Why are you increasing the value manualy? Any reason to not have an autoincrement field and let the database increase it for you?

So when you insert the new row, the id field is increased, then you can use the database function to get the last inserted id to be used in another query

kevin_g · April 10, 2022, 11:20am

I suggest you read about MySQL transaction isolation modes.

The default is Repeatable Read which basically means Select will always return the same values when executed within the same transaction. Are you running with AutoCommit off or maybe have transactions open for a while?

You could try changing the isolation mode to Read Committed to see if that fixes the problem. However, I wouldn’t leave it in that mode without understanding how it affects your SQL code.

I also feel that your insert is faulty:

INSERT INTO sensor_manuf_ids (…, sensor_manuf_id) VALUES (…, MAX(sensor_manuf_id) + 1);

You are asking MySQL to determine the highest sensor_manuf_id, add 1 to it and then insert it into the database.

If you do this multiple times in the same transaction then Max() could return the same value based on the transaction isolation level.
My understanding of MySQL is that the Max() and the rest of the Insert won’t be performed atomically which means that if there is more than one connection you may find the value determined by Max() is not the maximum value by the time the rest of the insert is performed.

I think this would be safer if sensor_manuf_id was an auto increment column.

David_Cox · April 10, 2022, 11:35am

You may be right, but I would find it extremely worrying if the database internals were returning inaccurate results at the time of execution.

kevin_g · April 10, 2022, 12:15pm

It’s not inaccurate.

Connection 1 performs Max and stores it in a variable.
Connection 2 performs Max and stored it in a variable.

NOTE. At this point the number of rows hasn’t changed so they both have the same value.

Connect 1 Inserts its values
Connection 2 Inserts its values

Both rows will contain the same Max value as both connections retrieved it before the number of rows changed.

The problem is that you have assumed the entire SQL statement was atomic but that would require the entire table to be locked while the statement was executing which would kill concurrency.

The only way to avoid this is to use auto increment columns or surround the SQL statement in table locks yourself.

It’s a very similar problem to multi-threaded programming in languages such as c.

William_Reynolds · April 10, 2022, 2:48pm

Wow, I never fail to learn something new when interacting with the forum, thanks @kevin_g! I’m going to do as you suggest and get more familiar with ‘isolation modes’, which sounds like something I should have learned years ago. Again, thanks for the direction.

Rick_Araujo · April 10, 2022, 6:20pm

Even without understanding the process, I’m still not convinced that only changing the isolation here will be capable of fixing this kind of thing without some planned serialization of the process. The process should be designed and a simulation of concurrency should be done “in paper” looking for possible ways of fail getting improper values from concurrent clients. Probably the design will need some SELECT FOR UPDATE (will hold the same operation in such row in concurrent clients until the end of the transaction) forcing some planned serialization of the process (temporary row lock of an ID, a row, not a table lock). This is usual using the default isolation.

William_Reynolds · April 10, 2022, 7:33pm

I do agree, and I’ve recoded this one piece of the process in a proper fashion. It’s funny you mentioned pen and paper as that’s always been where I go before writing a single line of code. In this particular case I’ve refactored and created a few stress tests to ensure it can handle 3x the expected throughput. I’m never going to be a 1st-class DBA, but I know a few, and I’m going to run this post past them. If I find any more insight I’ll be sure to append it to this post.

kevin_g · April 10, 2022, 7:42pm

Changing the isolation mode won’t address what I think is a flaw with using Max() instead of an auto increment column. Table locks are probably the only safe way to use Max() in this scenario. My gut feeling is that SELECT FOR UPDATE on the row with the highest sensor_manuf_id would also suffer from some kind of race condition.

Changing the isolation mode could help if there was a problem with stale reads when requesting the same data multiple times within the same transaction.

Rick_Araujo · April 10, 2022, 8:30pm

Locks, not “table locks”. SELECT FOR UPDATE causes the proper focused lock, not unnecessarily holding sessions not involved with the specific case.

Something like:

SELECT sensor_manuf_id As MaxID FROM sensor_manuf_ids 
WHERE sensor_manuf_id = (SELECT MAX(sensor_manuf_id) FROM sensor_manuf_ids) 
FOR UPDATE;

Will read and lock that sensor data, another session trying the same will be on hold waiting for it.

If you need to make some generalist lock, based on the process and not in such data, you can create a table with semaphores you can call for example “lock_events”, like a string column called for example “lock_key” and insert the event in that list like the value “hold access to sensors” and…

SELECT lock_key FROM lock_events 
WHERE lock_key = 'hold access to sensors' 
FOR UPDATE;

Again, only the guy designing the process will know the proper way to conduct the proper locks to minimize the speed impact. As keving said from his guts, sometimes it’s absolutely necessary the use of the most striking version.

kevin_g · April 10, 2022, 9:28pm

SELECT FOR UPDATE will lock row(s) and your suggestion may work correctly.

Personally, I would go for the safe and more scalable option which would be an auto increment column.

Rick_Araujo · April 10, 2022, 9:40pm

I’m also all against Max(ID), but

William said he had reasons for such design, I’m taking that in consideration.

Matthew_Stevens · April 11, 2022, 8:36am

Surely a design that allows the client to assign a unique incremental ID is a flawed design, in a multi-user environment?

Just curious. As an alternative to autoincrement, might a stored procedure resolve the concurrency clash? The client passes the values for the new record and leaves the DBMS to assign the ID.

Rick_Araujo · April 11, 2022, 9:08am

In simple terms, a stored procedure just aggregates a lot of commands in the server (ignoring many extra features, increase of speed due to pre-compilation of lots of commands, etc). Passing those commands to be executed or executing them stored in the server COULD result in the same results if doing very SIMPLE tasks. For complex tasks, like storing that MAX(ID) acquired after the lock, to apply in many places, a stored proc would be certainly a desired way of doing it.

That said you can solve the “Max(ID) Case” applying proper locks in a stored procedure as you would sending that TRANSACTION from the outside, just using a stored proc would not automagically solve the case, what need to change is the design of the task.

Everybody here with some knowledge in DBs knows that a MAX(ID) for just the objective of collecting the probable next ID is a bad and wrong design (it adds complexity, needs locking, it adds cost of table scanning). So as I said, I’m taking in consideration some special need from William.

As I already said, for the “next ID” he could just add the AUTO_INCREMENT clause to the ID field, but he said he had reasons for Max(ID).

Jay_Madren · April 11, 2022, 6:36pm

You’ve mentioned this a couple of times, but I do not see where William states any reason that requires using Max(ID) for the insert, only that he may need the value for other things:

This indicates the insert has already been performed and he’s needing the value for other purposes. Using an autoincrementing value in the table design would fix this also.