I’ve not been down this road before, so best practice/future planning advice will be very much appreciated…
I’m working on a multi-user app for data entry/editing. The data is sent to the DB via a REST server, as single or multiple records; that is, the JSON would be an array of one or more items.
My question is: when adding new data to the DB, would it be better to create the record’s UUID in the client app before sending it to the DB, or have the DB create the UUID and return it to the client after processing the record(s)?
I see a possible issue with DB-generated UUIDs when sending multiple records: the DB would need to return multiple UUIDs that the client app would then have to match up to the proper records. The UUIDs could be returned in the same order the records were sent in the batch, but that seems fragile to me.
I would have the server do it to avoid collisions.
1. Generate temporary UUID values on the client so that you can identify rows locally (maybe a number that counts down from -1).
2. Create real UUIDs on the server when inserting the records.
3. Return a temp-UUID-to-real-UUID mapping back to the client so that it can update its data.
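The three steps above can be sketched roughly like this (a minimal illustration, assuming a Python client and server; the function names `sync_batch` and `apply_mapping` are made up for this example):

```python
import itertools
import uuid

# Client side: assign temporary negative IDs so rows can be tracked locally.
_temp_counter = itertools.count(-1, -1)  # -1, -2, -3, ...

def new_record(data):
    return {"temp_id": next(_temp_counter), **data}

# Server side (stand-in): mint a real UUID per record and echo the temp ID
# alongside it, so order doesn't matter.
def sync_batch(records):
    return [{"temp_id": r["temp_id"], "uuid": str(uuid.uuid4())} for r in records]

# Client side: swap each temp ID for the real UUID from the mapping.
def apply_mapping(records, mapping):
    by_temp = {m["temp_id"]: m["uuid"] for m in mapping}
    for r in records:
        r["uuid"] = by_temp[r.pop("temp_id")]

batch = [new_record({"name": "a"}), new_record({"name": "b"})]
apply_mapping(batch, sync_batch(batch))
```

Because each reply carries the temp ID it answers, the match-up doesn’t depend on batch order at all.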
In the upcoming April/May issue of xDev magazine my database column will focus on Triggers and orphans. I give an example of a trigger that creates a UUID for SQLite when a new record is inserted. Your mileage may vary depending on the database that you use.
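The magazine example isn’t reproduced here, but the general idea can be sketched like this: a trigger that fills in a v4-style UUID when a row arrives without one, built from SQLite’s `randomblob()`/`hex()` (shown via Python’s `sqlite3`; the table and trigger names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (uuid TEXT, name TEXT);

-- Generate a v4-style UUID for rows inserted without one.
CREATE TRIGGER items_uuid AFTER INSERT ON items
WHEN NEW.uuid IS NULL
BEGIN
  UPDATE items SET uuid = lower(
    hex(randomblob(4)) || '-' || hex(randomblob(2)) || '-4' ||
    substr(hex(randomblob(2)), 2) || '-' ||
    substr('89ab', abs(random()) % 4 + 1, 1) ||
    substr(hex(randomblob(2)), 2) || '-' || hex(randomblob(6))
  ) WHERE rowid = NEW.rowid;
END;
""")

conn.execute("INSERT INTO items (name) VALUES ('widget')")
row_uuid = conn.execute("SELECT uuid FROM items").fetchone()[0]
```

The trigger fires after the insert and back-fills the new row, so application code never has to think about it.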
I like the approach of starting with a request to the server and a response to the client:
insert with default values, returning the ID;
then edit this new or existing record and save the data with another request.
Disadvantage: you have to remove data the user never saved, and there’s no offline mode.
Advantage: better program flow.
To prevent concurrent edits I would use a status field of user + timestamp.
I prefer to generate on the client because (in theory) it works in all cases, including while the client is offline and needs to sync/replicate to the server later. To me, that is much of the point of using a UUID: in theory multiple clients can be offline and generate UUID values, and there will not be any collisions once they all sync to a server. And in the meantime local clients have a UUID to use as a primary key.
If you are ALWAYS connected to a server when the client creates records, then I can see it going either way but it just seems easiest to me to have the client generate. I still define the tables on the server database to have a default value of a new UUID, but in practice I always send the UUID as part of my payload to the server.
If you have collisions, then you really don’t have a UUID…
I think I’ll take the path of client-generated UUIDs and have the DB make one only if it needs to. The clients won’t necessarily always have a connection to the DB, so I think it’s best they create the UUIDs.
Let me elaborate a bit. The internet is an awful place full of galaxy-sized minds, with endless creativity for these kinds of things.
These are some scenarios that could (or could not) happen.
You’re an optimistic developer and you don’t check for uniqueness:
The hacker will send you not-so-unique UUIDs, crashing your service or corrupting your database. This can happen just for fun, or because they want to flood your logs with errors to mask some other activity, …
You’re a paranoid developer, you check for uniqueness:
Then you’re giving the hacker an opportunity to fish for stored UUIDs. Detecting this one is even scarier, as you know they’re targeting your service =(
That said, my team used client-generated UUIDs in the past for a mobile application, on a far-away farm where connectivity wasn’t guaranteed.
We used them as temporary local IDs, so the app could mix synced and un-synced events in the same list view. As soon as the device managed to get internet, the client could send all these events referenced by a UUID. The server replied with an ACK for each of them, carrying its final stored ID, so the client could replace the temporary ID with the final one. The server didn’t check for uniqueness or try to match or do anything with these temporary IDs; that was up to the client.
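That ACK-per-event flow can be sketched like this (a toy version with an in-memory stand-in for the server; `on_reconnect` and the field names are invented for the example):

```python
import uuid

# Client-side queue: each event gets a temporary UUID so synced and
# un-synced events can share the same list view.
pending = [{"temp_uuid": str(uuid.uuid4()), "payload": p}
           for p in ("fed cows", "fixed fence")]

# Stand-in for the server: store each event and ACK its temp UUID
# together with the final stored ID.
_next_id = iter(range(1, 10**6))
def server_receive(events):
    return [{"temp_uuid": e["temp_uuid"], "id": next(_next_id)} for e in events]

# On reconnect: send everything, then swap each temp UUID for its final ID.
def on_reconnect(events):
    acks = server_receive(events)
    final = {a["temp_uuid"]: a["id"] for a in acks}
    for e in events:
        e["id"] = final[e.pop("temp_uuid")]

on_reconnect(pending)
```

The key point is that the temp UUID only has to be unique on the one device that created it; the server never treats it as anything but an opaque correlation token.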
“So who hurt you, Ricardo?”. Now some campfire horror stories about apparently innocent strings I’ve personally suffered:
A high-traffic website with multiple cache layers to handle the load. A hacker saw an opportunity to bypass the cache by sending arbitrary random strings as HTTP methods, since those verbs are just a suggestion. He managed to choke the server for hours until we noticed.
An unhappy employee of another company knew there was an audit log page. He managed to build an XSS attack and steal super-admin sessions by… wait for it… using malicious User-Agent headers, which appeared in a web table, unescaped.
So yep, if somebody is on the fence and is asking whether they should use a client-generated UUID for any happy reason… I’d say no, unless there is a really good reason.
A UUID is a 128-bit number. It is jokingly said that there are more possible values than there are stars in the sky. The chance of collision is astronomically low. There’s no point in checking for uniqueness because there are only two possibilities:
The client generated a new UUID.
The client obtained an existing UUID from some place such as your API.
The client cannot guess the UUID of another record in the database. Of course, your API would check for permission to edit the record, but that’s something else entirely. With a properly generated v4 UUID, no two clients will ever accidentally or intentionally generate the same UUID for two records.
And your examples are all instances of data sanitization failures.
It’s not improbable. It’s as close to mathematically impossible as it gets. A 128-bit number has 39 significant digits. It is a number so unfathomably large that humans can’t really comprehend what that means. Seriously, there are 340,282,366,920,938,463,463,374,607,431,768,211,455 different possibilities.
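For anyone who wants to check the arithmetic, it’s trivial to verify with big integers (one note of precision: a random v4 UUID actually carries 122 random bits, since 6 bits are fixed by the version and variant fields):

```python
# 2^128 possible 128-bit values; the quoted figure is 2^128 - 1, 39 digits long.
n = 2**128 - 1
assert n == 340282366920938463463374607431768211455
assert len(str(n)) == 39

# A v4 UUID has 122 random bits -- still roughly 5.3 * 10^36 values.
v4_values = 2**122
```

Even at the v4 figure, the birthday-paradox odds of a single collision stay negligible for any realistic number of records.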
No there isn’t, because your API would be checking on permissions to edit the record anyway. If it isn’t, I would be way more concerned about this vulnerability, than I am about a UUID collision.
No one would ever try to guess such information. One attacker would sniff information from network packets, another would try to provoke leaks, overflows, etc., and so on.
A UUID is no failsafe for anything but overwriting previously stored information.
Sorry, but that’s just theory. We don’t even know which UUID version will be used, nor the implementation. Even v4 UUID libraries have had vulnerabilities in the past for not being “random enough”, and we can’t guarantee they won’t have more vulnerabilities in the future.
But even if we assume they are fine, you seem to be ignoring my point on purpose, which is that this gives the client the privilege of building those UUIDs, which I assumed (sorry if I’m wrong) will be used as primary keys in a DB.
I think this design opens the door to unnecessary attack vectors. My opinion is just that this is a backend’s job. It’s not a big deal, I’m not complaining about UUIDs, I’m all cool with the laws of physics, I’m okay with all those stars.
Of course, but for the third time, that’s what access controls are for. Your API would verify that the client has permission to update the record.
But you haven’t actually demonstrated how letting the client pick their own primary key is an attack vector. What’s the harm? Either they pick a new valid key, in which case all is well, or they pick a key that has already been used and they don’t have permission for, and your API rejects the request. Where is the attack vector?
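The two outcomes described above can be made concrete with a toy server-side handler (everything here — the `db` dict, `upsert`, the status codes — is invented for illustration; the real protection is the ownership check, not the key itself):

```python
import uuid

# Toy store: uuid -> (owner, data).
db = {}

def upsert(record_uuid, client_id, data):
    """Accept a client-chosen UUID; access control does the real work."""
    owner, _ = db.get(record_uuid, (client_id, None))
    if owner != client_id:
        return 403  # key exists and belongs to someone else: rejected
    db[record_uuid] = (client_id, data)
    return 200

# Pre-existing record owned by another client.
theirs = str(uuid.uuid4())
db[theirs] = ("alice", {"n": 1})

status_new = upsert(str(uuid.uuid4()), "bob", {"n": 2})   # fresh key: accepted
status_clash = upsert(theirs, "bob", {"n": 3})            # alice's key: rejected
```

Either branch leaves the database intact: a fresh key is stored, a reused key bounces off the permission check.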
So far all I see is paranoia for the sake of paranoia.