> First rule of database design: Your unique IDs should not have a real-world meaning.
Well, this is an opinion. I would definitely push back on making it a rule. Especially the first one.
Natural keys exist, make perfectly good unique identifiers, and have inherent meaning.
When using synthetic keys it is difficult (read: requires making trade-offs) to guarantee they were created in order and/or that they increase monotonically. But if you make those trade-offs (or just align expectations) you can still associate real world meaning to them.
These new UUIDs standardize more of those trade-offs.
> Natural keys exist, make perfectly good unique identifiers, and have inherent meaning.
Most of the "natural" things you think are going to be good Primary Keys aren't. You quickly get into one of those "Falsehoods Programmers Believe About X" lists.
I remember some years ago running into the Experian team who were designing a system to uniquely identify people and they'd got it all figured out for the US market. See, Americans have Social Security Numbers, so you can just use those to key off. Alas, as hopefully anybody here who actually deals with Social Security Numbers knows, they aren't unique. Perhaps you could argue they should be unique, but that's not useful in practice, because in practice they aren't. They seemed really disappointed about that. I doubt they fixed it though.
It's far more likely that the "definitely unique" Natural key for your table isn't actually unique than that there will be an insurmountable problem caused by using artificial row IDs, UUIDs or something like that.
SSN is a terrible key (and given privacy and security concerns should probably be encrypted or otherwise peotected in any case, not showing up in logs, developers notes while troubleshooting, etc).
Product serial number might be a better example. Or file hash.
> Product serial number might be a better example.
Nope. The "serial number" is just whatever you printed on the box or maybe on the product itself. You may intend that to be unique but you've got no way to ensure that. If you tell the people in manufacturing that if any duplicates occur it's a show stopper they will tell you to stop being stupid and design your IT systems to cope with the real world.
I buy serial-numbered LED lamps, the manufacturer replaces them if they drop dead within a certain number of years and I have some applications (18 hours per day of lighting) where that's plausible, so I need to track those serial numbers. Except they once shipped me a lamp (one in a box purchased together with others) with no serial number. Completely blank, no manufacturer, no type identifier, no voltage, the entire label was just missing from the product as shipped. Emailed them. No problem, apologies for the defect, your warranty registration for a lamp with no serial number will be fine.
> If you tell the people in manufacturing that if any duplicates occur it's a show stopper they will tell you to stop being stupid and design your IT systems to cope with the real world.
You've got a pretty low opinion of people in manufacturing there, buddy.
No manufacturer I've ever worked with would even blink at a requirement for unique serial numbers.
> No manufacturer I've ever worked with would even blink at a requirement for unique serial numbers.
It's easy and even expected to have a requirement. There's a requirement that US SSNs are unique too. A requirement doesn't cut it. The requirement is going to fail sooner or later and what then? How much did the manufacturers you worked with indemnify you against the consequences of non-unique serial numbers?
That's what we're getting at here. When it breaks somebody has to fix it. Or, you could design it not to break.
It's a man's laws versus nature's laws problem. Did you know you can pick up the soccer ball, and just carry it over the goal line? Yeah, that's "against the rules" but you can do it anyway, being against the rules doesn't make it impossible.
Now, contrariwise you can't teleport the ball - doesn't matter whether the rules allow it, can't do it anyway. Nature's law are facts. Design your database to rely on facts, not man's laws as much as possible.
So Ford builds two F-150s with serial number 1234. One has internal_db_id 1 and the other has internal_db_id 2. I buy an F-150 with serial number 1234 and immediately smash it into a tree. Total loss.
You buy the other F-150 sn 1234. You take your truck in for service. They refuse to service salvage title vehicles under warranty. You explain you don’t have a salvage title. They blame “the system”. Your insurance company fares no better.
Or, Ford could put a unique constraint on serial number and catch the problem in the factory.
It’s not clear to me how the internal_db_id did anything except propagate an error until it was way too late to fix.
The entire point of properly modeling data is to codify and enforce the “rules”.
> Nature's law are facts. Design your database to rely on facts, not man's laws as much as possible.
Exactly. Natural keys can, along with normalization and constraints, stop you from picking up the ball and walking into the goal. In other words, natural keys define the nature of your system.
> The requirement is going to fail sooner or later and what then?
First of all it'll get picked up by the people who set up the serial number printer. They don't just set it to a random number - the factory will have a procedure to make sure the number is set right.
Then it'll get picked up by the end-of-line testing, as test reports are filed by serial number.
Then it'll get picked up when the first part comes off the production line and gets checked by the factory's most conscientious testers and engineers.
Then it'll get picked up at goods inbound, when they count the items and check them in, and find the system says some of them are used, or there's a discrepancy in the number of items.
Then it'll get picked up at acceptance testing, when they find the shipment's test records aren't in order.
Then it'll get picked up by accounts payable, when the manufacturer bills for parts they've already been paid for.
Then depending on the numbers and costs involved, you either send the bad units back for relabelling or trash them, in either case at the manufacturer's expense.
> First of all it'll get picked up by the people who set up the serial number printer. They don't just set it to a random number - the factory will have a procedure to make sure the number is set right.
Printer ran out of toner and nobody noticed until the next shift.
> Then it'll get picked up by the end-of-line testing, as test reports are filed by serial number.
Nah, test report operator was out sick that day and product needed to be pushed anyway. Manager assumed they're all good and overrode tests.
> Then it'll get picked up when the first part comes off the production line and gets checked by the factory's most conscientious testers and engineers.
Those testers and engineers don't have all day to test every single product. They only tested the first 100.
> Then it'll get picked up at goods inbound, when they count the items and check them in, and find the system says some of them are used, or there's a discrepancy in the number of items.
Counted by total weight divided by individual weight.
> Then it'll get picked up at acceptance testing, when they find the shipment's test records aren't in order.
Counted total number of UPCs.
> Then it'll get picked up by accounts payable, when the manufacturer bills for parts they've already been paid for.
Accounts payable sees a box, of 1000 parts and they're labeled and the top 100 appear genuine... pay it.
> Then depending on the numbers and costs involved, you either send the bad units back for relabelling or trash them, in either case at the manufacturer's expense.
Manufacturer thinks you're trying to defraud them and won't authorize a return without a serial number.
I have, in fact, worked for multiple manufacturers of serialized products.
My point is that even though there are multiple points where the serial number "should" be checked, there are also multiple points where that could fail for one reason or another.
Agreed. The simplest example is names. People don't have unique names so you combine it with some other stuff (birthdate, SSN) and call it a primary key. Then someone gets married and changes their name. To anyone doing this for a short amount of time this example is obvious. The point is that almost everything you might think of as a "natural key" is not good enough and you'll get burned when you find out what the reasons are.
Well, this is an opinion. I would definitely push back on making it a rule. Especially the first one.
Natural keys exist, make perfectly good unique identifiers, and have inherent meaning.
When using synthetic keys it is difficult (read: requires making trade-offs) to guarantee they were created in order and/or that they increase monotonically. But if you make those trade-offs (or just align expectations) you can still associate real world meaning to them.
These new UUIDs standardize more of those trade-offs.