Database constraints: The last line of defense (citusdata.com)
87 points by anarazel on April 6, 2018 | 101 comments


Every single time I've seen someone say something like, "We don't need constraints; we have validations", they've had to clean up their data — when it turned out validations aren't enough.

Please do future-you a favor: use constraints.


A million times this.

Cleaning up bad data after the fact can actually be impossible. When it is possible, it can be annoying, tedious, and easy to fuck up.

All of this can be avoided by an astonishingly small amount of forward thinking. What columns should not be nullable? What are the maximum sizes needed for strings, numbers, and binary data? Which columns or sets of columns should only ever be inserted once? What columns point to other rows, should they be required, and what should happen if one side or the other has deletion attempted?

Constraints should cover basic data consistency guarantees. Validations are fine for business logic and business requirements, since those might change more frequently.
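Concretely, most of those questions map straight onto DDL. A minimal sketch in Postgres (the table and column names here are made up for illustration):

    CREATE TABLE orders (
        id          bigserial PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customers (id) ON DELETE RESTRICT,
        sku         varchar(32) NOT NULL,
        quantity    int NOT NULL CHECK (quantity > 0),
        UNIQUE (customer_id, sku)  -- each customer orders a given SKU at most once
    );

Each line answers one of the questions above: NOT NULL for required columns, varchar(32) for maximum size, UNIQUE for insert-once, and REFERENCES ... ON DELETE RESTRICT for what happens when deletion is attempted.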


You start to miss constraints when you use a database system which has none.

I used Postgres to store various sensor data (some Raspberry, ESP and other "mini" stuff). Then tried InfluxDB, mainly because Grafana supported it, while Postgres did not (it does now!).

Sensors malfunction, due to old age, loose connections, bugs or whatever. A temperature sensor that reports 85 °C while only being rated to 50 °C? You could catch that in the application, but it's cumbersome. Just do it in the database with the constraint "temperature < 50" and you are done.
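In Postgres that's literally one line of DDL; a minimal sketch with a hypothetical readings table:

    CREATE TABLE readings (
        sensor_id   int NOT NULL,
        recorded_at timestamptz NOT NULL,
        temperature numeric NOT NULL CHECK (temperature < 50)
    );
    -- A bogus value now fails loudly:
    -- ERROR:  new row for relation "readings" violates check constraint "readings_temperature_check"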

To top it off, deleting wrong data in InfluxDB is a real pain.


Thanks for the Grafana shout-out. Glad you find the Postgres support in it useful.

Out of curiosity, re: constraints in your example, couldn't you easily do something like removeAboveValue(50) when retrieving your data from these faulty sensors? It seems like basic constraints, windowing, etc. should be part and parcel of most time-series databases?


Did you move back to Postgres? What are some things you did to bring some of the things you needed from Influx into PG (like rolling aggregations)?


No. At least not yet. Rolling aggregations are broken in InfluxDB as well.


I would never use constraints ever again. Period. I've worked on everything from billions of rows of data down to POCs. It's interesting to note that Amazon's data largely runs without constraints (some of the largest systems around). Working in Southern California, where tenures run a few years before you end up somewhere else, everyone comes to a consensus within a decade. It's obvious to me that embedding business logic in SQL, alongside the server app and whatever presents data to third parties, just makes everything harder. Then you run into the inevitable data changes and fight the existing constraints. It's a waste of effort, but an obvious path to job security for DBAs.

> And of course you can get quite clever and fun here... create a function that checks if a number is a fibonacci number

The fact that this is presented as a reasonable (trivial!) thing to embed in the DB screams "this is a terrible place to work."


Sorry, but to put it bluntly, this is a ridiculous thing for you to say. I tell everyone: it's easy to remove constraints, but it's very difficult to add them once your data has been polluted. Think of constraints like a canary in the mine for your data. If things are wrong, or perhaps have changed, it alerts you. If you can expand the constraint, you do so; if you have no choice, you can drop it. But always begin with constraints!


A lot of things that are really, really good ideas for most applications might not apply at Amazon scale. Most of us don't work at Amazon scale.

I agree on the fib constraint, though. Fun, but nothing you'd ever really do.

> embedding business logic in SQL along with the server app and whatever presents data to third parties, is just making everything harder

Not in many cases. In the past 15-20 years, think about how many application stacks have come and gone. We've had Postgres that whole time, and PL/pgSQL. Core business logic in the DB never needs to be rewritten, no matter how many front-end rewrites you go through.


With your last point, you're conflating language longevity with code longevity. Code in the DB changes as often as any other code: as often as the business requires it. The last company I worked for that used stored procedures shipped changes to them with every frontend release.

And logic in the DB is inherently capable of scaling only as large as your DB cluster (i.e. not very), and can cause your DB cluster to scale prematurely (an event which incurs a lot of potential technical debt and additional monitoring).


I don't understand why. I see constraints doing the same job as objects do in programming: they keep the data and the rules for manipulating that data together, ensuring that the invariants are always respected.

If "you can't have a price <= 0" is a business rule then it should be enforced in all layers: in the UI because you want to warn the user they're trying to do something incorrect without sending obviously invalid data to the server, in the back-end because you don't want to write incorrect data to the database, and in the database because you don't know that someone else will not write "just a quick app" that will bypass your back-end and mess up the data.


The problem with constraints is that they reflect current business logic, but the database stores past data. This is an inherent mismatch and it'll cause conflicts. You can't add a constraint that conflicts with old data, yet you need to check that constraint for current data; therefore you must implement those constraints somewhere else, and if you have them there anyway, why bother with additional ones? If you do have duplicate constraints, you must not forget to keep them in sync. I'm not saying constraints are bad (a database without constraints would be a very strange beast; even UTF-8 text is a constraint upon arbitrary byte strings), but they're a dangerous construct and must not be misused, otherwise they'll cause more harm than good.


> You can't add constraint which conflict with old data

Of course you can. For example, constraints inside triggers. A simple insert trigger will only apply on records going forward.
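Postgres also has a declarative version of the same idea: a constraint added as NOT VALID is enforced for all new inserts and updates but isn't checked against existing rows until you explicitly validate it. A minimal sketch, with a hypothetical readings table:

    -- Checked for new writes from now on; old rows are left alone
    ALTER TABLE readings
        ADD CONSTRAINT temp_sane CHECK (temperature < 50) NOT VALID;

    -- Later, once the old data has been cleaned up or migrated:
    ALTER TABLE readings VALIDATE CONSTRAINT temp_sane;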

Also, the reason you bother with "additional" constraints by implementing them in the database is that the database is typically the only thing that can actually enforce them. Unless you go out of your way to roll your own single-writer setup via a queue or something.

I'm not sure I agree with your take-away since you can just drop constraints. Easy come, easy go. Meanwhile bad data is easy come and potentially a nightmare to fix.


> Of course you can. For example, constraints inside triggers. A simple insert trigger will only apply on records going forward.

Or you can just expand the constraint to a superset of the two, or figure out how the old data needs to be treated by the new system and migrate it. There are many right options if you actually care about data integrity; they're just not as easy as not caring. That doesn't make not caring good general advice.


Saying a database feature that validates the data inside the database is dangerous is very much like saying seatbelts are dangerous because you could drown. Anything can be misused, but there is a clear use case: an ACID database provides a transactional, reliable and coherent layer that can enforce rules no other layer in the system can, unless that layer reimplements the entire DB with SQL validation and type checking.


> You can't add constraint which conflict with old data, yet you need to check that constraint for current data,

If there was a way to distinguish "old" and "new" data, you could write your constraint to honor the difference.

> therefore you must implement those constraints somewhere else and if you have those constraints there anyway, why bother with additional ones.

IMHO, if you must only have one set of constraints, put it in the data store.

The thing about constraints is that they provide pretty solid guarantees that let you better reason about the data at its base level.

You can't reason well about data that was "checked" by dozens of different versions of validations over time. What you have in that case is garbage in need of cleanup.


> I would never use constraints ever again

So you'll never again use a uniqueness or foreign key constraint?

Also, I don't know what POC means, but it provides no insight into the problems you had or why it was OK to accept otherwise bad data.

> The fact this is presented as a reasonable (trivial!) thing to embed in the DB screams, this is a terrible place to work.

Yes, a cute example showing you that you can constrain by anything is indicative of how terrible a place this is! /s


Do you rely on file system permissions as a sanity check on your application code being correct? (If you don't, you're probably in the wrong job.)

A filesystem is just a hierarchical key-value database.

Assuming you do believe in file permissions, why wouldn’t you use the additional sanity of validation available in a SQL database?


What, you'd rather drop data on the floor? Your scientists would prefer you keep all measurements, even the bad ones. Sometimes those "bad" data actually have information.

Why not have two databases, one with all data without constraints and another, or even many which impose various constraints?


I don't understand the distinction you're making.

Bad data in the context of database constraints is data that violates your data model which, in the absence of constraints, is implicit and tucked away around the application layer in comments and functions and word of mouth. If you're cleaning up data, that means your data integrity has been compromised.

If you're collecting unstructured data from scientists, there's no contradiction. Maybe you make a nullable byte blob column for it. That's generally the most nonthreatening data in the database. Data constraints don't mean you need perfect data.

But the fact that your mental image is the weirdly specific scenario of someone deleting data while "their scientists" go "aw snap, I wanted that" indicates a misunderstanding on your part about what's being discussed.

It sounds like you think that the opposite of data constraints is a windfall of valuable data that someone wants. No, exactly the opposite is what you're protecting yourself from with constraints.


Two different tasks can have two different levels of "integrity" desired. For example, the task of mailing a thank-you note to all customers might not care if one customer accidentally has two addresses or none. For the task of billing customers, we probably want an exact one-to-one customer to mailing address.

If someone protects me with the constraint of one-to-one customer to address, then maybe I lose the fact that the customer has just opened a second office.

All I'm saying is that database constraints go both ways, they reveal bugs and they cause bugs.


Bad data of this sort is almost always the result of bugs, not of real data failing to conform to your modeling.

In this case, yes, you should absolutely be dropping it on the floor (and alerting the devs with an exception) rather than creating a mess that needs cleaning up later.


How would you know if you'd made a false negative if you drop all "bad" data? It's more of a mess when you realize you've accidentally been losing records for the last few years.


The data is not dropped silently but very loudly, immediately. Unless of course you are using MySQL, well then...


So you stop the program immediately and allow no more transactions while you debug? Unlikely.

Keep the data. Log that it's weird.


What is your software development background?

I ask because it would not have once occurred to me in the past fifteen years of software development that “enforcing database constraints” in any way implies shutting everything down and stopping transactions until the cause of a violation is rooted out.

If you’re approaching it from the position of collecting raw data from sensors or something, that might make sense. But from the perspective of a webapp, allowing duplicate (or empty) usernames or failing to keep foreign key relations valid often violate basic assumptions of soundness that the application is based upon. In these cases, the right answer is to drop the data on the floor (loudly) so a developer can look at it. Inconveniencing a customer once here is infinitely better than inconveniencing them constantly due to the site being buggy or broken as a result of the invalid data, and prevents needing to migrate the data in the future which can actually be impossible to do correctly in some cases.


> background

Web dev for some years, then data science.

> implies shutting everything down

That's what "loudly" implied to me. Loud means crash, no?

> basic assumptions of soundness

I like to enforce that at the application layer, so that I can quickly swap storage without worrying. Whether I'm writing to a database or a mock object, I'd hope the exceptions that arise in the application are the same. Things like a unique index are easy to implement with a mock object, but more complex constraints I'd rather enforce in code that I find easier to read.

> buggy or broken

All I'm saying is not to be an absolutist. Some things are good to have as database constraints, others are good as application logic. I'm arguing against the idea that "validations are [never] enough." In contrast, database constraints are almost never enough to manage the complexity of data integrity.


The app should not crash but return/log the database error, ideally translated to layman's terms on the customer end.

Logs sent high priority to dev team for fixing.


Indeed, I agree.


> So you stop the program immediately and allow no more transactions while you debug? Unlikely.

You stop the transaction on a constraint violation, not usually the entire system.

> Keep the data. Log that it's weird.

I've worked on systems that do that, and they're nightmares.

The problem with that style can be summed up in "if it doesn't look broke, people don't bother fixing it." No one actually watches those logs, and users don't try to fix data that was "accepted" with no errors. The result is bad data piles up, and often causes problems in other parts of the system.

Maybe you (as a developer) love working on support tickets to manually fix bad data the users entered or a buggy process created, but I sure as hell don't.

The solution to this is to fail fast: https://en.wikipedia.org/wiki/Fail-fast


I think our practices are closer than you realize. Maybe I didn't explain myself well.

The idea that troubles me is including more strict constraints in the database than in the application itself, because that indicates the application isn't fully tested. If the application is well tested, then it doesn't matter where the logic resides, db or app, so there's no standing to the claim that application logic is never enough.

Also, there are many kinds of data. Transactions, sure, reject if it doesn't match. Telemetry data, I'd like to hang on to it regardless of how weird it seems.


> Your scientists would prefer you keep all measurements, even the bad ones. Sometimes those "bad" data actually have information.

A constraint can allow "bad" data if the "bad" data is still semantically valid for that column.

A freeform text column where the scientist writes notes might not need any constraints. But a column used to send measurements to a robot to manufacture something, must have some protections around what type of data it can store. And no, the scientist doesn't get to input "bad" data that makes the robot lose its mind.

Sometimes yes, you drop data on the floor.


I've seen it go the other way plenty of times -- the dev thinks a system should only receive numbers, but occasionally a string is valid as well.


Then the data model was wrong and they migrate their schema appropriately. I don't think you understand that this is a core part of why you want a strict database: so you can perceive mismatches between your assumptions and reality and then respond precisely.

Do you think it's superior to be in a situation where you thought all of your IDs were numbers, you built your system with that expectation, and it turns out strings have been silently making their way into your data?

The difference between the two scenarios is that the person that used constraints knows when and why and where they got unexpected data and they can take more decisive, informed action. Meanwhile the other guy is desperately writing queries to find unexpected data, shooting around in the dark, and wondering what else slipped through.


Unfortunately he sounds like about half the Node developers I work with. Nobody wants to make concrete assertions. To decide anything. Usually this count field contains a number, but sometimes...

It’s a miracle our stuff works as well as it does. A more patient man could probably create a good deal of programming literature trying to unpack that paradox. Instead I’m hunting for something where people are a little less concerned about job security and a little more open to mentoring, robustness, and low drama development processes.


You want openness to being mentored, but you're rejecting the possibility you're incorrect?


The possibility that the big-ball-of-mud people are right? That hero worship is better than team building? That the people who still write software the way it was done 25 years ago know something that the rest of us don't? That all of the techniques and tools I pressed my face against the glass for 12 years ago, which are now considered de rigueur, are bad? That hoping the other half of the things I wish for will happen in the next 12 is a pipe dream?

No, I’m not open to that kind of education. I’ve had plenty of volunteers to teach me if I ever change my mind. In the meantime I’ll try to work in the half (quarter?) of the industry that strives for something better.

As someone once told me, I don’t need to be a good fit for every job. I just need to be a good fit for one at a time.


Not sure where all those rhetorical questions came from. They seem like non-sequiturs.


My point is that there is only so much you can learn from someone exhibiting a raft of bad actor behaviors.

Life is short. Find better mentors instead of trying to eke out a little wisdom from bad ones.


How do you know who is a "bad one"? If you discount the "little wisdom" of people you disagree with, you're creating a confirmation bias problem.

A good scientist tries to find evidence that contradicts her theory. Evidence that is consistent with the theory proves nothing.


I already told you and you called them non sequiturs.


If the data model was wrong and we must migrate, then I hope we were storing all the records that were falsely labeled incorrect.

It's a constant fight between analyst and application developer. The analyst wants a history of all states. The application developer often only cares about having a correct current state.


> It's a constant fight between analyst and application developer. The analyst wants a history of all states. The application developer often only cares about having a correct current state.

Again, this is totally and completely wrong. Actually, it doesn't even make enough sense to call it wrong. If I'm building an application to be used by an analyst, I want to hit all their requirements. They tell me what sort of data they want collected. If requirements change, or the data isn't as valuable as originally thought, or it's badly formatted, or whatever, then we work together to change it.

I don't fight with anyone.... They don't want shit data, and I don't want to store shit data. So who is there to fight with?


The case I'm considering isn't an application to be used by an analyst, but the analysis of an application used for some other purpose. It's often the case that an application only stores the current state. When there's historical data, it's often snapshots at regular intervals instead of an event log.

Anyway, we're going way down a rabbit hole. All I wanted to say is that if the user, sensor, or application logic emits something weird, sometimes it's correct.


And all everyone else is saying is: constrain to the full set of possible weird-but-valid values and reject the junk. If you want a free-form text field, use that. Problem solved. Meanwhile a numeric column stays numeric only.


That's not what I heard in the earlier comments; what I heard was encouragement to duplicate more complex logic from the application as database constraints.


Can all the constraints on your data really be expressed via Postgres constraints? Should they be? What is the balance?


Exactly. Validations are a really good tool for validating input and displaying validation issues to users. They are not a great tool for ensuring that relational data stays valid forever.


It seems constraints are mainly useful while developing. You should probably never hit constraints in a production system.


With concurrency it's certainly reasonable to hit constraints in production. One session deletes something while another inserts a row that attempts to reference the deleted row. Or you have multiple users registering the same account name at the same time. Etc.
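The account-name case is the canonical example: both requests pass any application-level "is it taken?" check, but only one insert can win. A sketch:

    CREATE TABLE users (username text PRIMARY KEY);

    -- Two concurrent sessions both run:
    INSERT INTO users (username) VALUES ('alice');
    -- One commits; the other fails with:
    -- ERROR:  duplicate key value violates unique constraint "users_pkey"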


That makes sense. I guess most of the stuff I do doesn't do much concurrent database access.


You might think that, but the most common cause of constraint violation in the web-programming world is unintentional user double clicks setting up a race between simultaneous requests.


Cause they get frustrated when your app isn't fast or responsive enough.


You would be surprised to see how many (often older) people double click everything by default. Explorer folders, desktop shortcuts, web forms, it is all the same to them.

I’ve run a number of event sites for luxury car brands and it was stunning how many double-click form submissions we received on those. Often just milliseconds apart.


It's not only old people who do the double-click thing. I see the same with a lot of our sales reps when I watch them use our stuff.


The mouse I use at work turns a large percentage of my single clicks into double clicks. There are lots of reasons people might double-click other than frustration.


Constraints are useful when people are trying to manipulate the data outside the bounds of your application. Think ETL, “fixing” data, etc.


Also plain bugs. If your database changes less rapidly than the client code it’s a great way to avoid simple mistakes taking a ton of work to repair. This is especially important if you have a large team and your best developers don’t work on the “boring” parts.


Even if there is perfect enforcement of constraint criteria at deployment, code has been known to be changed after deployment. Sometimes.


Nah, have them at every level. JavaScript, server-side code, then business-layer code, then the data layer (why not, we're doing it in the DB as well), then in the DB itself.

Laughable.


Why laughable? Different levels can support different amounts of context in logging and error handing and ultimately the message the end user sees.


Why is defense in depth a laughable concept?


What's laughable here?


Check constraints and foreign key constraints will save your bacon.

I don't like triggers though, as a rule. Not that they are always the wrong thing, but it's easy to kill performance with them, and just generally, having "behind the scenes" side effects triggered by other operations can be surprising, to say the least.


You might want to look at [1]; you can send a message to a queue and process that independently. It breaks transactional boundaries though (if you abort the transaction the message is still in the queue). Using transactional queues will probably not help because then the transaction has to wait for the message to be processed so you're back to square one.

[1] http://www.madeiradata.com/service-broker-asynchronous-trigg...


These are good, I've forgotten about some of them as well. As to:

  EXCLUDE USING gist (
    id WITH =,
    period WITH &&
  );
What the heck?


It's so you can't define two billings for the same period (say, the same month). To prevent you from double-billing someone.

&& is the range operator that checks for overlap.[1]

You have to use GiST indexes to do this. And in fact the page doesn't mention it, but you have to enable the btree_gist extension with:

    CREATE EXTENSION btree_gist;
before the snippet above will work.

[1]: https://www.postgresql.org/docs/current/static/functions-ran...
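Putting it together, a minimal working sketch (the billings table and its columns are made up for illustration):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE billings (
        customer_id int NOT NULL,
        period      daterange NOT NULL,
        EXCLUDE USING gist (customer_id WITH =, period WITH &&)
    );

    INSERT INTO billings VALUES (1, '[2018-01-01,2018-02-01)');  -- ok
    INSERT INTO billings VALUES (1, '[2018-01-15,2018-02-15)');  -- rejected: overlapping period
    INSERT INTO billings VALUES (2, '[2018-01-15,2018-02-15)');  -- ok: different customer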


They're very nifty, as you can use this to easily prevent all kinds of date range overlaps.


"Want to ensure tweets aren’t longer than 140 characters, make sure you set your tweet column to varchar(140) and not to text."

This is fine if all your users are content with ASCII. As soon as double-width characters from Chinese or emoji start interfering, the database will limit some users to 70 characters. It's ok to make restrictions, but be sure to fully understand how they'll affect your users.


In PostgreSQL, varchar(n) counts actual characters (as determined by the database character encoding), not bytes, IIRC. The same should be true for (at least recent versions of) MySQL.
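Easy enough to verify in psql; a quick sketch:

    CREATE TABLE tweets (body varchar(140));
    INSERT INTO tweets VALUES (repeat('漢', 140));  -- ok: 140 characters (420 bytes in UTF-8)
    INSERT INTO tweets VALUES (repeat('a', 141));
    -- ERROR:  value too long for type character varying(140)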


Errors in memory are temporary, errors on disk are permanent. Don't approach them both the same way.


One thing I like about SQL constraints is that they're so much shorter than in a procedural language.

Has anyone developed an elegant system for bubbling them up to the user interface? It seems you either have to write the same constraints again in JavaScript or build some massive list that maps ugly database errors to friendly user errors.
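One low-tech option: name every constraint explicitly. The database reports the constraint name and a SQLSTATE (23514 for check violations, 23505 for unique violations) with each error, so the app can key a friendly-message lookup off those instead of parsing error text. A sketch, assuming a hypothetical tweets table:

    ALTER TABLE tweets
        ADD CONSTRAINT tweets_body_length CHECK (char_length(body) <= 140);
    -- A violation surfaces SQLSTATE 23514 plus the name "tweets_body_length",
    -- which the UI layer can map to "Tweets are limited to 140 characters."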


C# MVC has an attempt at creating some validation and automagically promoting that to the UI but it's (of course) limited to stuff like "this field is numeric" or "this field is required". Anything more complex needs duplication.

I would assume that something like Node.js or Meteor has it easier, since they're using the same language in both layers; I believe there are also attempts at tools that automatically translate other languages to JS, but I haven't used any of them.


Yup. I still remember the day I found out about constraints, and I became kind of angry nobody had told me about this before.

I can imagine one can get carried away and shoot themselves in the foot when being careless, but all in all, I think constraints are a valuable tool to protect the integrity of my data.


Once I was building a small application and didn't want to run a full database like Postgres, which is amazing but does require some maintenance, so I settled on just Redis.

It was fine, but the lack of constraints and structure was killing me and making everything harder, though I liked the zero-maintenance thing...

So I built RediSQL[1], a module that provides SQL capabilities to Redis, to get the best of both worlds: stability, easy maintenance, velocity, constraints and structure in the data.

[1]: https://github.com/RedBeardLab/rediSQL


Does MySQL support CHECK constraints yet? Last time I used it, it silently ignored them if they were in the DDL.


Good heavens, then why even have them?


Every time I deal with MySQL I ask why MySQL exists. It's terrible compared to Postgres at consistency, correctness, functionality (e.g. check constraints, CTEs and window functions, pg_trgm, full-text search, index flexibility, and that's without even getting into PostGIS, pgRouting and many of the other excellent third-party extensions) and error handling, and it's merely equivalent for the basic queries MySQL even supports.

Edit: forgot transactional DDL. Real transactional DDL, not the kind MySQL has in the pipeline.


The industry seems to finally be warming up to Postgres and I find that really exciting. I think MySQL's popularity had a lot to do with its willingness to say "yes". It's really easy to ship something on a database that never warns you or refuses to do the wrong thing.

Postgres is "harder" to use because when you make a dumb mistake it tells you about it.

This post is an oversimplification but summarizes my general impression.


MySQL has better replication (at this point). Postgres does seem to be catching up in that area.


Better by what metric, and is it worth the hardship in every other area?


I think MySQL supports multimaster and other replication formats. Postgres just supports master and slave.


Not true: https://wiki.postgresql.org/wiki/Replication,_Clustering,_an... Postgres has a variety of ways to do replication, including multimaster (mainly asynchronous). Core is also getting improvements that will make this easier to do without extensions, mainly through the infrastructure pglogical brings.

Until 5.7, I believe, MySQL required extensions to do multimaster as well.


BDR is available in the open (aka for free :D) only up to PostgreSQL 9.4 though, and that's really the only somewhat-equivalent solution to MySQL multimaster IIRC.


Also not true.

9.6 is supported and the next release (scheduled soon) will support 10. Those changes will eventually end up in 12.

https://www.2ndquadrant.com/en/resources/bdr/ https://blog.2ndquadrant.com/news-and-roadmap-for-bdr-multi-...

I'm not sure what you mean by somewhat equivalent. Both are primarily asynchronous multimaster approaches.


> (NEW! Postgres-BDR 2.0 on Server Version 9.6 Now Available!) 2ndQuadrant is proud to announce the availability of BDR 2.0 running as an extension to community PostgreSQL 9.6 for its support customers. You can now make full use of the additional features offered with PostgreSQL 9.6 while using Multi-Master Replication from BDR, fully backed by 2ndQuadrant’s world renowned 24/7 Production Support. Fill out the contact form below to get in touch with us.

This suggests to me it's not really open to the public. And there is no release 2.0 on github (https://github.com/2ndQuadrant/bdr/releases).

Feel free to correct me if I'm wrong and the 2.0 release is accessible without a 2ndquadrant support contract (or reverse engineering the changes from the github repo branch, which pretty much ensures you're never going to production with it), I'm genuinely interested.


Half of those items are already in MySQL 5.7 or 5.8.

What's terrible to you is a plus for Web 2.0 data storage - not everybody wants B&D just to insert a row or to select a timestamp.

I happen to use both databases, but I recommend MySQL first for startups.


Your recommendation is terrible. MySQL has data consistency issues. It's simply terrible to work with. It doesn't have real data types or complex indices. The window functions they're adding aren't all that comprehensive, and those window functions and CTEs are in a pre-release version, compared to what, nearly a decade of production use in Postgres? There are so many gotchas with MySQL it's not even funny.

Just stupid things like "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'".

The query planner is crap and I've actually had it hang the instance.

The EXPLAIN output is barely usable compared to Postgres'.

Postgres supports a rich set of languages to write functions in.

Postgres supports rich set of column, row, and table access control.

Postgres supports returning syntax on updates.

As far as I know, check constraints are still _silently_ ignored in MySQL 5.8. Silently! That illustrates exactly why I don't trust MySQL and probably never will.

What is one advantage MySQL has that outweighs the enormous pile of garbage it brings along? What single advantage is worth not being able to trust your tool to work?


Many of the things you've listed are not important for a wide class of projects that MySQL is used for. Facebook, Google, and Yahoo must be fools for using MySQL; clearly they don't know what they're doing /s.

> What is one advantage that mysql has that outweighs

Replication & performance for most workloads. I've found it helpful to think of MySQL as less of a relational database and more of a KV database with secondary index support and a SQL API.


> Replication & performance for most workloads.

I'm still not sold that any slight advantage in either of these cases is worth not having a flexible and trustworthy tool.

> I've found it helpful to think of MySQL as less of a relational database, and more of a KV database with secondary index support and a SQL api.

And that's not what most businesses need, especially right out of the gate.

From my understanding that's what those large companies you mentioned and Uber do. They also have people who contribute code to mysql or mariadb, allowing them to fix any issues they have. Most businesses do not have that luxury nor would they be best off with only a kv store.


> not having a flexible and trustworthy tool.

Everything you've pointed out is a great example of a technical demerit... but none of it really transfers over in any negative way to the business besides developer frustration. I wouldn't be surprised if MySQL got replication out so early because it was such a POS that it needed a better backup strategy, but that doesn't change the battle-hardened RDBMS it has become today.

> And that's not what most businesses need

Most businesses don't know what they need until they're a few years in, and database software isn't going to be on any execs top 100 list of things troubling the company.

> They also have people who contribute code to mysql or mariadb

Contributions that come from non-trivial deployments, supplemented with countless years running in production across millions of installs of WordPress and phpBB thanks to cheap web hosting. I don't think there's any other database in this category that can claim it has seen more service.

It's just another software tool like any other, sorta pointless to shit on it just because you don't have a problem where it works.


> Everything you've pointed out are great examples of technical demerits... but none that really transfer over in any negative way to the business besides developer frustration.

> Most businesses don't know what they need until they're a few years in, and database software isn't going to be on any execs top 100 list of things troubling the company.

A database is a developer tool; it should be evaluated on its technical merits. If it's frustrating for developers, it means it's harder and more expensive for the business to get what it wants.


> A database is a developer tool, it should be evaluated on its technical merits. If its frustrating for developers

And MySQL is such a developer tool, which has been selected by many developers for many reasons for many classes of projects.


So is PHP.


I'm not going to continue this. MySQL is technically inferior by almost every metric. Businesses don't know what they need, which is why they need something flexible and reliable, not something that will crap out.

I honestly don't know of any sane reason to use MySQL other than institutional inertia. There are better RDBMSes. There are better key-value stores.


> I honestly don't know why anything but institutional inertia is even a sane reason to use MySQL

Because you clearly lack experience to make this kind of call, but that doesn't stop you from throwing around technical rhetoric like you know what you're talking about. You're essentially claiming that everyone that has or will choose MySQL is an idiot/doesn't know better, and that clearly you know more about the context of our problem, business, and challenges than we do.


> You're essentially claiming that everyone that has or will choose MySQL is an idiot/doesn't know better...

Why is that surprising? When I worked in a business to business compsec role, for instance, the vast majority of our customers didn't know how to use `ssh` properly or what public and private keys are. Just because someone has the title "dba" let alone "developer" doesn't mean they know the first thing about the relational model.


I figured it was so MySQL wouldn't barf so badly on DDL written for other databases. That's the only justification I can think of, and it's a pretty terrible one.

I have a pretty low opinion of it because of this.


It already barfs on a fair amount of valid SQL. Autoincrement vs auto_increment anybody?


Actually there isn't* a standard for auto increment columns.

* There is one for an identity column, which often ends up looking similar: `GENERATED BY DEFAULT AS IDENTITY` https://blog.2ndquadrant.com/postgresql-10-identity-columns/ https://dba.stackexchange.com/a/161356 and I don't believe MySQL supports that syntax: https://dev.mysql.com/doc/refman/5.7/en/create-table-generat...
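For reference, the identity syntax in Postgres 10 looks like this:

    CREATE TABLE items (
        id   int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
        name text NOT NULL
    );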


No.



