Why do you think placing business logic into the database system is a layering violation?
Just because most developers use their DB as a dumb store doesn't mean it needs to be. There are also plenty of successful software systems that place the majority of their business logic in the database and use a generic programming language and runtime only for the presentation layer.
If you're comfortable fully exploiting the capabilities of your DB, then the intelligent combination of a relational model with custom data types, constraints, triggers, views and stored procedures can make the DB the perfect place to implement business logic.
I love what you're saying, but running a normal modern development and deployment cycle on code stored in databases is hell.
Until DB vendors start taking developer happiness seriously, stored procedures and triggers are a total non-starter for any serious work.
I mean, in most databases you can't even rename a column without causing an enormous blocking migration. What? Why can't this happen in the background? Why can't I have column aliases? The very basics of developer happiness aren't covered, let alone the harder bits, like versioning stored procedures, switching schema and code when switching to a new git branch, and so on.
(EDIT: of course there are open source tools that help with all of the above, but they're all swimming upstream, fighting a database that simply can't imagine change, and are usually terribly leaky abstractions as a result)
> I love what you're saying, but running a normal modern development and deployment cycle on code stored in databases is hell.
Can you be more specific? What exactly are you missing?
As long as you put your code and data into separate schemas and follow good technical practices, it shouldn't be too different from other technologies.
> I mean, in most databases you can't even rename a column without causing an enormous blocking migration.
Changing a column name is just a metadata change, so it shouldn't take too long in Postgres.
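To illustrate (table and column names are made up): in Postgres a rename only updates the system catalog, it never rewrites the table's rows. It does briefly take an ACCESS EXCLUSIVE lock, though, so it can queue behind long-running transactions, which may be the blocking behavior the parent comment ran into.

```sql
-- Hypothetical table for illustration.
CREATE TABLE customers (fullname text);

-- Pure metadata change: updates the catalog entry, rewrites no rows.
-- Needs an ACCESS EXCLUSIVE lock only for the instant of the change.
ALTER TABLE customers RENAME COLUMN fullname TO full_name;
```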
> The very basics of developer happiness aren't covered, let alone the harder bits, like versioning stored procedures, switching schema and code when switching to a new git branch, and so on.
Does Tomcat or any other application server version your WAR files for you, or manage git for you?
Also in my experience DB Schemas often (but not always) outlast logic implemented in applications. The more the business assumptions reside in the database, the easier it becomes to rewrite applications on top of it later.
Declarative programming expresses the logic of a computation without describing its control flow.
This new feature in PostgreSQL is a great example of that: generated columns (declarative logic) were introduced to reduce the need for triggers (imperative logic).
In SQL, declarative logic consists of constraints, indexes, views, and prepared statements. They can significantly increase the efficiency and reliability of the entire system. Imperative logic is mostly triggers and stored procedures, both of which can become hard to maintain and scale.
Clearly the feature should not be abused, but generated columns are great for stuff that's obvious and always true (i.e. independent of applications), e.g. a trade value computed from a trade unit price and a trade quantity.
This way I can select the top N trades for a given key without having to do the computation in the application, or storing redundant information in the DB.
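A minimal sketch of that trade example (names invented), using the stored generated columns added in PostgreSQL 12:

```sql
CREATE TABLE trades (
    trade_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    unit_price numeric NOT NULL,
    quantity   numeric NOT NULL,
    -- Computed on write and stored, so reads pay no computation
    -- cost and the value can never drift from its inputs.
    trade_value numeric GENERATED ALWAYS AS (unit_price * quantity) STORED
);

-- Top-N query reads the precomputed value directly:
SELECT trade_id, trade_value
FROM trades
ORDER BY trade_value DESC
LIMIT 10;
```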
Not PostgreSQL, but I saw a start date column and an age-in-days column in MS SQL Server. The age gets updated daily. It didn't sound right to me. I'm pretty sure I'd fail my database class in college if I did that. What is different in real life and why didn't they teach me this in college?
I obviously can't answer for this particular case, but my first thought on why I would consider doing such a thing is if I had an app in which number of days old was something that had to be queried, displayed, and/or used in other functions/queries a massive number of times per day in the course of normal application usage. If the application had low usage, or number of days old was infrequently queried/displayed/used, I wouldn't consider it. The moment I found that significant time & resources were spent calculating the value in normal/regular usage, I'd start looking at ways to reduce that time & resource usage. How to go about it varies, but the win of that value being immediately available without computation could mean a lot to an app/business and its users.
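Worth noting: a Postgres generated column couldn't replace that daily batch job, because generation expressions must be immutable and "age in days" depends on the current date. A view (sketch below, names invented) computes it at query time instead:

```sql
-- Hypothetical base table.
CREATE TABLE accounts (
    account_id bigint PRIMARY KEY,
    start_date date NOT NULL
);

-- date - date yields an integer number of days in Postgres,
-- so the age is always current and nothing needs a daily update.
CREATE VIEW accounts_with_age AS
SELECT account_id,
       start_date,
       (current_date - start_date) AS age_in_days
FROM accounts;
```

If profiling then showed the computation really was too hot, materializing it would be a deliberate optimization rather than the default design.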
There's no clear separation in databases between data definition and application logic. And this separation is very useful. You can change application logic very easily. Just stop the old application and start the new one. You can use multiple application instances to balance load in many cases. You can often roll back a bad application update. There's absolutely no problem using myriads of development tools, from Git to CI systems.
Changing database schema is a big deal. It might take a lot of time or it must be done with great caution to keep database online. It's hard to properly version it and it's often hard or just impossible to roll back bad update.
Generally database is state and application is stateless. You can couple it, but decoupling works better.
Incorrect separation of data from logic is a human problem, not a Postgres problem. Put your logic in one schema[1], call it the code schema, and your data in another, the data schema. There you have your separation. Now you can:
* Change your application logic very easily.
* Use a transaction to deploy new code! Zero downtime! [2]
* Use read replicas to balance load in many cases.
* Roll back bad application updates
* Test anything [3]
* Use myriads of development tools, from Git to CI systems.
* Use row level security, so that every user can only see his own data [4]
* Only allow applications to call your code schema, never let them touch your data directly.
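A minimal sketch of the pattern (all names invented): the application role can execute functions in the code schema but has no access to the data schema at all.

```sql
CREATE SCHEMA data;
CREATE SCHEMA code;

CREATE TABLE data.orders (
    order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    total    numeric NOT NULL
);

-- The application only ever calls into the code schema.
-- SECURITY DEFINER lets the function read data.orders even though
-- the calling role cannot touch that schema directly.
CREATE FUNCTION code.order_total(p_order_id bigint)
RETURNS numeric
LANGUAGE sql STABLE SECURITY DEFINER
AS $$ SELECT total FROM data.orders WHERE order_id = p_order_id $$;

CREATE ROLE app_role;
GRANT USAGE ON SCHEMA code TO app_role;
GRANT EXECUTE ON FUNCTION code.order_total(bigint) TO app_role;
REVOKE ALL ON SCHEMA data FROM app_role;
```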
You probably wouldn't use multiple tables or multiple columns either and just have a single table that stores document-like rows... which vaguely reminds me of something.
Foreign key constraints are not business logic, they're part of a sound and sane database architecture (just like basic indices on heavily-queried columns). For an overwhelming majority of use cases, tables should have them; one needs a very good reason why a table shouldn't have them.
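For example (hypothetical tables): the constraint guarantees no order can reference a nonexistent customer, no matter which application or ad-hoc script writes the data, and an index on the referencing column is the "basic index" companion to it.

```sql
CREATE TABLE customers (
    customer_id bigint PRIMARY KEY
);

CREATE TABLE orders (
    order_id    bigint PRIMARY KEY,
    -- The foreign key enforces referential integrity in one place,
    -- below every application that touches the table.
    customer_id bigint NOT NULL REFERENCES customers (customer_id)
);

-- Postgres does not index FK columns automatically; this keeps
-- joins and cascading deletes from scanning all of orders.
CREATE INDEX ON orders (customer_id);
```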
The last time I tried to do that, I struggled a lot with error handling. The errors your database gives you for schema violations aren't really user friendly, so I have to convert them to proper errors ("you must not enter negative amounts"). When using SQLite there didn't seem to be a way to know which column caused the error without parsing the message string. That often meant I had to implement the business logic twice: one check in the code for the error handling and another one in the database. That was also often quite inefficient and prone to race conditions (if using improper isolation). For example, if the table has two unique fields, I can't just check for a unique constraint violation, because the error data structure doesn't tell me the column.
Looking at the Postgres driver it seems easier but is there any good tutorial on how to do error handling properly for databases?
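One mitigation in Postgres (sketch, made-up names): name every constraint explicitly. The server reports the violated constraint's name in a structured error field (alongside SQLSTATE, e.g. 23505 for unique violations), so the application can map constraint names to user-facing messages without parsing the message text.

```sql
CREATE TABLE users (
    user_id bigint PRIMARY KEY,
    email   text NOT NULL,
    handle  text NOT NULL,
    -- Named constraints: a 23505 error carries the constraint name,
    -- which tells you exactly which uniqueness rule was violated.
    CONSTRAINT users_email_unique  UNIQUE (email),
    CONSTRAINT users_handle_unique UNIQUE (handle),
    CONSTRAINT users_email_format  CHECK (email LIKE '%@%')
);
```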
All sorts of things are possible but it misses a fantastic opportunity to compartmentalise the data away from the implementation. If it doesn't make sense to compartmentalise data and logic, why compartmentalise anywhere? Do the whole project in one big file. Of all the surprises a project is going to face 'oh, this data is useful for [new thing]' is one of the most likely. And everyone expects to find a boundary drawn there because it is such an obvious place to draw one; so it saves on confusion.
Putting complex business logic in the database is opening up all sorts of interesting new ways for data to be unavailable, corrupted or complicated to access. It is easy to imagine it working out well for simple requirements where there just needs to be something that works.
PostgreSQL is a piece of software that takes data and enforces the relational data model on it. Great idea. But the relational model of data is really only tuned to relational algebra. Put complex logic in there and all that is really being accomplished is now you can't migrate away from PostgreSQL. Relational databases already have great integration with every other programming language in current use.
> All sorts of things are possible but it misses a fantastic opportunity to compartmentalise the data away from the implementation.
That's what schemas are for. You have a schema for your code and a schema for your data. You can redeploy the code schema independently of the data schema and set up permissions so that higher layers can only use objects from the code schema and never touch the data.
> Put complex logic in there and all that is really being accomplished is now you can't migrate away from PostgreSQL.
Say you wrote your code in PHP, now you want to migrate to Ruby or NodeJS. You can't, you have to rewrite everything. How often do you plan to migrate to another database? In my experience this almost never happens in reality, but the layers above come and go.
> You can redeploy the code schema independently of the data schema and set up permissions so that higher layers can only use objects from the code schema and never touch the data.
That all just sounds a touch complicated compared to data in the database, code somewhere else (like git). I'm circling back to the same point in a couple of different ways, but pgSQL specialises in the relational model of data. That isn't a great data model for code, and there are already better ways to manage code than shoehorning it into the database. It's cool that it is possible, and I'm not saying that someone who does it is making a mistake. But I also don't think they are gaining an advantage, and there is a really easy opportunity to separate out where bugs can occur from where data lives.
> How often do you plan to migrate to another database? In my experience this almost never happens in reality, but the layers above come and go.
If you are using a database for its advanced general purpose programming capabilities? The chances probably start to get more likely.
Databases that are a store of data don't need to change because they already do their one job (disk <-> relational model translation) really well. If they are pressured to do 2 or 3 tasks like business logic then suddenly it is a lot more likely that there will be pressure to swap databases.
If I were using SQLite and someone wants to do fancy triggers then maybe I need to swap to PostgreSQL. Avoiding that sort of decision making is a great reason to seal the data completely away from the code.
> That all just sounds a touch complicated compared to data in database, code somewhere else (like git).
You use Postgres as a deployment target and not as a replacement for git. It's not complicated at all. You even get features like transactional deployments and the ability to prohibit applications from directly touching the data.
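Because DDL in Postgres is transactional, a code-schema deployment either applies completely or not at all. A sketch, reusing invented code/data schema names:

```sql
BEGIN;

-- Replace a function as part of a release.
CREATE OR REPLACE FUNCTION code.order_total(p_order_id bigint)
RETURNS numeric
LANGUAGE sql STABLE
AS $$ SELECT total FROM data.orders WHERE order_id = p_order_id $$;

-- ...more functions and views belonging to this release...

COMMIT;  -- or ROLLBACK to abandon the whole deploy atomically
```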
> pgSQL specialises in the relational model of data
The relational model is just the core of SQL:92; Postgres does much more than that. Postgres has JSON support, recursive CTEs, Row Level Security, and window functions that replace dozens of lines of procedural code with a single OVER clause in a SELECT.
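An example of that OVER claim (table and column names invented): the top three trades per key in one declarative query, with no loops or temporary state.

```sql
SELECT key, trade_id, trade_value
FROM (
    SELECT key, trade_id, trade_value,
           -- Rank rows within each key by descending value.
           row_number() OVER (PARTITION BY key
                              ORDER BY trade_value DESC) AS rn
    FROM trades
) ranked
WHERE rn <= 3;
```

Doing the same in application code means fetching every row, grouping, sorting, and truncating by hand.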
> If I were using SQLite and someone wants to do fancy triggers then maybe I need to swap to PostgreSQL.
If you want to put your business logic into an RDBMS, you wouldn't be using an embedded DB anyway, but rather Oracle, Postgres, SQL Server or DB2, which are designed for this type of architecture.
This thinking really intrigues me, as it seems like it fell out of fashion (at least for greenfield projects), but it might have a come back.
Could you share a few cases where it is better to use a custom type as opposed to a relation? The only things I can think of are generic things like uuid. Would there be a need to create an Employee type vs an Employee relation?
Also, how is the experience with using Python for stored procedures? One reason they are not used is that PL/pgSQL is unfamiliar to most. If anyone has any experience with that, could you share some of your thoughts?
It could just be something more pedestrian like not yet being able to handle database changes well for deployments. Things like blue/green, canary, reverting, etc. It's a bit of work to get that functioning well.