Total security in a PostgreSQL database (ibm.com)
218 points by amirmansour on Feb 8, 2014 | 58 comments


This is an interesting, detailed, and well-written article.

Let me caution you though: in most applications, if you concede to an attacker INSERT/UPDATE/SELECT (ie: if you have SQL Injection), even if you've locked down the rest of the database and minimized privileges, you're pretty much doomed.

Most teams we work with don't take the time to thoroughly lock down their databases, and we don't blame them; it's much more important to be sure you don't give an attacker any control of the database to begin with.


You are absolutely correct. But by properly locking things down, you can reduce the probability of giving an attacker write privileges.

If you simply split your connections into read-only and read/write pools, you've already drastically reduced your attack surface. (A third, DDL-capable role exists for migrations only; its credentials never appear in the app.) Now SQL injection on your homepage or "display content" page cannot give the attacker INSERT/UPDATE/DELETE.
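A minimal sketch of that split (role names and passwords are made up):

    CREATE ROLE app_read  LOGIN PASSWORD 'read-secret';
    CREATE ROLE app_write LOGIN PASSWORD 'write-secret';

    GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;
    GRANT SELECT, INSERT, UPDATE, DELETE
        ON ALL TABLES IN SCHEMA public TO app_write;
    -- the DDL role runs migrations and is never configured in the app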

You can go a little further and split your tables/columns into public ("I could reconstruct these by scraping the site") and private ("I might need to login for these"). Then create a role with access to the public data only, and use it for public views. Now an attack on your homepage/other public views can give the attacker only a limited SELECT.

This is all pretty easy to do and it can drastically reduce your attack surface. It also gives you a useful way to prioritize fixing vulnerabilities. First lock down read/write views. Then lock down read. Lastly lock down readpublic.


When I'm completely in control (writing a totally custom app) I do all my DB access with stored procedures. The application connection cannot see or access any tables directly. It can execute stored procedures, and that's it. I essentially build an API with stored procedures, and that's what the applications use.
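In PostgreSQL terms that boils down to roughly this (a sketch; names are hypothetical):

    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM app_role;
    REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA public FROM PUBLIC;
    GRANT USAGE ON SCHEMA public TO app_role;
    -- expose only the "API" functions; SECURITY DEFINER lets them
    -- touch tables the calling role cannot see directly
    GRANT EXECUTE ON FUNCTION get_user_by_id(bigint) TO app_role;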


I don't do this because maintaining stored procs is a nightmare. All of a sudden my migrations folder is a mirror of my git history, except that "merge" takes on a nightmarish new meaning.

Nowadays I generally write raw SQL with the Tiramisu library (shameless plug: https://github.com/stucchio/Tiramisu ) I cooked up. It's basically writing straight SQL, but with the ability to safely do (almost) string concatenation to build queries.


Why would you need to store the historical versions of the stored procedures in a migrations folder? Wouldn't it be better to treat the stored procs like any other code, create a .sql file for each and keep them in version control? And then you can have a migration script that re-creates all the procs when you do an update.
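e.g. an idempotent psql script re-run on every deploy (a sketch; file names are hypothetical):

    -- procs.sql: run after schema migrations; each included file
    -- uses CREATE OR REPLACE FUNCTION, so re-running is harmless
    \i procs/new_user.sql
    \i procs/get_account.sql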


Generally your migrations folder is supposed to store a complete history of your DB. Maybe doing it the way you suggest is a better way, it's just a bit outside the standard and to be honest I haven't thought carefully about the best way to structure code this way.


Alex at The Daily WTF wrote a great article on this subject.

http://thedailywtf.com/Articles/Database-Changes-Done-Right....


That's a decent read, but unfortunately even running the same precisely-versioned and ordered change scripts against dev, test, prod dbs can produce divergent results depending on the data in the db at the time. For example, prod could have a null somewhere, or data using a different url encoding flavor.... etc.

(Or if you're really lucky, even the current load on the database could make it fail... i.e. transaction serialization failure due to the random row order you attempted to lock)

So it should have ended with 'you're doomed'.


That's a tooling problem. Sqitch or something similar can help.

http://sqitch.org


Stored procedures don't offer any protection against malicious input. An SQL statement built up of concatenation including user input is just as bad in a stored procedure as it is directly in a program.


> Stored procedures don't offer any protection against malicious input.

Not necessarily true. Parameterized queries generally do and most stored procedure queries should be parameterized. Add good check constraints....
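For example, a check constraint that rejects malformed input at the table itself (a sketch):

    ALTER TABLE users
        ADD CONSTRAINT username_format
        CHECK (username ~ '^[a-z0-9_]{3,32}$');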

Now, the protection isn't perfect, but it is well beyond no protection at all and can reach quite a good level.

> An SQL statement built up of concatenation including user input is just as bad in a stored procedure as it is directly in a program.

Yeah, and sometimes that's the only way to do things in PostgreSQL. So if you find yourself having to do this (I find it most common in CREATE statements), then comment heavily and make sure (via frequent audit) that all values from the input are properly quoted with the quote_ident and quote_literal functions.


> Not necessarily true. Parameterized queries generally do and most stored procedure queries should be parameterized.

But that's not anything special about stored procs. Queries called from application code should generally be parameterized, which provides the same protection.


It is different because in SQL-language functions you can't write non-parameterized queries at all, and in PL/pgSQL you have to explicitly state that you are executing dynamic SQL. This means that since it is in the db, it is going to be parameterized by default.

So what is different is the fact that unless you use a non-SQL-derivative stored proc language, you pretty much have to put up big warning signs any time you put in place a possibly exploitable query.

Examples:

1. This is not possibly exploitable:

    CREATE OR REPLACE FUNCTION foo(in_bar int) RETURNS setof foo
    LANGUAGE PLPGSQL AS
    $$
    BEGIN
        RETURN QUERY
            SELECT * FROM foo WHERE bar_id = in_bar;
    END;
    $$;
2. This one is exploitable.

    CREATE OR REPLACE FUNCTION new_user(username text) 
    RETURNS BOOL LANGUAGE PLPGSQL AS
    $$
    BEGIN
       EXECUTE $E$CREATE USER $E$ || username;
       RETURN TRUE;
    END;
    $$;
3. This one is not exploitable.

    CREATE OR REPLACE FUNCTION new_user(username text) 
    RETURNS BOOL LANGUAGE PLPGSQL AS
    $$
    BEGIN
       EXECUTE $E$CREATE USER $E$ || quote_ident(username);
       RETURN TRUE;
    END;
    $$;
The point is that it is crystal clear in these cases whether a query is parameterized internally or not. If you don't see the combination of EXECUTE and ||, there is nothing to worry about.


Stored procedures can do some more input validation, though, and that can be useful. You can protect (from the integrity standpoint) some of the content with traditional tools like foreign keys, but procedures are more flexible.


Yes, that is why you never do that, regardless of whether it's in a SP or in the calling code.


Your stored procedures are "M" (Model) in MVC. It is a good abstraction, however I see no advantage (security-wise) in keeping this layer in DB stored procedures. It could just as well reside in some application layer.


In general, with LedgerSMB, we don't usually use stored procedures for security reasons. We use them for abstraction reasons. There are cases, however, where we do add security barriers to them so there are reasons why having this as a layer in the db can be a good thing.

Let's take an example: separation of duties in accounting. You want someone to enter transactions and you want someone else to approve them, and you want to enforce this through db permissions.

You might give insert permissions to user 1 but update permissions to user 2. Except that you don't want to allow updates. You only want to allow some updates, to a few columns.

So now you grant update permissions only on some columns. Except that you don't want to allow all updates on those columns. You only want to allow updating from, say, a null value to a definite value (say, indicating who and when).

You can't do this with standard permissions, so you now create another role with permission to update these columns, and require that mere mortals go through a stored procedure to update them, which will only allow those rows to be updated which had nulls in these values.

Now, I hear you say, you could do this with a join table called 'approval' with these columns so you don't have to worry about NULLs. Great, except that NOT EXISTS() queries tend to suck and you can't use partial indexes to ensure you can always quickly look up those transactions pending approval. So in the end NULLs, stored procedures, and an intermediate role end up winning out in terms of security.
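A sketch of what such an approval procedure might look like (table and column names are made up; the actual LedgerSMB code differs):

    -- runs with the rights of the intermediate role that owns it
    CREATE OR REPLACE FUNCTION approve_transaction(in_id int)
    RETURNS BOOL LANGUAGE PLPGSQL SECURITY DEFINER AS
    $$
    BEGIN
        -- only flips rows still pending; a real version would also
        -- check that the approver is not the original enterer
        UPDATE transactions
           SET approved_by = SESSION_USER,
               approved_at = now()
         WHERE id = in_id
           AND approved_by IS NULL;
        RETURN FOUND;
    END;
    $$;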


Almost all damage done by SQLi is from reading. So while removing write capability might reduce attack surface, it does nothing against the most effective attack: total database exfiltration.


Read my next paragraph. A split of public and private data is only marginally more difficult, and is very helpful:

    CREATE TABLE users (
      id BIGINT PRIMARY KEY,
      username VARCHAR(128) NOT NULL,
      email VARCHAR(256) NOT NULL,
      badly_hashed_password VARCHAR(8),
      social_security_number VARCHAR(9)
    );

    CREATE TABLE public_user_content (
      body VARCHAR(512) NOT NULL,
      user_id BIGINT REFERENCES users(id)
    );

    GRANT SELECT ON public_user_content TO public_readers;
    GRANT SELECT (username) ON users TO public_readers;
Then in your view:

    @run_with_role('public_readers')
    def get_public_feed(request):
        ....
SQL injection in `get_public_feed` is no longer part of the attack surface for stealing PII.


This is a good idea.

However, in the context of forms authentication, there comes a delicate moment where the user name and hashed password must be exposed to an unauthenticated user, no?


Of course. And once one user logs in, if you have SQLi vulnerabilities in the `edit_user_pii` page, the attacker has read/write access to everything. Like I said, it's a way to reduce your attack surface, not a way to completely lock everything down.


> if you concede to an attacker INSERT/UPDATE/SELECT (ie: if you have SQL Injection), even if you've locked down the rest of the database and minimized privileges, you're pretty much doomed.

Well... no. If you lock down your db correctly, using the correct privileges for the users accessing the data, and using Veil or RLS, the SQL executed on the db is just another way of accessing data the user can see anyway through the webapp. The webapp becomes just a nice frontend so the user does not need to execute SQL directly, and if there is an SQL injection vulnerability the attacker can still only see the data he has the right to see anyway. He is just accessing it with SQL instead of through the webapp.


To attempt this security model, which can work if everything in the universe goes right for you, your application needs to be designed for it from the very beginning. Virtually nobody does this.

If your application wasn't designed from the start to make its database connection "just another view onto data users would have access to anyways" (paraphrasing), locking down the database is closer to make-work than it is to critical security measures.


Actually in LedgerSMB we retrofitted such a security model onto the application (in 1.3) because it was the only real hard security model we could retrofit onto the application.

A security model which follows these processes merely has a database that doesn't trust the application. This has some significant costs and tradeoffs associated (you can't do connection pooling).


> To attempt this security model, which can work if everything in the universe goes right for you

Almost all webapps we write, we write them this way. (On Oracle, not Postgres, but that makes no difference.)

> your application needs to be designed for it from the very beginning.

That's true. As in "you can not add security later on".


No, that's not what I said. I didn't say you "can't add security later on". I said that you can't rely on database access control configuration to protect an application that hasn't been designed from scratch to do that. Most applications don't rely on database access control; they rely on the application server to protect the database.


15 years ago, in the client-server era, relying on database access control was the default. It's a shame that all that knowledge has mostly been forgotten, and most programmers don't actually know how to use a database.


Just because the application assumes it is trusted by the database doesn't mean the database has to trust the application.


That works as long as the database server is kept up to date with security patches. Quite often in production, applying a security patch is not a high priority.

I guess it all depends on using this feature as an additional layer of protection or using it as the only layer of protection.


So each user of your webapp has a respective database user?

Otherwise I don't know how you would be able to make the permissions fine-grained enough to be able to make these claims.


That's exactly what we do in LedgerSMB. More to the point, the db user credentials are application login credentials so the application has the db permissions of the user. No password? The app can't access any data.

Now this has some significant tradeoffs. It is very much a good decision for an ERP app. However for customer portals we are looking at a different approach, namely:

1. A portal user account in the db with very restricted permissions (and no direct access to tables)

2. A portal back-end user account in the db with limited access to tables and which owns most of the security definer functions the portal uses. I see no reason why we might need dynamic SQL in such functions, so no avenues for SQL injection unless you can exploit a trigger or something. For fine-grained permissions, the functions could do additional permission checks based on an auth token.

This should effectively prevent SQL Injection from being particularly useful in this context.
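For illustration, one such security definer function might look like this (a sketch; the table names and token scheme are hypothetical):

    CREATE OR REPLACE FUNCTION portal_get_invoices(in_token text)
    RETURNS SETOF invoices LANGUAGE PLPGSQL SECURITY DEFINER AS
    $$
    DECLARE
        cust_id int;
    BEGIN
        -- resolve the auth token to a customer; fail if invalid/expired
        SELECT customer_id INTO cust_id
          FROM portal_sessions
         WHERE token = in_token AND expires_at > now();
        IF NOT FOUND THEN
            RAISE EXCEPTION 'invalid session token';
        END IF;
        -- static SQL only: parameterized by default, no injection avenue
        RETURN QUERY SELECT * FROM invoices WHERE customer_id = cust_id;
    END;
    $$;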


Under this model, how do you store user (database) credentials?


The application logs into the db with the credentials the user supplies, so the problem is not our problem anymore.

This means, btw, that you can hook up LedgerSMB to use LDAP Auth simply by changing PostgreSQL's LDAP settings. Here's an example howto, using LDAP via Samba (I would prefer Kerberos, but this demonstrates what people can do with such a model):

http://cocnm.computerisms.ca/index.php/Install_Ledgersmb

Now, again, this is great for an internal business tool. It may not be what you want for a high-scaling public web app. We use a different model for customer portals (with security again encapsulated in the db, but through a different way).

It's worth noting btw that PostgreSQL allows you to specify auth method by connecting host and requesting user/database, so you have a high degree of control over what sorts of connections can be enabled or disabled in such an environment in the db backend.
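For example, pg_hba.conf chooses the auth method per connection type, database, user, and client address (a sketch):

    # TYPE    DATABASE    USER    ADDRESS        METHOD
    local     ledgersmb   all                    peer
    hostssl   ledgersmb   all     10.0.0.0/24    md5
    host      all         all     0.0.0.0/0      reject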


That's true, but unfortunately for the state of application security, most application developers today treat the DB as a stupid storage box, despite the fact that more effort has been put into the security model of almost any RDBMS they might be using than will be put into the whole of most applications using it. Which means, in terms of access control, most applications end up reinventing the wheel, badly.


> Let me caution you though: in most applications, if you concede to an attacker INSERT/UPDATE/SELECT (ie: if you have SQL Injection), even if you've locked down the rest of the database and minimized privileges, you're pretty much doomed.

I like how you qualified this with "in most applications." The huge mitigation factor would be if you restrict these heavily by user by using db users as application users, to enforce permissions. This has significant tradeoffs in a number of areas (particularly in scalability) but is very much worthwhile for internal business tools.

> Most teams we work with don't take the time to thoroughly lock down their databases, and we don't blame them; it's much more important to be sure you don't give an attacker any control of the database to begin with.

Well, beyond that, locking down the database is a lot of work. I am not sure of two presuppositions in your position though, namely:

1. That it is possible to fully lock down a database. There are tradeoffs that have to be made somewhere or shortcuts that get made, or new attack vectors that might develop over time....

2. That it is possible to guarantee that one denies an attacker any control over the database to begin with. Even if you encapsulate the db behind stored procedures (which we do in LedgerSMB), you have the issue that down the road someone adds a trigger which is vulnerable to SQL injection, even if your higher application levels are perfectly secure.

The long run approach I think is to see security as something a team should set the bar very high for, but will inherently always be a work in progress.

For example, one thing not covered in this article is what happens when you run a function as security definer, as a superuser. The function may do things that fire triggers, and those triggers also run as superuser, so eventually something in the pipeline becomes vulnerable. Fully locking down the database generally means that superuser security definer functions should not read/write rows, and that security definer functions which do read/write rows (i.e. non-system operations) should probably be owned by non-superuser roles.
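In practice that looks something like this (a sketch; names are hypothetical):

    -- a minimally privileged owner for row-touching definer functions
    CREATE ROLE fn_owner NOLOGIN;
    GRANT SELECT, UPDATE ON transactions TO fn_owner;
    ALTER FUNCTION approve_transaction(int) OWNER TO fn_owner;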


Doomed in what way? That the SQL injection allows the attacker to see or mess with data? Or that the attacker somehow gains greater access to the database, server, OS, etc? Or both?


Most attackers won't be SSHing into your servers and firing up psql processes to mash your database, as that would require gaining user access, SSH access, and psql access. Most attackers will go the cheaper and more likely route: turning the application already running on your server against the database by using SQL injection.


The latter.


Very nice article.

The section under "the ideal administrator" is quite eye-opening. I pretty much use PostgreSQL exclusively, and I've found that every time I learn something new, there is another mile of learning to go, and that feedback cycle never seems to end.

I have a few PostgreSQL-specific books on admin and server programming, but I wonder where I would be able to go to really learn this stuff. Are there any classes or places to go for this sort of SQL training?

How does one go about becoming a total master at this? I find that, out of all the programming that I do, I love working with SQL the most and I want to dive deeper into it.


> How does one go about becoming a total master at this? I find that, out of all the programming that I do, I love working with SQL the most and I want to dive deeper into it.

Exposure, exploration, and experience.

I started my career in a role similar to the one it sounds like you're in today: at first, I was just another developer. Then I was the developer who knew and liked SQL pretty well. At some point, there was a shift where I started spending more time helping people do database-y things than I did developing. Subsequently, keeping our PostgreSQL instances happy was officially made my job. Then I got a job with the title DBA based on that experience. Though I've moved on a couple times since, that's the hat I've worn for the better part of a decade now. It's not always the most exciting work (and when it is exciting, it's often enough the wrong kind of exciting), but it's almost always interesting.

It probably helped that I've always been a technologist for the sake of fascination with technology, who happened to realize you could also make a decent living with those skills. I'd started playing with Linux in the late '90s, for example, out of a "What's this all about?" sort of curiosity. Net, I almost accidentally ended up checking nearly every box in that "ideal administrator" section. My C is pretty rusty, and my network admin-fu is weak, but I don't think the breadth of my skills is unrelated to people (management and technical folk alike) at my last several jobs telling me I'm the best DBA they've ever worked with.


Also keep in mind that PostgreSQL provides unparalleled programming possibilities. I don't think it is humanly possible to fully master every possible aspect of PostgreSQL. Consider what this means:

1. Mastering standard data modelling techniques

2. Mastering advanced, PostgreSQL-specific data modelling techniques

3. Mastering server-side procedure programming in a large number of languages (C, Perl, Lisp, TCL, PHP, Python, Java, ...)

4. Mastering server-side type/language handler programming

5. Mastering tuning PostgreSQL to the OS, and tuning the OS to PostgreSQL

6. etc. This list can probably go on forever.

The best you can do is master a significant part of things and use that to carve out a niche.


I would suggest subscribing to and reading the fine postgresql mailing lists (general and performance).

It's amazing what you can learn from the questions and answers they get.


EnterpriseDB offers training around their product (an enterprise edition of PostgreSQL), but they might occasionally inject EnterpriseDB-specific materials.

Other than that, the knowledge is rather scattered (books, documentation, mailing lists, the actual source code itself, joining the PostgreSQL community, etc.).


DeveloperWorks puts out some really great content from time to time. This article and their article on POSIX Asynchronous I/O in Linux[0] are two of my favorites.

0 - http://www.ibm.com/developerworks/library/l-async/


Is there some kind of row-based security approach in postgres?

Let's say I run a hosting company, and when a user logs in, I want to limit DB access of this particular connection to rows that actually matter to the logged-in customer (like purchased services, associated accounts) and still allow access to general-purpose information (like list of available TLDs, stock prices for services etc.). Can I do that somehow?

Then I'd use authentication outside of the database (like with LDAP), and only allow access to the database after login; that way information leaks should be pretty much contained to the logged-in customer.

EDIT: Seems it's a work in progress: https://wiki.postgresql.org/wiki/Row-security


The SE-PostgreSQL project provides row- and column-level security. It's based on the work of SELinux, so it may be overkill for what you're looking for. Plus, it has all of the inherent complexities of SELinux, so that may be a turnoff for you. But if you're REALLY paranoid about security, it's a great tool.


WITH CHECK OPTION is coming in 9.4. It's already done, apparently.

You can sorta do it now with security_barrier views, but they aren't auto-updatable, so you need to write more code, which is a PITA.
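The read side is simple enough (a sketch; current_customer_id() is a hypothetical function resolving the logged-in customer):

    CREATE VIEW my_services WITH (security_barrier) AS
        SELECT * FROM services
         WHERE customer_id = current_customer_id();
    GRANT SELECT ON my_services TO portal_user;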


In addition to the work in progress, you could use security barrier views.


In the case where your application and database are running on the same server, you can eliminate the database password entirely by running the application in a separate user account [1] and running your database on a UNIX socket [2] with peer authentication [3].

[1] You should really be running your web application in its own user account regardless of how you interface to your database.

[2] Running daemons on a UNIX socket is better security-wise than running on localhost, because you can protect the UNIX socket with filesystem permissions.

[3] http://www.postgresql.org/docs/9.3/static/auth-methods.html#...
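Concretely, the setup from [1]-[3] might look like this (names hypothetical):

    # pg_hba.conf: on the UNIX socket, trust the kernel-verified OS user
    local   myapp_db   myapp_user   peer

    # then, from the myapp_user OS account, no password is needed:
    #   psql -d myapp_db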


Great read. But I was disappointed that it didn't mention other password hashing schemes, e.g. Blowfish. www.postgresql.org/docs/8.4/static/pgcrypto.html


The PDF is dated 2009; maybe it was common at the time to rely on MD5.


MD5 challenge/response auth today would typically be used in short-hop logins. It might be further protected with SSL on longer hops. For larger networks, I would assume you'd want to use Kerberos auth.


Are they seriously recommending the usage of unsalted md5?

Edit: Oh, the article is from 2009 (I'd say it was bad practice even back then though).


An excellent article, but it brings up a question about authentication using the various load balancing tools out there, such as pgPool or pgBouncer. I've found the auth tools in them to be extremely poor, to the point that it's easier to just leave it off.

Has anyone gotten it to work transparently?


Wow, this is very in-depth. Bookmarked for reading when I get home. Thanks for sharing!


Isn't default postgres user password authentication still MD5?


Yes, it is. Though it is salted (but not randomly, sadly; the salt is just the username).

For stronger password hashing, you can keep your user accounts in LDAP, and have PG authenticate with LDAP.
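In pg_hba.conf that's a single line (simple-bind mode; parameters hypothetical):

    host  all  all  10.0.0.0/24  ldap ldapserver=ldap.example.com ldapprefix="uid=" ldapsuffix=",ou=people,dc=example,dc=com"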


Pretty good article but had to laugh when I read this:

> Common practice dictates that passwords have at least six characters and are changed frequently.



