Need Postgresql HTTP API Extension (postgresql.org)
80 points by xmulti on Sept 20, 2014 | 63 comments



Here's the video of that YAPC::EU 2014 presentation by Laurent Dami - http://www.youtube.com/watch?v=DF93WU2wTEI


Hm... I saw some cpan links and a slideshare presentation. I was prepared to be impressed by said "couple lines of Perl".

;)


At what point did HTTP and JSON become synonymous with "simple"? Yes, a protocol and a data format can be convenient when applied to a specific problem, but that doesn't mean that everything should use them.

I've seen people refuse to use LDAP and request a JSON (not REST) API instead, thus disregarding the security benefits of a protocol with 20 years of experience dealing with access control and authentication.

Postgres has a tremendously powerful and efficient network protocol. There is great value in using and understanding it, because not every problem is a nail that can be hammered with a few GET requests.


> I've seen people refuse to use LDAP and request a JSON (not REST) API instead, thus disregarding the security benefits of a protocol with 20 years of experience dealing with access control and authentication.

Not really. HTTP/JSON is just a transport mechanism, and LDAP's own transport is not particularly secure either (it just uses SSL). The layers above the transport can simply use HTTP/JSON instead, as multiple existing LDAP-to-JSON gateways show. There are no security benefits being lost.

> There is great value in using and understanding it, because not every problem is a nail that can be hammered with a few GET requests.

That would be a good argument against a proposal to replace the existing protocol.


Sure, you can treat HTTP as a transport mechanism, which is what we've all been doing for the last 10 years because opening ports on firewalls sucks.

But HTTP is a resource manipulation mechanism: GET/POST/PUT/DELETE/... It's not optimized for transporting data.

Postgres, however, does provide a protocol that is optimized for transporting data [1]. It's also very simple to use, mature, and well tested. Which is why I find it odd to recreate a stable and efficient protocol as something less optimized, just so that a minority of developers can use HTTP requests...

[1] http://www.postgresql.org/docs/current/static/protocol.html


> But HTTP is a resource manipulation mechanism: GET/POST/PUT/DELETE/... It's not optimized for transporting data.

Resource manipulation is primarily transporting data (resource representations) -- sure, DELETE, HEAD, and some of the auxiliary verbs aren't so much about that, but GET / PUT / POST are.


Pointless extra API layers is why our software is slow.

What is the reasoning for this except "simple access"? It's not simple when it adds another API on top of an already existing one that then needs to be kept in sync with it. If Android cannot use the normal binary API (which is nowadays pretty secure, compact, and bug-free), then Android needs to be fixed.


Why would it be an extra API layer? This appears to be building an HTTP API directly onto postgres, so that instead of:

app <- http -> web middleware <-> libpq <-> postgres

You would just have:

app <- http -> postgres

HTTP is also pretty secure, compact, and bug-free, and has the added advantages of being the universal firewall tunnelling protocol and implementing every authentication mechanism ever invented. It also does compression-on-the-wire, which postgres still doesn't have for its byte-packed protocol, and you don't need as much extra code to interact with it because everybody has an http client already.

What is the reasoning for using the postgres protocol? I'm not really seeing any advantages compared to HTTP.


The postgres protocol is stateful, which is necessary for multi-statement sessions (and obviously for multi-statement transactions), as well as COPY. Emulating it in a stateless protocol is much messier.

It's also a lot faster and easier to parse. With HTTP you don't even know how big the request you need to parse is until you've done some parsing.


The advantages compared to HTTP are similar to SPDY's advantages compared to HTTP -- connection multiplexing, compact message format, and server push. HTTP is wholly unsuited for use as a high performance RPC protocol.


"universal firewall tunnelling protocol"

That obviously defeats the purpose of a firewall, so firewalls will either be eliminated, or will go deeper to do filtering.

"[HTTP is] compact"

Not particularly.


“Every program attempts to expand until it can speak HTTP. Those programs which cannot so expand are replaced by ones which can.”


Who are you quoting?


It's Zawinski's law applied to HTTP instead of mail.



I don't think this will catch on, nor do I think it really should. Nearly all applications out there need a robust security layer to keep people from messing with things they shouldn't have permissions to.

Additionally, you should always validate business logic on the back end. Otherwise it's trivial to, say, give yourself a 10% raise. Or confirm that you got the new high score.

Even if used as a backend protocol, there's just way too much overhead in HTTP requests and responses.

I can see a use for something like this as a dev tool, if it's only given private network access for dev tooling, but aside from that I'm not sure it's that worthwhile.


Theoretically postgres already has a robust layer of security and validation; although it may not be used for the typical web app. It seems questionable to have additional API level checks when the database already implements the same security features in a more tested manner. I don't want to set two levels of permissions for exactly the same thing that may behave differently.

Why not expose something similar to a prepared statement with the HTTP API? That lets you define input data types, and only queries that have been explicitly enabled could be run.
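A sketch of that idea, assuming a hypothetical HTTP endpoint that may only invoke statements prepared ahead of time (the statement, table, and endpoint shape are all invented for illustration):

```sql
-- Defined once, server-side, with explicit parameter types;
-- the hypothetical HTTP layer would only ever execute it by name.
PREPARE get_user_orders (int, int) AS
  SELECT id, total, created_at
  FROM orders
  WHERE user_id = $1
  ORDER BY created_at DESC
  LIMIT $2;
```

An HTTP client might then POST something like `{"statement": "get_user_orders", "params": [42, 10]}`, so arbitrary SQL never crosses the wire.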


> Theoretically postgres already has a robust layer of security and validation; although it may not be used for the typical web app.

What Postgres has in terms of validation is essentially a not terribly sophisticated type system, plus stored procedures in PL/pgSQL, Perl, Python or Tcl. These are frankly a pain to manage. In terms of security, well... outside of stored procedures, there is no special way of restricting a given connection/user to only rows with the right foreign key in a table. On the other hand, many web frameworks have some sort of authorization framework.


It is never entirely obvious where this kind of user/row-level security should be placed. Personally I don't like using frameworks with complicated authorisation systems that I don't understand. I don't think it makes sense to use that kind of system for a hand-written, SQL-heavy project, which is what this API is undoubtedly aimed at.

I would rather integrate user validation into normal queries, or have it handled at the edge with an application firewall along with other kinds of input validation.

You could always do something like this...

  -- hmac() here is from the pgcrypto extension and returns bytea,
  -- hence the encode(..., 'hex') before comparing to a text cookie
  INSERT INTO test
  SELECT %(inputdata)
  WHERE %(cookie) = encode(hmac(%(userid), 'serverkey', 'sha256'), 'hex')
    AND char_length(%(inputdata)) < 1000


There are `security_barrier` VIEWs. They give a "way of restricting a given connection/user to only rows with the right foreign key in a table". Still, you have to create them all (not a simple task for a complex DB schema).


In a very simple case (an app with a user_profile table), wouldn't that mean one view per user? This doesn't sound particularly appealing...


"one view per user" - you can have a single VIEW, that filters rows based on user name ("owner_user_name" column in the base table). Or use some kind of mapping table for "user_name" <-> "table PK id". It gets more complicated with inherited user roles, but still manageable.
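A minimal sketch of that single-view approach, assuming a base table `comments` with an `owner_user_name` column (all names illustrative):

```sql
-- security_barrier (PostgreSQL 9.2+) keeps the planner from pushing
-- potentially leaky user-supplied functions below the filtering qualifier.
CREATE VIEW my_comments WITH (security_barrier) AS
  SELECT id, body, created_at
  FROM comments
  WHERE owner_user_name = current_user;

-- Application roles see only the view, never the base table.
REVOKE ALL ON comments FROM PUBLIC;
GRANT SELECT ON my_comments TO app_user;
```

The same view serves every login role, because `current_user` is evaluated per connection.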


They are not a pain to manage at all, and what is so lacking with the type system?

>In terms of security, well... outside of stored procedures, there is no special way of restricting a given connection/user to only rows with the right foreign key in a table

"If I specifically exclude the way to do this, there is no way to do this"? Also views work fine.


Having a new application-layer transport doesn't impact any of these things. We're talking about how queries and their results are exchanged between a client and the sql server here.

SQL is a pain to parse, having a JSON API would make things much more hackable.


Yet most NoSQL stores have HTTP interfaces and they work fine, while being convenient to work with.

Compression can reduce the data size overheads, and HTTP auth can deal with security issues too.


>Nearly all applications out there need a robust security layer

How does using HTTP preclude that?

>Additionally, you should always validate business logic on the back end.

Uh huh? So how is talking to the back end going to prevent that?


> How does using HTTP preclude that?

Well, some of the discussion suggests client apps (web browsers, mobile apps, etc.) can talk directly to the database server, and the traditional database server security model is at the table level.

So if I had insert and delete permissions on the Hacker News comments table to insert or delete my own posts, I could insert and delete every user's posts.

It's possible to imagine hacks that might work around this problem (e.g. comments-table-per-user, or liberal use of stored procedures) but none of them sound like particularly good ideas.


> the traditional database server security model is at the table level.

Actually, many RDBMSs (postgres included) provide security at the command, relation (table/view), and column level.

With that, and appropriate view definitions and triggers, you can have essentially as fine-grained permissions as you'd like. We manage this all client side because, while much of the theory has been around for servers and held up as the ideal model since very early in the RDBMS era, a lot of the necessary features were not implemented in many servers (or implemented so wildly inconsistently between servers as to make practical, rather than theoretical, knowledge sharing in the community difficult). That's less true now, though it still requires implementation-specific knowledge for the SQL implementation you are using.
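For instance, relation- and column-level grants look like this in Postgres (the table, columns, and roles are made up for illustration):

```sql
-- Relation/column-level read: the directory app may only SELECT,
-- and only these columns.
GRANT SELECT (id, name, title) ON employees TO directory_app;

-- Column-level write: payroll may update salary, nothing else.
GRANT UPDATE (salary) ON employees TO payroll_app;
-- UPDATE ... WHERE id = ... also needs read access to the targeted column:
GRANT SELECT (id) ON employees TO payroll_app;
```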


You are inventing the problem you want to exist. The traditional database server security model does not allow any application users to have permissions to any tables at all, only views and stored procedures.


I don't think this is a valid counterargument -- even if access is only given to views / stored procs, you would have the same issue of a client connecting directly to the HTTP API having too coarse-grained security privileges, since those could be executed with any input params (unless there was some kind of extra security layer spanning front-end to DB layer? sounds crazy).

But this could be a moot point as long as DB communication is only done server-side, so it can be behind the company firewall anyway -- after all, isn't that why the current DB security model works? No one should EVER post AJAX to the DB, essentially. And I don't think anyone is suggesting that, I hope.


I don't know how to make it any clearer. You connect as user A. You authenticate. You can only do things user A is given permission to do. Yes, people are suggesting talking directly to the DB. We want that because we already use our DB this way anyways, but we have to have a completely unnecessary web layer sitting in front of it just to pass the request directly back to the database without doing anything to it.


HTTP doesn't preclude robust security at all. But how is a database-direct API layer going to enforce Role Based Access Control? Or really any access control model? Database security is at best per-command and per-table, when true app security often needs to be per-row, or even per-column, perhaps depending on data in the other columns of the same row. Example: user x has access to change payment details (all in the same table of course), but not those for user y. A straight-up HTTP API layer won't have the flexibility to enforce that, or it will be entirely too complex in order to support every possible business rule set.

As far as business logic, how is an RDBMS going to make sure that your remaining balance on your gift card is higher than that order you just put in? Sure, you can validate that on the client side, but I can just as easily use anything that will make an http request to make that same http request without doing any business rules validation. Heck, I can alter javascript while it's running just to circumvent business logic if I so desire.


> Database security is at best per-command and per-table

per command and per relation is more common (though many databases, including Postgres, actually also include column-level permissions, as well), but relations aren't restricted to base tables.

> true app security often needs to be per-row, or even per-column, perhaps depending on data in the other columns of the same row.

The effects of per column and per row security can (even if per-column permissions aren't directly available), in most DB engines, be managed using either views, stored procs, or both.

> As far as business logic, how is an RDBMS going to make sure that your remaining balance on your gift card is higher than that order you just put in?

A few options:

(1) Order submission is a call to a stored proc which performs the validation then does any necessary table updates, or

(2) Order submission is through insert to a table (or view) which has an appropriate BEFORE INSERT trigger that performs the validation,

(3) Order submission is an unconditional insert to a table, but that table is just an "incoming_orders" table that isn't particularly actionable -- the actionable views on that table (executable_orders and orders_with_payment_errors, say) are views based on joins that validate the payment information, including gift cards.

There's probably other database-side ways of dealing with that problem, as well.
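Option (2) could be sketched as a BEFORE INSERT trigger; the orders/gift_cards tables and their columns here are assumptions for illustration:

```sql
CREATE FUNCTION check_gift_card_balance() RETURNS trigger AS $$
BEGIN
  IF NEW.amount > (SELECT balance FROM gift_cards
                   WHERE id = NEW.gift_card_id) THEN
    RAISE EXCEPTION 'order amount % exceeds gift card balance', NEW.amount;
  END IF;
  -- (a missing gift_cards row makes the comparison NULL, which passes;
  -- a real version would handle that case explicitly)
  RETURN NEW;  -- allow the insert to proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_check_balance
  BEFORE INSERT ON orders
  FOR EACH ROW EXECUTE PROCEDURE check_gift_card_balance();
```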


>Or really any access control model?

The same way you already do?

>Database security is at best per-command and per-table, when true app security often needs to be per-row, or even per-column, perhaps depending on data in the other columns of the same row.

Stored procedures already solve all the problems you are imagining. Sane people already use them anyways.

>As far as business logic, how is an RDBMS going to make sure that your remaining balance on your gift card is higher than that order you just put in?

The same way it already does? If your database is not enforcing those rules then you are already doing it very wrong.


This is quite useful to have. Several tech-oriented companies already automatically create a data service when a new database / table needs to be exposed. No reason why PostgreSQL cannot do the same.

I'd imagine it looks something like this when it is finished - http://blog.dreamfactory.com/add-a-rest-api-to-any-sql-db-in.... The concept is similar, although I've not used it myself. Given that it is open source, perhaps PostgreSQL can learn from the user experience this product has to offer.


There's a little-known repo I'm keeping an eye on that is attempting to do just this using Haskell: https://github.com/begriffs/dbapi


I work in Business Intelligence (BI), building data warehouses, and I think such functionality could be useful. I see many posts with security concerns, but in a BI/analytics environment, where you don't really modify data but just analyse it, being able to access the data without an application layer could be very useful.

IMHO, one of the reasons BI departments struggle to deliver is the technical debt acquired over time. A big part of this debt is in the application layer. The data exists, but you need to jump through modelling tools to show it to your end user.


I think there is immense value in extending the capabilities of an SQL data store so that it can communicate over HTTP.

After building an extremely complex Web app, which, at the end of the day is CRUD with tonnes of business logic, I have felt the need for the following:

* Not being forced to write a JSON API layer for data coming straight out of a DB table.

* Not having to design & implement SQL semantics in an HTTP API, i.e. limit & offset, or JOINing with a table to get records of an associated model, or filtering using complex logic. Basically anything that would've been straightforward if written as a bunch of SQL queries.

* Over time, as our app has grown, we have found ourselves implementing a lot of CHECK & TRIGGER constraints on the DB to ensure no bug in the app layer ever messes up the data invariants. Basically, sometimes I wish I could've written my app logic in the DB itself and been done with it. The old approach of exposing a custom procedure (PL/pgSQL) for each business function.

Actually I think of a step forward and wonder why we need an OS stack in the first place. Just boot directly into the DB server and let it use the bare hardware to deliver extreme performance.


> Over time, as our app has grown, we have found ourselves implementing a lot of CHECK & TRIGGER constraints on the DB to ensure no bug in the app layer ever messes up the data invariants. Basically, sometimes I wish I could've written my app logic in the DB itself and been done with it. The old approach of exposing a custom procedure (PL/pgSQL) for each business function.

Having worked with an application mostly written in (Oracle) PL/SQL: frankly, don't do that. It's a pain to maintain, a pain to test, and a pain to version. And PL/SQL is fairly awful. For extra pain, use DOM manipulation in PL/SQL.

> Actually I think of a step forward and wonder why we need an OS stack in the first place. Just boot directly into the DB server and let it use the bare hardware to deliver extreme performance.

That's what unikernels are for [1], though not exactly bare metal in this case.

1: http://openmirage.org/


"I like this idea in principle, but in practice I wonder about the implications of it. Normally the database is a sort of special-purpose component with its own resources and scaling methods. I think if someone actually did apply the full power of this system to build their whole web application logic into the database server they might run into problems if/when they need to scale up due to coupling the database and application layer. Everyone who didn't build their web application logic in the database would pay the performance penalty for parsing / authenticating requests in a scripting language instead of in compiled C code. The database layer is one possibly rare place where people will be counting milliseconds of overhead per request as a problem." This.


While I'm unsure of what an HTTP query API would look like, it would be hugely useful. Every language has some sort of HTTP and XML or JSON library, and so do many monitoring tools.

I probably wouldn't use it for an application, but for back office things like monitoring and stats it would be great.


I would very much welcome this.

Love the new json/jsonb datatypes and this would be a great way to use those.

I'd personally prefer pg to mongo and even without the HTTP API I could see myself using it in the future, though with an HTTP API there would seem no reason not to use pg.


There's an nginx -> postgres module, ngx_postgres by Frickle, that supports at least the outbound portions of this.

http://labs.frickle.com/nginx_ngx_postgres/


Also a Lua driver for OpenResty: https://github.com/azurewang/lua-resty-postgres - Lua in postgres is great for doing authentication layers.


This seems like it should live as a layer above postgres for a while, and after it gets widespread adoption the project could become more of a standard part of a postgres distribution.

That would allow some time to sort out the tricky aspects like authentication and access control, and also to see what the performance hit is like and find ways to minimize it.


I'd love to see a jsonapi[1] interface to postgres. I'm mainly building ember apps at the moment and the rails layer is really just an interface straight onto the database(all the logic is client side).

[1] http://jsonapi.org


That works fine as long as you never need security or robust application logic. Somebody with CURL could really mess up your day.


As this is basically the same architecture as client-server 20 years ago, we need only look there to find all the necessary design patterns. Some of the things we used when building applications with Oracle 4GL developer tooling:

* abstract away the actual tables behind views, so users don't have direct access to the tables, and you can still modify your tables

* add a table API to abstract away direct CRUD actions on your data. Table APIs are 3GL PL/SQL modules that contain business logic to validate and update data (in the database).

* use VPD (Virtual Private Database) to automatically rewrite queries to include security rules (user and RBAC security handled by the database)

* have multiple database object owners (schemas) to split the database objects into different modules. Never use schema owners to access the database from your client as a regular app user.
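In Postgres terms, such a table API could be a SECURITY DEFINER function that clients get EXECUTE on, with no direct table privileges (the names and the length check are illustrative, not from the original toolset):

```sql
CREATE FUNCTION api_add_comment(p_body text) RETURNS bigint AS $$
DECLARE
  new_id bigint;
BEGIN
  -- business-rule validation lives next to the data
  IF char_length(p_body) > 10000 THEN
    RAISE EXCEPTION 'comment too long';
  END IF;
  -- session_user (not current_user) so the login role is recorded
  -- even though the function runs as its owner under SECURITY DEFINER
  INSERT INTO comments (owner_user_name, body)
  VALUES (session_user, p_body)
  RETURNING id INTO new_id;
  RETURN new_id;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE ALL ON comments FROM app_user;
GRANT EXECUTE ON FUNCTION api_add_comment(text) TO app_user;
```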


I can't see any reason why application logic would be any less robust on the client vs the server. With regards to authorization there's examples from the nosql world, e.g. couchbase's sync gateway[1]. You could argue that you might as well use rails instead but a proxy that's focused on just authorization can be far more performant.

[1] http://docs.couchbase.com/sync-gateway


The reason it's less robust is that I could use curl to send a DELETE request for records I shouldn't be able to delete. Or POST some data saying my order is paid in full. Business logic that matters at all has to be validated by some authority. The client is never good enough.


That's the purpose of the sync-gateway(in couchdb's case), it handles authorisation of actions against couchdb.


Just because I'm authorized to create a record doesn't mean it's a valid record. Take my new order example. If all business rules, including payment authorization and payment record keeping, are handled client side, I can easily say that a valid payment has been made when it hasn't.


I can see how there's a class of applications that would need the level of verification you're talking about (payment authorization being a good example). Many apps don't need that level of verification, though; an app like Trello, for instance, only needs to authorize actions against an entire resource, not values within a resource.


On the other hand, if your app has no business rules, one would assume that your server-side application is going to be extremely thin.


you can do the same with psql, rails bin commands, php, or anything else that can talk to the db


Which is precisely why your DB server is behind your firewall.


I was thinking of a pg-as-a-service (got pgaas.com/pgsafe.com) solution not too different along these lines. Does the idea have any merit? Would people be interested?


This needs to happen! Imagine an environment in which curl is available but you cannot install the client libraries for PostgreSQL.


It mightn't be a good idea to have such tight coupling between data schema and API...


You can use views or stored functions to abstract stuff...


That's what REST is supposed to solve. Unfortunately, the API responses given here don't seem to take advantage of it.




