Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Database heresies (The "right" and "wrong" way to do ORM) (b-list.org)
24 points by toffer on Aug 4, 2008 | hide | past | favorite | 16 comments


Sigh. There is so much misinformation being spread by today's software engineers without proper CS backgrounds. As anybody who's read Date and friends knows, relational databases are fundamentally more expressive (and therefore better) than network databases (which is the model used by object databases), being grounded in set theory and predicate calculus.

Your bad taste for relational databases is more likely a result of using a crappy ORM pattern (such as Active Record). Go try a proper implementation of Data Mapper (I can't recommend SQLAlchemy highly enough), then come back and tell me how you feel.


That sounds dangerously like pedantry to me. The point is not that the mathematical expressiveness of the network model is as powerful or more powerful than what you can do with relational stuff. The point is that the network model maps nearly 1:1 with the actual programming model used to implement the application logic.

All that relational goodness does you no good if you want to architect your application using an OO data model in a web application server. Maybe you're trying to argue that that's the wrong way to do web applications? But if you accept that people want to use Rails/Django/etc... then you have to also admit the argument that RDBMS's are the wrong tool for the job. This is true for the same reason that a Dremel is the wrong tool for framing a house, despite the greater expressive power of the Dremel's cutting tools.


I'd actually argue the best approach is to develop both sides fully: I agree that an OO data model is the best fit for non-trivial application logic, but I also believe relational databases make the best data stores. The two can coexist just fine, it's just a matter of selecting the proper design patterns and supporting libraries -- which is why I mentioned SQLAlchemy. In its developer's words:

"SQL databases behave less like object collections the more size and performance start to matter; object collections behave less like tables and rows the more abstraction starts to matter."

This false dichotomy that's always presented is distracting, and leads to people naively taking religious positions about what should in fact be a non-issue.


Yes. And vans are more commodious (and therefore better) than cars.


Would you argue that the ability to form ad hoc queries for answering questions not originally imagined when writing an application fails to make the relational model superior to something without that property?

(I see you develop with Rails though, so probably yes.)


Rant:

I'm afraid I have to side with the author against the "relational set theory purists." I actually don't care much about styles of ORMs, and I don't even care that much about having to write SQL, either. I can do, and have done, both. For me, both work equally well, or rather equally poorly.

I loathe relational databases because, every time I have used one, it has eaten a vast amount of time in making the schema and queries fast enough. Yes, I know what third normal form is, and I start my database designs with it. I know how and when to denormalize to improve join performance. I know how to look through a query plan and create necessary indices. I've even looked through low-level IO statistics to figure out where a slow query wastes time doing unnecessary IO. Guess what: that work sucks. I've used Sybase, considered a "real" RDBMS, and I used MySQL, considered a "crappy" RDBMS. Both have sucked about equally, just in slightly different ways.

I am an application and system-level programmer. I want to be able to tell my app: "store this data for future retrieval." I want the store to be instantaneous. I want the future retrieval to be instantaneous. Period. As a major bonus, I want to be able to distribute the store between multiple machines, but do so as transparently to my application as possible. I also don't want to risk losing my data when one server loses its disks, so I want at least semi-transparent replication. ACID semantics are, obviously, a big plus. No RDBMS gives me this flexibility in designing an app. Oracle RAC, supposedly the cat's meow of scalable RDBMS, has RW database load balancing, but it still relies on having a single centralized SAN store on its back end.

To add insult to injury, the promise of arbitrary queries against the data just doesn't pan out. Yes, I can run an arbitrary query. Overnight. Adding this query to an application with a non-trivial schema and a large dataset usually requires so much indexing and query optimization work, that I might as well write the equivalent retrieval code for a Berkeley DB store by hand. I hate going through my queries and using trial-and-error to figure out where a subselect will outperform a join, and I can't afford to hire a bunch of DBAs, who also happen to know set and relational theory, to do this work for me.


It seems the Prevayler or HAppS style of data storage & retrieval is what you and I want (and perhaps in their hearts, every other non-db-specialist as well)

Basically, you just define data structures in your language like you would any other piece of data, and it is silently in the background serialized into an efficient (I would hope) ACID datastore.

http://www.prevayler.org

http://happs.org


You sound like a guy who knows his databases, and I freely admit I don't, but here's what I use to keep my life simple:

I work in java, and so I have the benefit of a stack trace where ever I want. So in the class that deals with sql queries I keep a hashmap with stacktraces as keys and total execution times as values. Whenever I feel an application sluggish, I just optimize the first 3-5 queries, which usually it's just "explain" and "alter table add key", and using a cache when it's serious. This allows no other thought whatsoever about optimisation. I don't even use indexes by default, on the grounds that they always slow inserts but not always speed things up.


FTA:

It turned out that relational databases were in the right place at the right time, and the “good enough” implementations and uses took over the world.

Yeah, that's what I've thought for a long time too. It's nice to see that in writing.

But I guess the rest of the essay leaves me a little cold. If the relational model is fundamentally flawed (rather: fundamentally mismatched to the overwhelmingly popular application design paradigm) then isn't the solution just to chuck the SQL database in the trash and start over?

So what has us tied to SQL and afraid of using the filesystem for storage directly? What are the other things we get from a database that we want? As far as I can see, the real need is for atomic transactions and consistent backups (or replication, which is the flip side of those features). No one cares about the "relational" stuff at all, really, as proven by all the awful ORMs out there.

And yet, if that were true, we'd all be happily using Berkeley DB. So what else am I missing? Some part of me thinks that there's a great startup idea in this space, but I don't quite see it yet. Tools like Google's App Engine seem to be attacking part of the problem by at least tweaking around with the SQL assumptions, but they're still fundamentally relational at their core (and they're still services, and my gut tells me that most of the world will never trust their data to someone else's service, no matter how well-justified it might seem).


It may be gratifying to see your hunch written out but the author of the original article is just wrong about the history of relational databases. They were not just in the right place at the right time, or "good enough," or driven by marketing hype as the article claims. You might want to read the well-documented history of relational theory, early RDBMS implementations and how they fit with what was already in use, and the ongoing evolution of RDBMSs and SQL.

Since the relational model predates OOP, to say nothing of flashes in the pan like Rails, it's backwards to say that the model is "fundamentally flawed" because it is mismatched to OOP. The mismatch isn't even real: OOP is a valid way to describe applications but it is not intended to describe databases. The relational model can enforce consistency and data integrity at the whole database level, which is a lot different than the object or single application level.

Programmers who developed large systems pre-Oracle have a very different view of the benefits of RDBMSs than programmers who first tried to use a database while developing a toy program with Rails. A little experience will teach you that for pretty much all real applications in the world the data is the only thing that has value; application code comes and goes. Any business that puts OOP purity before data integrity and consistency will not last. Would you feel better knowing that your bank uses Oracle, or that they use Rails and a half-baked ORM?

You might think "no one cares about the relational stuff at all" based on ORMS and Rails, but that is a peephole view of the world. The sooner you realize that Rails et al. are the Hollywood celebrities of the IT world the sooner you'll figure out how important those things really are.

I had more to say about this subject a while back in my article "Why Programmers Don’t Like Relational Databases" at http://typicalprogrammer.com/?p=14


I think the biggest advantage is that it makes the data somewhat independent from the app. This sounds bad when you design the app, but in practice it's a godsend. Probably because it allows you to refactor the code as you wish, as opposed to having the data tying you up forever.

I remember reading about the way Microsoft did things in office: saving was just dumping the program memory in a file. It's obvious it was bad, but what's interesting is that the most important reason it was so horrible is that it forced you to carry the old code with you forever, just to be able to load and save. Having another abstraction layer in between would have made it much more manageable, even if it saving would still mean dumping the memory somewhere.

Of course, in their particular case that may have been an advantage.


A big part of the reason for Relational DB's continued supremacy, is that it's become a part of the back-end language of business IT. Even though an OODB might be a better fit for an application, it's a handicap that your app is not on top of a relational database. SQL has become a kind of protocol for data exchange. Unfortunately, it makes for a very loose and sloppy protocol that completely spoils encapsulation.


Sure. I'm not expecting something new to displace SQL at a Fortune 500 IT shop. But the question is why are otherwise-groundbreaking technologies like Rails still wedded to the ORM stuff, when clearly there isn't any conservatism at work there. Anyone willing to bet the company on some weird language from Japan isn't going to blink about changing their datastore.


Actually one of the most interesting things at RailsConf was MagLev, which applies the GemStone object persistence engine to Ruby. Apparently there are some pretty big Smalltalk apps out there using GemStone. http://chadfowler.com/2008/6/5/maglev and http://ruby.gemstone.com/


One of the interesting technologies that Gemstone has had in its portfolio for years is the ability to make its data available in relational/SQL DB form. You can have your app running on top of an OODB, but still interface with your Fortune 500 comrades who want to do SQL queries for Crystal Reports.


I had the opportunity to work with Smalltalk/Gemstone in 1994. At that time we thought OODBMS was the next big thing. Programming was easier. However, soon it was clear to me that OODBMS was not the way to go. For example versioning is quite simple in an RDBMS compared to an OODBMS.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: