MongoDB has successfully played the 'hype first, features later' strategy. Now it is well on the way to being a decent Swiss-Army-knife database.
The RethinkDB retrospective[0] contains a lot of insight into how MongoDB has succeeded despite being vastly inferior on a technical level when it first launched. I have to grant them a certain respect for executing their strategy so successfully.
Choice quote:
Every time MongoDB shipped a new release and people congratulated them on making improvements, I felt pangs of resentment. They’d announce they fixed the BKL, but really they’d get the granularity level down from a database to a collection. They’d add more operations, but instead of a composable interface that fits with the rest of the system, they’d simply bolt on one-off commands. They’d make sharding improvements, but it was obvious they were unwilling or unable to make even rudimentary data consistency guarantees.
But over time I learned to appreciate the wisdom of the crowds. MongoDB turned regular developers into heroes when people needed it, not years after the fact. It made data storage fast, and let people ship products quickly. And over time, MongoDB grew up. One by one, they fixed the issues with the architecture, and now it is an excellent product. It may not be as beautiful as we would have wanted, but it does the job, and it does it well.
> MongoDB has successfully played the 'hype first, features later' strategy. Now it is well on the way to being a decent Swiss-Army-knife database.
I have no idea how capable MongoDB is these days, as I haven't used Mongo in years (and even then it was not for long).
However, I do not know any developers who, after living through the "hype first, features later" strategy, have been left with a positive enough opinion of MongoDB to ever want to use it again.
Epic had a post-mortem blog post here that mentioned in passing they had stumped all the experts they could find to look at unsolvable issues they had with MongoDB.
https://news.ycombinator.com/item?id=16340462 I kind of assumed the fix was going to be a rewrite with Postgres or MySQL.
- You think people replace a MongoDB cluster with a single Postgres instance? You should really use HA clusters in real life and stop reading the Reddit/HN hype behind PG. With 3.5M+ CCU, no one would use an architecture with a single master/slave (which is what PG is).
MongoDB and MySQL get bad press from people who never used them in real life and just repeat what they read online.
I could tell you horror stories about PG not having an official replication system until 2011, when PG 9.0 landed.
> I could tell you horror stories about PG not having an official replication system until 2011, when PG 9.0 landed.
I could tell you a horror story that happened to me just a few weeks ago, where MariaDB corrupted data out of nowhere due to a bug[1]. This happened multiple times and cost us multiple hours of work (including the service being down) each time, until we realized the issue wasn't hardware but a software bug.
If you ask me, I'd take PostgreSQL's approach of not shipping broken replication before 2011 over MySQL's still corrupting data.
Data usually is the most valuable asset a company has.
> I could tell you horror stories about PG not having an official replication system until 2011, when PG 9.0 landed.
And I would reiterate that just because something isn't in mainline doesn't mean it's not possible. Did you know that Pg didn't have native partitioning until Pg10? Somehow we managed to do partitioning before then.
I don't buy the argument that you need to ship broken features just to have them; Pg doesn't include a feature in core until it's a /good/ solution, well engineered and with appropriate toggles. That is not a horror story.
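For anyone who never saw it, here's a minimal sketch of how partitioning was done before Pg10, via inheritance plus a routing trigger (table names and the connection string are hypothetical; Python/psycopg2 is used just for illustration):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    cur = conn.cursor()

    # Children inherit from the parent; CHECK constraints let the planner
    # prune partitions via constraint exclusion.
    cur.execute("""
        CREATE TABLE events (id bigserial, created date NOT NULL, payload text);
        CREATE TABLE events_2018_01
            (CHECK (created >= '2018-01-01' AND created < '2018-02-01'))
            INHERITS (events);
        CREATE TABLE events_2018_02
            (CHECK (created >= '2018-02-01' AND created < '2018-03-01'))
            INHERITS (events);
    """)

    # A trigger routes inserts on the parent into the right child.
    cur.execute("""
        CREATE FUNCTION events_insert() RETURNS trigger AS $$
        BEGIN
            IF NEW.created < DATE '2018-02-01' THEN
                INSERT INTO events_2018_01 VALUES (NEW.*);
            ELSE
                INSERT INTO events_2018_02 VALUES (NEW.*);
            END IF;
            RETURN NULL;  -- row already stored in a child table
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER events_route BEFORE INSERT ON events
            FOR EACH ROW EXECUTE PROCEDURE events_insert();
    """)
    conn.commit()

Pg10's declarative partitioning replaces all of that boilerplate with PARTITION BY RANGE, which is exactly the kind of well-engineered version that's worth waiting for.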
> - You think people replace a MongoDB cluster with a single Postgres instance? You should really use HA clusters in real life and stop reading the Reddit/HN hype behind PG. With 3.5M+ CCU, no one would use an architecture with a single master/slave (which is what PG is).
I shipped a game with similar CCUs (within an order of magnitude), and I can confirm that you can't do it with one PostgreSQL machine. Or, actually, you could: we chose to fsync() constantly to prevent corruption from ever happening and removed the RAID cache. You can also shard on top of your database solution.
I feel I need to quote the post-mortem back at you so you can point out where I misread it.
"Our top focus right now is to ensure service availability. Our next steps are below:
Identify and resolve the root cause of our DB performance issues. We’ve flown Mongo experts on-site to analyze our DB and usage, as well as provide real-time support during heavy load on weekends."
You're implying they couldn't fix MongoDB or had reached its limit, which is false. In the current (HN) post they said they fixed it. I'm pretty sure they didn't have much experience with DBs in the first place, hence why they asked for help.
Nowhere in the original post do they mention issues related to MongoDB itself; it was probably bad design on their side.
OK, I should have been clearer about my interpretation: I read 'flying in experts' as meaning they flew in experts from MongoDB and stumped them, which had me thinking that maybe this was not solvable if they were stumped. Earlier in this thread one of the engineers from MongoDB says Epic solved the issue but had not updated the blog, so I was wrong about that.
> mentioned in passing they had stumped all the experts they could find to look at unsolvable issues they had with MongoDB
That's not really what the article is saying, unless we interpret the following text differently.
"We’ve flown Mongo experts on-site to analyze our DB and usage, as well as provide real-time support during heavy load on weekends." -> "We have started to look into the problem together with experts" and not "Experts have tried and failed".
JSON support has been great in MySQL since 5.7, a few years now, and recursive CTEs are coming in the next couple of months with 8.0. I don't think you can make a wrong choice between the two these days, but choosing Mongo over either is almost always the wrong decision.
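For anyone who hasn't used them, a recursive CTE walks a hierarchy in a single query. A minimal sketch of what that looks like once 8.0 lands (the `categories` table and the mysql-connector-python driver are just assumptions for illustration):

    import mysql.connector

    conn = mysql.connector.connect(database="app")  # hypothetical connection
    cur = conn.cursor()
    cur.execute("""
        WITH RECURSIVE tree AS (
            SELECT id, parent_id, name, 0 AS depth
            FROM categories
            WHERE parent_id IS NULL            -- anchor: the root rows
            UNION ALL
            SELECT c.id, c.parent_id, c.name, t.depth + 1
            FROM categories c
            JOIN tree t ON c.parent_id = t.id  -- recurse one level down
        )
        SELECT id, name, depth FROM tree ORDER BY depth, name;
    """)
    for row in cur.fetchall():
        print(row)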
This is actually where I think the biggest power of JSONB lies.
I can, for example, use jsonb_agg() and get a hierarchical response for 1:N joins. It returns a JSON value even though none of the columns contain JSON.
Previously, in that scenario, I would either need to make more than one query or get a response with a lot of repeated data.
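A minimal sketch of the pattern, assuming hypothetical `authors` and `posts` tables (psycopg2 is used just for illustration):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    cur = conn.cursor()

    # One row per author, with that author's posts aggregated into a JSON
    # array, instead of the author columns repeated once per post.
    cur.execute("""
        SELECT a.id, a.name,
               jsonb_agg(jsonb_build_object('id', p.id, 'title', p.title)) AS posts
        FROM authors a
        JOIN posts p ON p.author_id = a.id
        GROUP BY a.id, a.name;
    """)
    for author_id, name, posts in cur.fetchall():
        print(name, posts)  # psycopg2 decodes jsonb into Python lists/dicts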
Another way of reading that post is that MongoDB is being used for some of the highest-throughput concurrent workloads out there... and those are always hard to optimize. Doing a lift-and-shift to a "grass is greener" alternate solution is not a clear-cut path to victory at all... but it's certainly a giant science project to contemplate.
Yes, I kind of wish MongoDB had come out and said what they are doing to help Epic Games here, whether this is something meant to address that issue, and what the plan/thoughts are on the most newsworthy usage of MongoDB in a while.
I am one of the team of MongoDB engineers working with Epic on this issue, and I can assure you that the situation is under control and we have everything in place to scale this application to much higher numbers. However, we're not publishing details about our support cases, especially while they are in progress. That is something for Epic to decide, and I do assume they will eventually say in public just how well MongoDB is, in fact, performing for them.
Not sure if you are aware but what you are asking for never happens.
It is not professional or appropriate for vendors to be revealing (a) that clients are having issues and need support and (b) the specific workings of technologies or processes within the client's business.
There's a whole 'nother generation of devs coming through who have never been burned by MongoDB though. Obviously they will be eventually, but by then another generation will come along to repeat the cycle.
I love deriding MongoDB as much as the next dev who hasn't used it much, but I'll just note that, while I'd still be hard pressed to prefer MySQL over Postgres, there was a long period where MySQL was put to tasks it was ill suited for, especially prior to around version 4.x.
So while "hype first" might reap a deservedly abundant and bitter harvest of developer hatred, it doesn't preclude evolving into a genuinely useful product...
Both true, although I can completely understand why devs went with MySQL over PostgreSQL at that time. I remember that during the same period that MySQL was drawing seemingly endless criticism for generally poor RDBMS behavior (3.x and 4.x), PostgreSQL was notorious for having poor performance due to insanely undersized default settings. Out of the box it was sized to run with at most 10 MB of RAM or something similarly unrealistic.
I also remember it had a lot of quirks and missing features prior to v8. I assume it was leftover cruft from Ingres, but I remember PostgreSQL v6 and v7 being unreasonably complicated to get configured just because the defaults were so far from reality.
One thing you can say about PostgreSQL, though, is that its developers don't rest on their laurels. Every major release packs in a ton of new features. They've gone from being fairly low or middling on the feature set to being pretty near the top. Even point releases have me saying, "Wow, that's really nice to have."
I used it about 3 years ago and my first thought was "How broken will multi-document ACID transactions be?"
I still want to like MongoDB, I still miss its style of query vs SQL, but I'd have a hard time advocating its use again...
Sometimes it's tempting to use it for projects that I know will remain small, but even then it's not worth the overhead of standing up a different DB when I have a perfectly good SQL server I can muddle through already.
We are in the process of evaluating CockroachDB vs Rethink internally, and we've found CockroachDB to perform very poorly without obvious disk or CPU issues. I'm curious if you've seen anything different, especially as it relates to Rethink.
I didn't do comparisons. But RethinkDB has straightforward issues, like a slow QL implementation that uses a lot of CPU, and high disk space usage. Changefeeds have a few scaling issues if you want a lot of them. I don't know that it has mysterious kinds of excessive resource usage. I'm a RethinkDB dev, not an end user, so I might be seeing the worst side of it. I haven't used Cockroach or TiDB.
I work in a Danish municipality. Traditionally we've built everything on SQL because it's the world we function in, but we adopted the MEAN stack as a proof of concept a few years back and Mongo has been growing ever since.
It does require building and maintaining schemas in a different manner, but when you do that, it's pretty great to work with, especially when we're doing design-driven development that consists of a lot of prototyping.
I'm a fan, but I'm a manager of business development and digitisation, so I may be a little sheltered from whatever annoyances it may cause in operations.
I am curious why a municipality needs custom software. I mean, the Scandinavian countries had standardised paper forms for most municipal tasks (population register, ledgers, etc.) as early as the 17th-18th centuries, and those were used nationwide, or at least throughout a single province. Why can't the same be done with software?
Well, there are 98 municipalities, each operating in a thousand different ways.
I've worked on quite a few multi-municipality open source projects, like handling employee reimbursements for driving.
Basically, I drive x kilometers for a meeting, I get paid accordingly, and the taxman gets the report. Simple stuff.
Well, among the 6 parties involved there were 6 ways to interpret tax laws, 4 different agreements with unions on what rates to pay, 3 different payment systems with 3 very different ways of taking the reported data from a flat file to a REST interface, at least one political decision to overrule tax laws for a certain set of employees, several different ideas on how to host it and do single sign-on, oh, and 4 different ways to obtain employee data.
That's for a simple system with basically 1 function. We have more than 350 IT systems.
Another example is in automation. We have scanner software and we have an archiving system. They both have APIs, but the APIs speak very different languages. This meant that our local scanner people were tasked with distribution after they scanned things, a task taking several hours each week, because putting files into many different areas of an archive sucks. What we did was ask the scanning company to build a QR reader into their software, and then we made a piece of software that put the archive recipient addresses into QR codes. We also made a MOX agent that accepts the output of our scanning software and loads it into the archive through the API. So now the process of distributing is automated.
You can certainly run a municipality without developers, using standard software and outside hires, it’s just really expensive.
Would it be fair to say that the political entity one step above the municipalities (whatever that is in Denmark) are not doing their job? I mean not doing their job on standardising things that can be in common between the municipalities. Some things will of course have to differ, but a lot of stuff likely differ just because not-invented-here. It sounds like the legislative environment is too complex, and that you have to work around it with a ton of software. Could it even be the case that computer systems have somewhat removed the incentive for the administration to rationalise the various systems? With just manual labor and typewriters all of that would have been very expensive, but with a server hall and a medium-size IT-team it kind of works out.
Perhaps digitalisation only having come half-way is a factor - you mention scanning, but by now the so called "paper free office" that was a buzzword in the 1990s should be here already. Or is it perhaps just another sign that the IT industry overall is still very immature and this will sort itself out with time?
I think it's too complicated to blame anyone, really. I mean, we are working on standardising as much as possible, but it's often impossible because business practices are just so different. Often big standard products fall extremely short, or end up as complete failures, because you can't jam people into boxes on an enterprise scale, especially not when the people who build the systems have next to no domain knowledge and the people who write contracts have no technical knowledge. :)
I guess our government should work on writing laws that are more friendly to digitisation and stop expecting IT to fix business practices that don't really make sense in the first place. There has been a genuine movement toward that, but it's slow because none of our top politicians or bureaucrats are from technical fields, and they operate on such a high strategic level that they're often rather far from the daily challenges in a daycare institution.
Local political leadership and bureaucracy could certainly do more to focus on cooperation, standardisation and digital transformation, and they actually do, but political views differ and they change every 4 years, and the truth is that there just isn't any voter interest in IT unless it goes wrong.
We're trying to build national standards; we've had a set of architectural standards called Rammearkitekturen for a few years now, but getting them implemented is slow. For one, they're made by municipalities, and our structure of government is split in three: municipalities, counties and the state, and each branch has its own ideas, leading to bureaucracy and political differences. Some want us to use EU standards, others want us to build our own, and even if we decided, there are different sets of EU standards as well as different sets of Danish standards.
I personally think the best we can do is try to use whatever national standards are in favour, build smaller applications on them with open APIs, and run everything as SaaS on infrastructure such as AWS or Azure. I also think we should do a lot more work on business development, modifying business practices before we throw IT at something.
But it's complicated, and it's on a giant scale where even minor changes take years to implement.
I used it quite happily circa 2009 and at a different company in 2014. In both cases it was being added to systems that already had mature functionality built atop a RDBMS. In the first case it was used to store events that had started to overwhelm the main RDBMS with write volume. (Originally a system with one database as the monolithic data store.) Probably Kafka would have been even better for this use case, had Kafka been available at the time. But MongoDB did the job very well. I did a prototype in Cassandra too before settling on MongoDB, but MongoDB had much better docs, drivers, and single-node read performance at the time.
The second time I used MongoDB to automatically track templated email bodies that were being delivered through a third party mail platform. We had dozens of recurring templates and many more one-off templates for different curated campaigns. If somebody complained that a link or image or token was wrong in their email, we wanted to be able to look back at the history to see if the problem was in the template data or potentially a client issue on their side. Most of the queries were ad-hoc and not very performance-sensitive. This was where a flexible JSON document format came in handy. Modern Postgres would have worked well for it too, but that wasn't available in the company at the time. With MongoDB I got good flexibility, adequate speed, and I avoided reinventing wheels by not trying to shoehorn the data into another MySQL table. I was able to solve a customer support pain point in less than a week and the system has worked well for nearly 4 years now.
I'd be really frustrated if I had to use MongoDB as my only data store. I would guess that much of the hate for it comes from people who were forced into that position, or maybe from people who didn't take its documented limitations seriously enough before productionizing its use.
I don't know much about MongoDB. I am mostly a client-side developer after all.
But every time I see a team transitioning from Mongo to something else, they transition to a relational database. Maybe their problem is not with MongoDB, but that their data is relational after all?
Personally, I'd take a relational db over NoSQL for most of my needs, but all these stories don't really say anything about how Mongo compares to other NoSQL databases.
That's because nearly all data is relational. At first it seems like you don't need a relational database; in fact, NoSQL seems easier to use.
As your data grows, though, you realize that your application becomes more and more complex. A single query might translate to multiple queries to the database, you need to handle scenarios where fields might not exist, etc.
With relational data you might have more work up front, but then the database solves many of the problems for you.
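A small sketch of what I mean, with hypothetical collection and field names:

    from pymongo import MongoClient

    db = MongoClient().app  # hypothetical database

    # With no schema, every read must defend against absent fields:
    # older documents may simply predate the `email` field.
    doc = db.users.find_one({"_id": 42})
    email = doc.get("email") if doc else None

    # And fetching a user plus their orders is a second round trip,
    # where SQL would be one JOIN with NOT NULL guarantees.
    orders = list(db.orders.find({"user_id": 42}))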
As another person said, when you're using databases like MongoDB you're going back in time and reliving history, because databases in the past looked a lot like that before Codd invented the relational model, for example [1].
Also, the whole NoSQL thing seems to be cyclical; we had XML databases in the early 2000s[2].
Or maybe it's useful to get started with a low-overhead, easy-to-implement DB like Mongo, and then as you grow larger, spin off the uses it doesn't serve well to other, more specialized and complicated DBs?
One of the biggest problems with relational DBs is that once you decide on a schema, if it's the wrong one, you're gonna be in a lot of pain. Which makes a NoSQL DB a great fit for an early-stage product where you are still figuring out what your product needs to do and contain. Once you have some more experience with it, and have a better understanding of your data, it's far easier to build the correct relationships.
> Or maybe it's useful to get started with a low-overhead, easy-to-implement DB like Mongo, and then as you grow larger, spin off the uses it doesn't serve well to other, more specialized and complicated DBs?
Not really, converting to relational data is quite a bit of work.
Actually, the reverse is the correct approach. You start with normalized data; when there's a bottleneck you start denormalizing it; if that's still not enough, you move a /subset/ of the data to a NoSQL database.
> One of the biggest problems with relational DBs is that once you decide on a schema, if it's the wrong one, you're gonna be in a lot of pain.
Not really; in my experience all migrations were done through SQL. Also, if multiple people (who understand relational databases) come up with a schema, they'll pretty much arrive at the same normalized result.
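For example, a typical migration is a couple of SQL statements in one transaction (table and column names are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    cur = conn.cursor()

    # Rename a column and add a new one; when this commits, every row
    # conforms to the new schema -- no "multiple schemas" in queries.
    cur.execute("ALTER TABLE users RENAME COLUMN mail TO email;")
    cur.execute(
        "ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false;")
    conn.commit()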
Yep, this is me. I would require some pretty amazing reasons to even consider using Mongo again, especially now that all the relational databases I trust support JSON column types.
> However, I do not know any developers who, after living through the "hype first, features later" strategy, have been left with a positive enough opinion of MongoDB to ever want to use it again.
A new crop of developers is always just a year away, though. I feel future adoption depends a lot on how well-suited the tools are for younger devs. That's where MongoDB found its initial audience!
> MongoDB has successfully played the 'hype first, features later' strategy. Now it is well on the way to being a decent Swiss-Army-knife database.
I was going to say that I won't believe that it is on its way to being a decent database until after an article appears on https://aphyr.com/tags/jepsen saying that MongoDB actually delivers on what it claims.
So I looked for the most recent analysis of MongoDB and found https://jepsen.io/analyses/mongodb-3-4-0-rc3. I still want to see verification of the latest release, and hear battle stories from it in production. But I'm provisionally optimistic that a lot of the glaring "it is a pile of shit that doesn't work when the chips are down" issues are now addressed.
That said, I bet that it will be many years before most people who got burned by MongoDB ever rethink their attitudes about it. Once burned, twice shy. And it really was an overhyped steaming pile of shit for a very long time.
I have used MongoDB in production for a number of Fortune 100-sized companies. It has always been a unique database, ideal for scenarios where your data model was document-oriented.
> was an overhyped steaming pile of shit for a very long time
No it wasn't. This is something you heard from people who never really used it. It had its faults but it was never a pile of shit nor was it substantially worse than other databases.
> This is something you heard from people who never really used it
I used it at a previous job. The project was to move a multi-terabyte dataset from an Oracle box (24 CPUs, 24G RAM, SAN) to a MongoDB cluster (10 boxes, each with 48 cores, 96G RAM and internal SSD). MongoDB couldn't perform for shit, and it couldn't stay up in a usable state for more than a few hours at a time. This is with 20x the processors and 40x the memory of the system it was replacing. It's a complete joke of a product, sold on the basis of outright lies as far as what they told us versus what it could actually do. Having been that badly burned, I consider it an act of selfless public service to warn people off it.
If you're just using it for a personal blog that gets 10 views a day, sure it might be barely adequate for that. But I'd still use Postgres.
See, this is the sort of crazy thing I used to see people do and wonder why they had problems. MongoDB is a document database. You can't just take relational database tables, move them across, and expect it to behave the same. And frankly I don't feel sympathy for bad engineering practice. You don't do system migrations without fully testing and understanding all of the systems.
But for those of us who had document-oriented data models, it allowed for performance that was orders of magnitude faster than any SQL database.
If you never tracked what happened in production, it may have worked most of the time well enough that you never saw how bad it was.
But read https://aphyr.com/posts/284-call-me-maybe-mongodb for an idea of how the promises in MongoDB's documentation compared to the reality of the software under stress. And it wasn't just hypothetical either: there are plenty of horror stories floating around from people who ran into those problems in production for use cases that were supposed to be a fit for MongoDB.
And the performance argument didn't hold water either. As benchmarks like https://www.enterprisedb.com/node/3441 showed, decent relational databases consistently beat MongoDB on the same hardware. Yes, lots of people rewrote bad relational models and saw performance improve. But apples to apples, writing an application against a relational database in the same way you would against MongoDB resulted in a win for the relational database.
So yes, there were lots of people saying exactly what you are saying now. But the ones who actually tested their systems and ran performance tests came to a very, very different conclusion.
Again: I have been personally involved in the deployment and support of MongoDB clusters for very large datasets at very large companies. It does work if you use it for the right task, i.e. highly nested data, not relational data. And let's be clear: if MongoDB were unusable, the company wouldn't still be here, as successful as they are.
That EnterpriseDB link is completely ridiculous. Firstly, it predates WiredTiger, which replaced the entire storage layer. Secondly, doing one-for-one comparisons with relational systems doesn't make sense. MongoDB is a document database. Compare it with other document databases.
From your link, go to https://newbiedba.wordpress.com/2017/11/27/thoughts-on-postg... for the followup he wrote after those benchmarks. When he ran his benchmarks, he indeed got better throughput on MongoDB. But the 99th-percentile performance was massively worse, in fact slow enough to be unacceptable, to the extent that he concluded you'd be better off using PostgreSQL.
And he's right. As pages like http://latencytipoftheday.blogspot.com/2014/06/latencytipoft... make clear, a single page view involves a lot of calls back to the application. Users will notice the occasional slow load surprisingly quickly, and it is worth a lot to get rid of them.
So even your chosen source agrees. A relational database is not orders of magnitude slower. In fact, a relational database is probably a better fit.
> See, this is the sort of crazy thing I used to see people do and wonder why they had problems.
Financial time series data is exactly one of the use cases Mongo claimed to be for. Seems you’re the one who can’t tell good engineering practice from bad. And yes, they also pitched themselves as a direct replacement for Oracle. That was highly disingenuous.
> No it wasn't. This is something you heard from people who never really used it. It had its faults but it was never a pile of shit nor was it substantially worse than other databases.
This is FUD. I have used MongoDB; I even have a certification in MongoDB.
Unless you know precisely what you're doing, it's very easy to burn yourself. And Mongo markets itself as being "easy to use out of the box", which is not a good thing to do.
I consider MySQL's defaults to be unsafe (as in, it used to corrupt data silently), but it's a godsend compared to the data consistency in MongoDB.
There are countless promises it fails to deliver on too; I will not, ever, recommend it for a project. However, in recent months I've heard it got better. This means I will stop deriding developers who now use it, but it does not mean I will realistically allow its use in the environments I work in. I tend to care about data consistency in those.
Most people that “get it right” the first time around do not get any recognition whatsoever.
It is the people that screw up, release with big flaws that the customer then pressures the company about, that are heralded as heroes and bacon savers when they fix those flaws. After 3 years and as many releases.
That is true in life in general, not just the workplace.
Nobody cares about people who are healthy all their life. But someone who suddenly realizes they need to eat better and exercise, and does, is applauded. They are defended, too, if they go back to old ways. And so on...
> Most people that “get it right” the first time around do not get any recognition whatsoever.
I've heard similar complaints before. And I get it, too: at a glance, that person is playing the "superhero" by saving the project. But good management will insist on root-causing failures, and that's where this will unravel. If it's a recurring problem, you should bring it up with management.
My biggest gripe as a "lateral manager" (I don't manage engineers, I manage products) is that I see those things happen all the time, and I spend as much time as I can coaching developers to interact effectively with their managers. It's frustrating when I see people who should know better (because I know they heard me) not taking notes about serious issues they want to discuss with their superiors, not knowing how to escalate issues that threaten the well-being of the product or the team but that their direct superior doesn't believe are urgent, etc.
Developers complain about management but tend to forget that managers are people just like everyone else, and we need to apply some skill to our interactions if we are to get the results we desire.
This completely squares with my experience as well: a lot of complaints about management are hollow because developers aren't managing upward correctly. Their follow-up on their issues is missing, or non-actionable.
Do you have any resources you've found helpful for improving your skill at this?
> But good management will insist on root-causing failures, and that's where this will unravel.
You can have management who understand tech and will get to the bottom of the problem, and you can have management who don't understand tech. They won't.
Management who don't understand tech will either keep somebody on hand who they know and trust who does understand tech (e.g. a consultant) or, more likely, they'll just keep rewarding the faux superheroes who keep screwing up and bailing themselves out.
I'd say that good management needs to understand how their subordinates think and operate, even if they haven't played their exact role (e.g. engineer). The best managers that I've worked with, both lateral (e.g. PM) and direct (e.g. EM), take the time to get familiar with engineering processes if they don't know about them already and speak their language.
> It is the people that screw up, release with big flaws that the customer then pressures the company about, that are heralded as heroes and bacon savers when they fix those flaws. After 3 years and as many releases.
There's going to be a ton of survivorship bias even with them. It just goes to show that big marketing budgets are a competitive advantage that can outweigh not actually being any good.
I'd seriously like somebody with a passing knowledge of data integrity who believes the tech industry is meritocratic to explain what they think the success of Mongo is all about.
On the other hand, PostgreSQL is a very good example of a successful implementation of the opposite strategy, that is, "correctness first".
And since PostgreSQL fills that niche very well (correctness + real ACID + extensibility + decent performance), maybe it was really PostgreSQL that killed RethinkDB?
If you're playing the long game and not looking to make a profit that's fine, but PostgreSQL as a company would have been doomed a long time ago. You have to keep in mind the timelines of the business and what they need to do to keep the lights on.
MongoDB has identified a real pain point: many developers don't like to use SQL to interface with a transactional database. I'm not going into the merits of SQL vs. NoSQL, I'm just stating that it's clear there's a need or they wouldn't have gotten any traction.
Now that they are maturing the product to the point where it might be a safe bet for some use cases, it remains to be seen whether their approach to product development will pay dividends or whether the reputation they have created for themselves is a time bomb that will eventually kill them.
"PostgreSQL as a company would have been doomed a long time ago"
PG has astonishing feature throughput. With each yearly release, they add 1-3 wow features, 6-10 major features, and countless smaller features still worthy of the release notes.
That's really, really impressive for any database, commercial or otherwise.
There's a perception that postgres is slow to add features because sometimes the feature latency is high. The reason for that is they build a solid foundation first, and slowly build multiple major features on top of that foundation. Consider replication:
That's a lot of engineering work there, but they delivered value to users at each stage along the way. And during this time, they did a ton of other stuff -- did you notice that we got parallel query along the way? And logical table partitioning came along too, which means the parallel query can now do partition-wise parallel joins.
Not to mention all of the SQL features and tons and tons of other stuff.
Postgres has kept the lights on for a lot of companies for a long time. I absolutely reject the idea that good engineering is at odds with business success.
> Postgres has kept the lights on for a lot of companies for a long time. I absolutely reject the idea that good engineering is at odds with business success.
I don't think they're at odds, per se, but having been around through the original dotcom bubble, PostgreSQL (or "Postgres95," as I'm pretty sure it was still called when I was introduced to it!) was mostly known to, well, database nerds for at least the first decade of its life. One person's "solid foundation" is another person's "technically correct but practically crawling" -- a perception that, rightly or wrongly, PostgreSQL fought against for a very long time. And I think that's what OP was trying to get at: if PostgreSQL was being developed primarily by a single VC-funded company, they just might not have had the luxury to spend years building that solid foundation.
(I'll allow that as an ex-RethinkDBer, I may have some bias here: I loved many things about the product, but it's hard not to suspect we should have focused on speed and, y'know, revenue earlier than we did.)
MongoDB supports 1) 2) 6) 7). Not sure what 3 and 4 are, but you can just add a new node and new data will be copied over; no need to restore data from a snapshot, though you can restore from a snapshot too, which shortens the time until the replica becomes available.
Not sure what you mean by 5) though.
Anyway, replication is a strong point of MongoDB thanks to the oplog, and I don't think Postgres can beat it.
There are several companies that are leveraging PostgreSQL for their own businesses, but that doesn't seem to me to be a rebuttal of the OP's assertion that PostgreSQL couldn't survive as a company itself. Citus Data is not "PostgreSQL as a company," it is "a company that exists because PostgreSQL already existed."
Well, the majority of key Postgres contributors work for 2ndQuadrant, EnterpriseDB, Crunchy Data, Citus, etc.
It basically means PostgreSQL is a distributed company, and it would survive fine; being distributed, it seems able to innovate faster and be more resilient.
> It basically means PostgreSQL is a distributed company
I would argue that it means that different companies using PostgreSQL help fund PostgreSQL development. That's not the same thing as being a single company. It's a model which clearly works very well for PostgreSQL, but it doesn't really give us good data on whether the "single company doing closed source development" (e.g., Oracle) and "single company driving the bulk of open source development" (e.g., MongoDB) models would have worked as well for them.
There seem to be two points in this comment: one about the development of PostgreSQL and the other about its usability.
PostgreSQL remains one of the most mysteriously difficult common DBMSs to set up, which is unfortunate, but since the advent of MongoDB it has adopted all the ease-of-use features that are warranted from it. Developing a quick-and-dirty product prototype on Postgres is a breeze, and bootstrapping constraints and data integrity onto it afterwards is trivial. I really don't see any reason to start a new app on MongoDB exclusively at this point. Start off in a strong DBMS like Postgres, and if you end up needing MongoDB-style document storage you can always branch out to it later; using it initially is a case of premature optimization, and there is no need for it.
The problem isn't the schema; it's that you must have exactly one at all times. Sometimes you need zero, sometimes you need many. Having a fixed schema in production reduces unpredictability and provides optimization opportunities. The journey to get to that fixed schema, however, generally benefits from more flexibility.
I agree with the lessons in Worse Is Better, but I don't think that the author properly understood what he was observing. The result was a confused and confusing essay.
The way I understand it is that what is "Good" depends on how you measure it. When we measure in terms of technical quality, we get one answer. When we measure in terms of suitability for wide adoption, we get a different answer.
We tend to idealize technical quality, but popularity is what matters more. And once something is widely enough adopted, the technical inferiority tends to be fixable.
I was badly burned by Mongo hype back in the day, and as a result I won’t touch it with a 10-foot pole for the rest of my life, no matter how many times people say “No, really, it’s good now”. Falling for that was how I got into trouble in the first place. I know a lot of other devs like this.
If they can be successful despite us, more power to 'em, I suppose. I'm a little annoyed that their path to success was built on the flaming wreckage of so many products that fell apart because of Mongo, by using us as their beta testers instead of building a non-shitty product, and I'm at least going to get this comment in so we aren't completely forgotten amid the congratulations.
I agree here, but I'd go further: build things people want, not the idealistic someday version where we eventually get to a priority feature for a lot of people, like releasing fast, scalable software quickly (I'm not saying Rethink didn't do this, but they prioritised correctness and sharding, features fewer people need). For most apps built with Mongo, this transaction support isn't a problem (until it is).
The TCP/IP stack was built and used while the OSI model was being designed, and it won all the mindshare. Perhaps it would have been better to have separate presentation and session layers, but we don't; the application layer handles that stuff. It works well enough.
OTOH, this quote is wise:
> It is easier to optimize correct code than to correct optimized code (Bill Harlan)
I think this is doubly true for databases; at least with obfuscated code, you can recover the underlying meaning with work and exploration.
Losing or corrupting data is the worst thing a database can do. Given "this will be correct and hopefully we can scale it" vs "this will be fast and hopefully we can keep it correct", I'd choose the former for any "source of truth" data every time.
There are tricks for speeding up queries: indexes, caching (including materialized views), sharding, read replicas, etc.
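The first two are one-liners in Postgres, for instance (table and view names here are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection
    cur = conn.cursor()

    # An index speeds up lookups without touching correctness.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_user ON orders (user_id);")

    # A materialized view caches an expensive aggregate; refresh on a schedule.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_sales AS
        SELECT created::date AS day, sum(total) AS revenue
        FROM orders GROUP BY 1;
    """)
    cur.execute("REFRESH MATERIALIZED VIEW daily_sales;")
    conn.commit()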
> I think this is doubly true for databases; at least with obfuscated code, you can recover the underlying meaning with work and exploration.
True for databases, but not true for businesses.
> Losing or corrupting data is the worst thing a database can do.
Clearly people building simple CRUD websites with slick JS features didn't agree; otherwise Mongo would be gone and Rethink would be worth hundreds of millions of dollars.
> Clearly people building simple CRUD websites with slick JS features didn't agree
I doubt it's that they didn't agree, it's more likely that the thought simply never occurred to them.
Mongo's marketing is directed with laser-like focus at the beginner developer seeking out tutorials to build a website, etc. Questions about data consistency simply never arise in that context.
Later on that developer who was gently guided towards using mongo by all of the slick marketing will likely try to defend their decision when somebody attacks it ("their data consistency problems aren't that bad" or "data consistency isn't that important"), but that's something else.
> build things people want, not the idealistic someday version where we eventually get to a priority feature
FWIW, I don't think this is what happened. RethinkDB started out as an SSD-optimized database, quickly repositioned itself (due to "is this what people want") to something more generally useful, and was one of the most feature-rich databases at the time, I thought.
MongoDB, however, got first-mover advantage and the pile of cash that comes with it. They could afford to invest heavily in developer evangelism. Then they bought WiredTiger. If I sound bitter, I am a bit: not that Mongo did well in the end, but that RethinkDB went the way it did.
Clearly the evidence bears out the success of that strategy, but it's hard not to summarize it as, "Apparently a lot of developers, applications, and users don't need a database that works." But I don't know what that's really an indictment of exactly.
> I have to grant them a certain respect for executing their strategy so successfully.
But be careful not to conflate your respect as a business strategist with your judgment as a mindful developer. To speak clearly: by systematically exploiting a weak spot of ours [1], they have used countless small teams as stepping stones to sell their business contracts to large players, while hurting a lot of those small teams with a product that was (at the time) inappropriate for their needs. And even at these huge costs, they still made a product that is inferior to one that was designed properly.
As a community (both as startups and as a developer community) we should resent these tactics and try to find ways to protect ourselves against players that abuse the common good of mindshare. And lest you say that this is the price you have to pay to get a product like MongoDB at all in harsh business environments: we could also lobby for open source funds that are organized like research funds, producing fundamental technology that benefits everyone. Not every technology fits the model of for-profit startup innovation.
[1] Our community has very few defenses against marketing that comes from our midst, aiming to produce the (false) impression that a disproportionate number of our fellows have evaluated the product and found it to be excellent. See https://www.nemil.com/mongo/3.html for a discussion about MongoDB specifically (HN thread: https://news.ycombinator.com/item?id=15124306 )
With all its problems, I built a MEAN (MongoDB, Express, Angular, Node) app from zero knowledge to production 2 years ago far faster than this React, Apollo, GraphQL, and Postgres app I'm building from zero knowledge.
Speed isn't always a great thing... If it's 2x faster to build but requires 10x the support/maintenance after the fact, and eventually you need to migrate to Postgres anyway because of ACID features and stability... then the time/money loss > benefits.
Build something the right way first, even if it does take longer. Though I use an RDBMS (MySQL or Postgres) all the time with an ORM, and the ORM does most of the heavy lifting (Laravel/Eloquent in my case), so I still develop pretty rapidly. I'm sure if you use PG + React on multiple projects, eventually the speed to launch will increase...
It is honestly a nightmare. I had to decide which framework to invest in learning with very limited funds. At the time, the big choices were Angular, which was established and backed by Google, and React, which was still very new with a much smaller community. I went with Angular, and by the time I learned everything I needed, everyone wanted to hire React developers. Running out of money, I ended up selling all my belongings, moving to a new city, and doing Backbone.js development. I've been working on learning React for the last several months without earning money, and it is more difficult to learn than either Angular or Backbone because it isn't as opinionated, driving me to have to learn each tool to decide which is best. My mind craves structure. Meanwhile, most developers have 2 years of React experience on me. I figure if I had waited 6 months to learn JavaScript frameworks, React would have been the better choice and I would have been far better off. In a way, the MEAN stack screwed me.
The flip side of 'hype first, features later' is that if you are a user who got burned by MongoDB (or another solution), you'll recommend against using it for a long time. So there's a knife edge to balance on: hype enough, but not so much that too many people get burned.
While MongoDB might be a decent document store, I found that Elasticsearch is better at this job (as a secondary datastore). Its aggregation capabilities are just far better than MongoDB's, with the added bonus of being really good for all kinds of searches.
I also quoted Rethink's post in "The Marketing Behind MongoDB" in part 3 of my series on MongoDB:
> I sympathize with RethinkDB's team — they did what thoughtful engineers are trained to do. Engineering purity and humility is a tiny part of building a sustainable, venture-backed company.
Despite their claims to the contrary, RethinkDB also released a version of the product that was pretty broken and claimed it was 'ready for production use'. I used it heavily and hit numerous serious bugs. The RethinkDB devs did do a very good job of tracking them down and fixing them. Software is hard.
It was unfathomable to us why people would choose a system that barely does the thing it’s supposed to do (store data), has a big kernel lock, throws away errors at random, implements single node features that stop working when you shard, has a barely working sharding system despite it being one of the core features of the product, provides essentially no correctness guarantees, and exposes a hodge-podge of interfaces that have no discernible consistency or unity of vision.
I mean... that's unfathomable to me too. He explains it later: "MongoDB turned regular developers into heroes when people needed it".
I have a hard time understanding why devs choose / chose MongoDB. Postgres with JSON columns gets you so far; why would you go with MongoDB, given the issues it's had?
> I have a hard time understanding why devs choose / chose MongoDB. Postgres with JSON columns gets you so far; why would you go with MongoDB, given the issues it's had?
JSONB is a pretty recent addition to Postgres when compared with the MongoDB timeline. And even today Postgres still doesn't have the replication/failover story that made MongoDB pretty compelling. I know, it's coming, whatever, but the point is that there was a time when, if you wanted a JSON store that could stay alive through network issues, MongoDB was one of the only choices available, and Postgres simply didn't have what was needed.
The problem with that thinking is the assumption that the replication mattered. I'd argue it didn't; it was essentially a scam that people fell for. Who cares about failover when you're losing data due to a bad implementation? Who cares about replication when you can gain the same performance by using a performant database on a single node?
Did MongoDB truly allow anyone to really scale horizontally? Most places that need massive horizontal scaling use something like MySQL, as far as I know.
I loved working with RethinkDB, and the changefeed stuff was awesome. It gave me relational documents, which is all I wanted for most projects. Bummed that the project has basically slowed to nothing.
A database which utterly fails the Jepsen test should not be considered for production. It might be good enough for a cache, but trusting it with real data is reckless.
Most software developers have a negative impression of MongoDB, based on the many flaws it had back in 2010. Among the people who did the best job of documenting those flaws was Kyle Kingsbury, in his Jepsen series:
But it is important to realize that the team at MongoDB has actually been working with Kingsbury, for several years now, and they have slowly and patiently fixed the problems he identified. Consider how the situation had evolved by 2017:
MongoDB 3.4 Passes Jepsen – The Industry’s Toughest Database Test
Jepsen Evaluation Demonstrates MongoDB Data Safety, Correctness & Consistency
On February 7th, 2017, Kyle Kingsbury, creator of Jepsen, published the results of his tests against MongoDB 3.4.1. His conclusions:
"MongoDB has devoted significant resources to improved safety in the past two years, and much of that ground-work is paying off in 3.2 and 3.4. MongoDB 3.4.1 (and the current development release, 3.5.1) currently pass all MongoDB Jepsen tests….These results hold during general network partitions, and the isolated & clock-skewed primary scenario."
MongoDB has become an excellent document-store database. If you are still repeating FUD from 2010, then you are simply out of date. It's time to come up to speed on the reality of 2018.
I'm one of those people who come in and comment about MongoDB positively. My first "stack" was LAMP. In 2012, when I learnt how to use MongoDB, it was:
- a JSON store that works well with NodeJS
- a geospatial database (yea, it has what I need) that was "easier" to work with
- a database that made it easier for me to change my schema (if you just throw data at it, garbage in, garbage out).
Over the years, I've followed development, and adopted new features to make my life easier.
- I was one of the people who were excited about full GeoJSON support in the 2.* days, because that's something I depended on.
- I've tailed the oplog for as long as I can remember (never needed Redis), and have been learning about change streams (announced in 3.6; see the sketch after this list) with the hope of submitting a PR to Apache Beam to support them.
- I adopted a lot of the aggregation framework (Asya Kamsky from MongoDB personally helped me a lot)
- The last time I migrated data from MongoDB was when I turned on WiredTiger in 3.0.*
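For anyone who hasn't seen them, a change stream is basically a supported cursor over the same changes you used to get by tailing the oplog. A minimal sketch (database and collection names are hypothetical, and it needs a replica set):

    from pymongo import MongoClient

    client = MongoClient()  # must point at a replica set, even a single-node one
    with client.app.events.watch() as stream:
        for change in stream:
            # Each change document describes one insert/update/delete.
            print(change["operationType"], change.get("fullDocument"))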
My little replica set has been up for as long as I can remember; the only time I have downtime is when restarting my server.
When I start a new project these days, I still go to Mongo, because what held true in its early days still holds: it's quick to get something started. Yeah, there are $lookups now and transactions coming; I've just leveraged features as they become available. I have Postgres in my stack, which runs TimescaleDB and is a backing store for my GitLab instance.
Would I use MongoDB professionally? It depends on the use case. Over the past 2 years I've worked with Oracle, SAP HANA, Teradata, Hive+Impala, etc., from OLTP to OLAP, but once in a while, when it's the quickest option, I still use MongoDB, and when I have it my way, I don't later migrate elsewhere.
For good or bad, even though they've fixed these problems, I still don't have any trust in them as a company and will never recommend or look at their product. For me, their early days behaviour has defined their values as a company, and those are values I don't trust or agree with.
If they were willing to nearly scam people once, what is to stop them from doing it again in the future? Clearly their motive is money over quality, and without a serious change of management I don't see why anyone should believe that they've changed.
Next up will be SQL compliance, and we'll be back to a relational database. I'm curious what the impact on speed will be, and what the use cases for these types of databases are now that the major SQL players support JSON.
I beg to differ. (Disclosure: I work for MongoDB.) Using JSON as your data model, rather than relational tables, lets you build applications that don't need multi-document transactions as often, because the related data is already together in a single document. But when you do need multi-document transactions (a small percentage of applications do, and only a few use cases inside those applications), they are now available. There is no speed impact in the cases where you don't use them. And most of the time, you shouldn't use them; otherwise you wouldn't be capitalizing on the advantages of JSON. I think that's a game changer, but then again: I do work for MongoDB.
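To make the embedding point concrete, here's a minimal sketch with hypothetical collection and field names; the whole order, including its line items, is one document, so updating it is a single-document atomic write with no transaction:

    from pymongo import MongoClient

    orders = MongoClient().app.orders  # hypothetical collection

    orders.insert_one({
        "_id": 1001,
        "customer": "Ada",
        "status": "open",
        "items": [  # the 1:N relation is embedded, not joined
            {"sku": "A-1", "qty": 2, "price": 9.99},
            {"sku": "B-7", "qty": 1, "price": 24.50},
        ],
    })

    # One document, one atomic update -- no multi-document transaction needed.
    orders.update_one(
        {"_id": 1001},
        {"$set": {"status": "paid"},
         "$push": {"items": {"sku": "C-3", "qty": 1, "price": 4.25}}},
    )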
It's usually only after a while that you realize almost every piece of meaningful data is relational. It just didn't look that way when the project started. But now you're committed to the wrong database and it's very costly to switch back to SQL.
Literally every project I saw using MongoDB ended up going back to SQL within the first 2 years, after realizing the data is indeed very much relational and there's no clean way to model it using documents.
You always end up with either tons of duplication across documents, which is hell to maintain, or tons of multi-document queries with hacks to look ACID, which is also hell to maintain.
Sure, Mongo makes it easy to prototype applications, but it makes it very complex to build robust and maintainable software. It's especially bad if you think your data isn't relational, because it almost certainly is.
Disclaimer: I believe Datomic to be the game-changing database, because it values simplicity and composition, and those attributes drive the entire design.
Video games storing player data are a great example of nonrelational data. I intend to write a blog post after I finish my game detailing the structure of the data I store and why it was so perfect for MongoDB.
On the surface it sounds like you might have a case for Mongo, but look out for scenarios like...
* Trading in-game items between two users (needs multi-document atomic locks if you don't want duplicate or lost items), assuming your "schema" is a document per user
* You want to rename or restructure an attribute in the future; with no schema it's not possible to migrate data easily without writing ad hoc code (maybe you can use third-party tools) or changing queries to expect data in multiple "schemas", which quickly gets painful
> You want to rename or restructure an attribute in the future; with no schema it's not possible to migrate data easily without writing ad hoc code (maybe you can use third-party tools) or changing queries to expect data in multiple "schemas", which quickly gets painful
You can have schemas with MongoDB. There are various libraries to facilitate database design by schema specification.
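For example, since 3.6 the server itself can enforce a schema via a $jsonSchema validator; a minimal sketch with hypothetical collection and field names:

    from pymongo import MongoClient

    db = MongoClient().app  # hypothetical database
    db.create_collection("players", validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "gold"],
            "properties": {
                "name": {"bsonType": "string"},
                "gold": {"bsonType": "int", "minimum": 0},  # no negative balances
            },
        }
    })
    # Inserts and updates that violate the schema are rejected server-side.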
Also, renaming or restructuring your data is not necessarily an easy task with SQL. The nature of a database dictates that how well it works for your application depends on how well thought-out your schema is. Having to change your schema around is taxing. One of the reported advantages of document stores when they were becoming trendy was that it was easy to change your schema, since your schema is essentially determined and regulated at the application layer.
Also, MongoDB has ACID transactions now (freaking finally), so if they're as advertised, then I feel like half of your argument is not really a strong one any more.
Yes, players can sell items to other players, so that's the one place so far where I've needed to worry about atomicity, but even the MongoDB docs give examples of how to deal with something like that: https://docs.mongodb.com/manual/tutorial/perform-two-phase-c...
So yes, it's annoying for a very small % of what I'm doing, but 99% of my updates/writes are within a single document, so I find it very nice for development.
You still need ACID transactions over multiple entries even if your data is nonrelational; otherwise there is the potential for item- and money-duping bugs.
A simple example would be a marketplace.
Player buys item X with Y gold from another player.
1. Server checks that item X exists.
2. Server checks that the buyer has at least Y gold.
3. Server removes the gold from the buyer.
4. Server gives the gold to the seller.
5. Server removes the item from the marketplace.
6. Server adds the item to the buyer's inventory.
What if someone maliciously crafts two requests in a way that step 2 of the second request happens before step 3 of the first request?
The money is deducted properly but the account can now have a negative balance and there are now two instances of the item.
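For what it's worth, the multi-document transactions announced for 4.0 are aimed at exactly this. A rough sketch with pymongo (3.7+ against a 4.0+ replica set; all collection and field names are invented), not a tested implementation:

    from pymongo import MongoClient

    client = MongoClient()  # transactions require a 4.0+ replica set
    db = client.game
    item_id, buyer_id = 42, 7  # hypothetical ids

    with client.start_session() as session:
        with session.start_transaction():
            item = db.marketplace.find_one({"_id": item_id}, session=session)
            buyer = db.players.find_one({"_id": buyer_id}, session=session)
            if item is None or buyer["gold"] < item["price"]:
                raise ValueError("item gone or insufficient gold")  # aborts the txn
            db.players.update_one({"_id": buyer_id},
                                  {"$inc": {"gold": -item["price"]}}, session=session)
            db.players.update_one({"_id": item["seller_id"]},
                                  {"$inc": {"gold": item["price"]}}, session=session)
            db.marketplace.delete_one({"_id": item_id}, session=session)
            db.players.update_one({"_id": buyer_id},
                                  {"$push": {"inventory": item["sku"]}}, session=session)

Either every step commits or none do, so the interleaving described above can no longer produce a negative balance or a duplicated item.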
Most meaningful data is in fact not relational. When you consider machine-generated data, logs, metrics, network event data, and all other types of data like this, you find there are no relations in it.
This data is meaningful because it allows you to analyze what's going on across massive systems, detect when problems will happen, and find bottlenecks in applications and infrastructure, among many other use cases.
Application domain data tends to be relational I'd agree. But in general, this makes up a very small percentage of meaningful data in the world.
> When you consider machine-generated data, logs, metrics, network event data, and all other types of data like this, you find there are no relations in it.
I find that hard to believe. Maybe the raw data isn't stored relationally, but that doesn't mean there are no relations, real or implied.
If logs contain info about 'things' and any of those things can be considered to be the 'same thing' for multiple entries, then there's a relation right there – entry to thing.
And even metrics and network event data I'd expect to be full of cryptic IDs that reference some 'thing', i.e. a typical 'code' for which it's really nice to have a table with at least a friendly description.
Admittedly some of this data – or maybe even most of this data – isn't very 'deeply relational', but it definitely seems that claiming that "there are [no] relations in it" isn't strictly true.
Well, it's all a point of view really. The data is in the form of an "event": a fact that occurred at a particular time and has data associated with it. So a "relation" as constructed in a relational database isn't appropriate. You aren't truly denormalizing the data when you repeat IDs or tags or labels in this type of data, because at that time, that was in fact the associated ID, tag, label, etc. Changing such a field after the fact would make the stored event false, since at the time of the event that field did not have that value.
But it's all semantics really at that point.
Anyways, a relational database is a poor solution for this type of data. The stored data gains little to nothing when stored relationally, and its integrity may even suffer (at time t, the event DID have this ID; it DID have this label). Each event is discrete, and there will be many of them, which optimizes better for scale than relational organization.
I guess my point was there is vastly more useful data appropriate for a non-relational database than there is for relational databases. You might say it still has a "relation" in an abstract sense but this data does not need relational semantics within the database it resides in.
As a developer: What? Almost everything is relational. I do appreciate Mongo's query language and ease of use (it was the first DB I learned), but your statement is ludicrous. Think about a basic blog system. You'll have relations between authors, posts, categories, and comments.
In my experience, Mongo is most often used with ORMs that emulate joins, like Mongoose. And the possibility of data inconsistency due to lack of transactions is ignored, or patched over with cleanup scripts after the fact.
How does MongoDB handle schema changes? For example, let's say I want to add a mobile phone field to a customer record type. How would I go about doing that in MongoDB?
The short answer is: just do it. You can add any field to any document at any time. That's the beauty of JSON documents without schema constraints. Then of course your application needs to understand that. But it turns out it's almost trivial to make an application display a phone number field if it finds one, and not display a phone number if there isn't one in the document.
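A minimal sketch of what that looks like with pymongo (collection and field names invented): writers just set the new field, and readers tolerate its absence.

    from pymongo import MongoClient

    customers = MongoClient().crm.customers  # hypothetical collection

    # Add the new field to one document (update_many would backfill them all).
    customers.update_one({"_id": 123},
                         {"$set": {"mobile_phone": "+1-555-0100"}})

    # Readers simply handle documents that don't have the field yet.
    doc = customers.find_one({"_id": 123})
    print(doc.get("mobile_phone", "no mobile number on file"))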
It hurts when the next requirement comes along, something like...
"As a user I want to have a home, work, and mobile phone number"
Now you have 3 "versions" of your implicit "schema" to contend with
1) No phoneNumber
2) phoneNumber and mapping it into / out of one of the three phone numbers in the UI
3) objects with three properties homePhoneNumber, workPhoneNumber, mobilePhoneNumber etc
Then the business comes up with "As a user I want to have arbitrary phone numbers that I can label" now the developers start to squeal
RDBMS + SQL is no panacea, but having DDL operations like the following (sketches, but you get the idea) out of the box is incredibly powerful.
ALTER TABLE user RENAME COLUMN phone_number TO home_phone_number;
ALTER TABLE user ADD COLUMN work_phone VARCHAR(32);
CREATE TABLE phone_number (id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES user (id), name VARCHAR(64) NOT NULL, phone_number VARCHAR(32) NOT NULL);
I have had reasonable success using MongoDB as a store of "things that happened" that will never change.
And I would still claim that this is easier in MongoDB because several versions of the phone number field(s) can happily coexist in the same collection. Those variants are usually trivial to understand for someone who even just looks at the data, and the application can be written to either accept the different formats, or adjust the format on the fly when it encounters a document that still uses an old schema. Or you could indeed write a batch job that bumps all your phone numbers to a new format, and you could put a JSON schema constraint on your collection that enforces the new schema for every future document. All those possibilities exist, and I truly see that as a big advantage.
For any "real" system that is going to be in production for a long time, this becomes a real problem.
There are tools to "migrate" data, but they come with all the limitations of the Mongo isolation model.
Typically you either:
* Write ad hoc code (possibly using some tooling) to iterate over your old data, adding or mutating the field(s) in question - a minimal sketch of this follows below
* Write queries such that they can handle the data being present, absent, or in different forms for all time. As you'd expect, this is a large burden
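A minimal sketch of the first option in Python with pymongo, assuming a made-up collection where a flat phoneNumber string is being promoted to a labeled list. Note it runs under Mongo's usual isolation, so documents written while the job runs may still use the old shape:

    from pymongo import MongoClient

    users = MongoClient().app.users  # hypothetical collection

    # Promote the old flat field to the new labeled structure, one doc at a time.
    for doc in users.find({"phoneNumber": {"$exists": True}}):
        users.update_one(
            {"_id": doc["_id"]},
            {"$set": {"phones": [{"label": "home", "number": doc["phoneNumber"]}]},
             "$unset": {"phoneNumber": ""}})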
There's also https://www.torodb.com/stampede/docs/1.0.0-beta3/relational-... which tails the Mongo oplog and makes a fully-relational read replica, adding columns and indices as needed. An amazing (and free) shortcut to using analytics tools if Mongo's your main datasource.
To be honest, I've only used it for pretty trivial things; I didn't do any joins. From the docs [0] it looks like they only have self joins, so joining on children or something like that, but not across documents/collections.
Depending on how much graph-relation stuff you need, you might be better off just using the graph API. I have no experience with that, though. Or they support a MongoDB API, if that covers your needs too.
Like I said I've only done basic stuff, but I really liked it - it's performant and really easy to set up and use. I used the Python API and it was really easy, then I switched to the Node one to try using it in Azure functions (Python library imports aren't really supported there) and that's nice too - it uses promises and works great. It also doesn't feel like a giant lockin (IMO) - their APIs work anywhere and there's no magic in Azure AFAIK to make you put your compute there if you're using the Database.
Just remember, SQL is and always was "not only relational" - a play on the acronym NoSQL, which IMHO should be long dead and buried. Structured Query Language - I know for a fact from actually doing it (see the Caché database, Hadoop schemaless SQL, even a product I worked on that translated SQL to Mongo, heck the SQL standard itself) - guess what - it works with object/document stores too!
Mongo could have implemented SQL on top of their storage engine a long time ago, minus the joins. Instead they built their own query mechanisms. Mind you, MapReduce can't be reproduced explicitly in SQL, but SQL expressions can compile to MapReduce (see Apache Hive), so even that was not an excuse.
Edit: NoSQL served a purpose to remind people that there were other options other than relational databases (including those that predate the relational model and those that came after it), but man, what a terrible and misleading misnomer.
On this point, the datastores that have insisted on using a query language that isn't SQL have always seemed like attempts at setting up walled gardens. Plenty of people have extended SQL when their needs required it (MySQL, PostgreSQL, Oracle, TSQL/MSSQL), but intentionally rejecting the basic format of SQL queries just seems like an attempt to set up a barrier to ever moving off your DBMS.
Speaking as someone who ported an application from MySQL to MSSQL there is still a lot of work required to remove those custom extensions, but the core of what you're doing can remain the same.
I'm the Product Manager on the Core Server responsible for the multi-document transactions project. For those of you interested in learning more about how we're building transactions in MongoDB, I suggest checking out this video that discusses the creation of WiredTiger timestamps to enforce correctness in operation ordering across the storage layer. It is presented by Dr. Michael Cahill, the co-founder of the WiredTiger storage engine acquired by MongoDB. https://www.mongodb.com/presentations/wiredtiger-timestamps-...
In 4.0, transactions will just be across replica-sets. The following release will have transactions across the entire sharded cluster (across multiple primaries).
MongoDB can be quite a nightmare once you start requiring anything more than a 1:1 relationship, which describes pretty much any app that is doing anything meaningful. Having to resort to things like map/reduce for a simple GROUP BY / ORDER BY is not the way to go, IMO. I think you only truly realize the beauty of SQL once you've been far down that rabbit hole.
Initially I rode on Mongo's NoSQL bandwagon when I saw that you could just save a JSON hash, and I thought it was the coolest thing in the world. But ever since I tried out Postgres's JSONB, I just can't go back to Mongo any more. With Postgres, I have the best of both worlds - performance, relational data, and reliability - and I don't have to sacrifice any of it. Also, I don't know who codes using raw SQL; for years now, languages have had ORMs that make queries look just like a Mongo query.
Also, for anything else, like the super simple requirement of saving data (JSON), Firebase has fit that role perfectly.
Mongo is starting to look out of place in the ecosystem.
We currently use MongoDB 3.4. It's definitely much improved. The replication protocol that came along with 3.4 has been very reassuring. But aggregate queries really do not give you the power of SQL. They are great for transforming stuff for a report, but I would avoid using them for anything else unless 1) it can be cached (and therefore properly invalidated) and 2) doesn't need to be "real time"
Being a document, schema-less database, your "quality of life" as you scale with MongoDB is going to be heavily dependent on how you structure your documents and the types of their fields. Are you treating your collections like SQL tables? Have any many-to-many relationships in hot code paths? Welcome to hell. Its type system is also limited compared to modern SQL DBs. Storing IP addresses, and want to query them based on a given CIDR range? Postgres makes this easy; MongoDB has you writing code that does sub-queries.
I think you are referring to the 100MB RAM limit, but that’s not a hard limit, it’s more of a bad default. The `allowDiskUse` option lets MongoDB write intermediate results to the disk (which is exactly what SQL databases are doing).
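For example (pymongo, invented collection and fields):

    from pymongo import MongoClient

    events = MongoClient().app.events  # hypothetical collection

    # Stages that overflow the 100MB in-memory limit spill to disk instead
    # of failing when allowDiskUse is set.
    pipeline = [{"$group": {"_id": "$user_id", "n": {"$sum": 1}}},
                {"$sort": {"n": -1}}]
    results = list(events.aggregate(pipeline, allowDiskUse=True))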
> But aggregate queries really do not give you the power of SQL. They are great for transforming stuff for a report, but I would avoid using them for anything else unless 1) it can be cached (and therefore properly invalidated) and 2) doesn't need to be "real time"
I really don’t see a difference between what you can do with MongoDB and SQL. I can’t say much more without knowing specifically what impediments you have in mind, but I would certainly like to hear more. For example, why do you cite results caching and lack of real-time requirements?
> Being a document, schema-less database,
I guess if you’re on 3.4 you can’t take advantage of JSON Schema yet, but keep that in mind as a part of your upgrade plans. In the meanwhile you can still use document validation?
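For reference, once you're on 3.6+, attaching a JSON Schema validator to an existing collection is a single command. A sketch with pymongo (the collection name and schema are invented):

    from pymongo import MongoClient

    db = MongoClient().app  # hypothetical database

    # Enforce a shape for all future writes to the "customers" collection.
    db.command("collMod", "customers", validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["email"],
            "properties": {
                "email": {"bsonType": "string"},
            },
        },
    })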
> your "quality of life" as you scale with MongoDB is going to be heavily dependent on how you structure your documents and the types of their fields.
This is completely true, but couldn’t we say that just as much about any database? I’d put money on there being way more grief out there over bad tabular schema than over bad document schema. I mean, who’s worked on large-scale systems that hasn’t put off implementing great ideas, or had to hack up app code to compensate for a restrictive schema, because you can’t take the pain of ALTER TABLE?
> Are you treating your collections like SQL tables?
Ouch. Please don't!
> Have any many-to-many relationships in hot code paths? Welcome to hell.
That's probably fair, but if you're in hell to a vastly greater degree than you would be with Postgres, I'm pretty sure that's a modeling problem. Again, can you tell me more about the particular example?
> Its type system is also limited compared to modern SQL DBs. Storing IP addresses, and want to query them based on a given CIDR range? Postgres makes this easy; MongoDB has you writing code that does sub-queries
That's 100% legit. MongoDB needs to do a ton better with that... types rule.
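For comparison, the Postgres version really is a one-liner thanks to the inet type and its containment operator. A sketch with psycopg2 (the table name and DSN are invented):

    import psycopg2

    conn = psycopg2.connect("dbname=netdata")  # hypothetical DSN
    cur = conn.cursor()

    cur.execute("CREATE TABLE IF NOT EXISTS connections (ip INET)")
    cur.execute("INSERT INTO connections VALUES ('10.1.2.3'), ('192.168.0.5')")

    # "<<=" means "is contained within": one predicate, no sub-queries.
    cur.execute("SELECT host(ip) FROM connections WHERE ip <<= inet '10.0.0.0/8'")
    print(cur.fetchall())  # [('10.1.2.3',)]
    conn.commit()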
> I think you are referring to the 100MB RAM limit, but that’s not a hard limit, it’s more of a bad default. The `allowDiskUse` option lets MongoDB write intermediate results to the disk (which is exactly what SQL databases are doing).
While technically true, this isn't an apples-to-apples comparison. MongoDB's memory limit, as best I can tell -- and please correct me if I'm wrong -- is based on the contents of all documents and operations in the pipeline. Documents in MongoDB tend to be larger, so you can run into that limit faster than one might anticipate.
In Postgres, you have the equivalent in "work_mem". It defaults to 4MB, but most production installations will bump this up. Regardless, this limit is per operation (a join, a sort, etc.), not per query. And oftentimes the operation is against specific fields, as opposed to the entirety of the record contents.
> I really don’t see a difference between what you can do with MongoDB and SQL. I can’t say much more without knowing specifically what impediments you have in mind, but I would certainly like to hear more. For example, why do you cite results caching and lack of real-time requirements?
This might be my own hangups or just us fighting our own specific problems, but I've never been happy with the latency I see come out of the aggregate pipelines we've created. I also don't like what they do to the server's memory.
> At its core, MongoDB is a document database and — almost by default — these kind of databases aren’t ACID compliant, especially when it comes to multi-document transactions. For the most part, that’s not a big deal for companies that use database systems like MongoDB because they are not trying to write to multiple documents at the same time.
No. At least in the open-source world, you can see many applications make multi-document "transactions". I don't see how it would be different in companies using MongoDB.
> Because of this, though, many MongoDB users still run relational databases in parallel with their document database.
No. For the most part, they do write to multiple documents without thinking about consistency.
At least the situation seems to be getting remedied. Better late than never.
Every time I hear about some NoSQL "breakthrough" that existed a while ago in SQL databases, I can't help but feel underwhelmed.
In general, I'm convinced SQL is like Constitutional Democracy; it's not perfect, but it's better than any alternative humans have come up with so far.
The consistency is really nice too. I've always hated the fact that column declarations are the first portion of a query and wanted them at the end but... I'd rather be slightly disappointed all the time than occasionally need to rewrite huge swathes of queries if we're changing DBMSs
We have been using Mongo for the last three years on our project, and it has been pretty smooth so far. However, lately I gave some thought to what our project would look like if we used PostgreSQL instead. I tried to figure out what problems Mongo solves that PostgreSQL doesn't.
I am far from being a database expert, I just know enough basics to query what I need, so feel free to correct/complete the following:
- Mongo has been built to store JSON objects -> Yes, but from what I understand, benchmarks indicate that PostgreSQL is faster at reading/storing/indexing json/jsonb content. I don't think that's a good reason to use it.
- Mongo is schemaless -> There might be some use cases, but I bet in most cases this can be worked around, especially in a database with JSONB support.
- MongoDB horizontal scaling is way easier than PostgreSQL's -> Yes, it seems that scaling Mongo horizontally is extremely easy compared to any other relational database.
And ... that's it. But there is probably more.
At the moment, here's how I would summarize MongoDB benefits if asked my opinion when starting a project:
- For small projects or a prototype: ease of use, ease of configuration, and not requiring too much thinking about my data model while I am experimenting
- For a bigger project: horizontal scaling should be easier
Does that sound accurate to you? Am I missing anything important?
Honestly, the existence of Mongo is mostly an indictment of how user-hostile conventional RDBMSs are. The fact that other DBs can do what Mongo does is not helpful when there is no easy workflow to do what Mongo does.
The fact that I could theoretically implement a web-based CMS in C, and that it might be more performant than all these web-language CMS products, doesn't mean that C is better for making a CMS.
> "Honestly, the existence of Mongo is mostly an indictment about how user-hostile conventional RDBMS is."
That's nonsense. I've taught people to use SQL before, even people with little to no programming experience. After the initial concepts were understood it was fairly easy to gradually expand knowledge over time.
The basic SQL keywords to start getting useful information out of a RDBMS are:
SELECT
FROM
WHERE
INNER JOIN
LEFT JOIN
ON
AS
AND
OR
Those 9 keywords give you a good starting point for exploring SQL. It's really not hard. You could probably learn enough to get started in a couple of hours, and it's easy to expand your knowledge as and when you need to once you've got the basics sorted.
That's after you set up users, schemas, tables, columns, oddly-named datatypes with unexpected behaviors, etc.
The RDBMS is a lie. It's a beautiful kernel of relational theory wrapped in a 60-foot ball of hacks, tweaks, duct-tape, bubble-gum, and hate. The document store doesn't lie. It doesn't pretend. It's honest that it's stupid and it's a glorified hashtable with a string in it. It doesn't make any ridiculous pretenses of having a Sufficiently Smart Query Optimizer that will inevitably let you down and leave you pulling your hair out trying to figure out why on earth a simple, straightforward query is running so goddamned slow.
Then you build a complicated model and have to figure out from the query plan why your query is slow and deal with indices and foreign keys and all that nonsense.
Meanwhile, an object DB may be inefficient and clumsy, but it gets all of that stuff out of the way. Also, if you don't want to join, you can work around that by duplicating the data all over the place. That's something you can't do with an RDBMS, because tables are fundamentally flat and so you can't stuff a parent-child relationship into a single table.
> That's after you set up users, schemas, tables, columns
Just as you have software engineers, you should similarly have a data engineer (i.e. a DBA). Nearly 50 years have passed and we still haven't found a better way to represent data, so perhaps it is the right model. The only difficult part is to bother enough to learn it.
> oddly-named datatypes with unexpected behaviors [...] 60-foot ball of hacks, tweaks, duct-tape, bubble-gum, and hate
That's only when using MySQL
> Meanwhile, an object DB may be inefficient and clumsy, but it gets all of that stuff out of the way. Also, if you don't want to join, you can work around that by duplicating the data all over the place. That's something you can't do with an RDBMS, because tables are fundamentally flat and so you can't stuff a parent-child relationship into a single table.
You absolutely can store data that way in an RDBMS. For example, in Postgres you can create a table with two columns, one named key and the other named data, where the data column's type is JSONB. Then you essentially have the equivalent of a Mongo collection (a sketch follows below).
But in that case you store data inefficiently, and if your application starts evolving and you need to make different queries, things will get more complex quickly.
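A sketch of that emulation with psycopg2 (the table name and DSN are invented):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    cur = conn.cursor()

    # Two columns: a key and a JSONB document -- essentially a Mongo collection.
    cur.execute("CREATE TABLE IF NOT EXISTS things (key TEXT PRIMARY KEY, data JSONB)")
    cur.execute("INSERT INTO things VALUES (%s, %s)",
                ("user:1", '{"name": "Alice", "tags": ["admin"]}'))

    # JSONB operators still let you query inside the document.
    cur.execute("SELECT data->>'name' FROM things WHERE data @> %s",
                ('{"tags": ["admin"]}',))
    print(cur.fetchone())  # ('Alice',)
    conn.commit()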
> "That's after you set up users, schemas, tables, columns, oddly-named datatypes with unexpected behaviors, etc."
You shouldn't jump in at the deep end when learning this stuff; getting a feel for how an existing database works before creating your own is helpful, and you can download sample databases to play around with when you start learning if you don't have access to one already. Aside from this recommendation, if you're starting out learning about RDBMSs, then SQLite is a good starting point, and the setup of a SQLite database is fairly simple.
If you're starting from scratch with a brand new database, here are some of the SQL keywords you'll find useful:
CREATE TABLE
ALTER TABLE
DROP TABLE
TRUNCATE TABLE
INSERT INTO
UPDATE
SET
DELETE FROM
VARCHAR
INT
DECIMAL
PRIMARY KEY
FOREIGN KEY
REFERENCES
Furthermore, if you're using a decent DB GUI frontend, you don't even need to remember most of the above, aside from VARCHAR (for strings), INT (for whole numbers) and DECIMAL (for numbers with a fractional element). Reason being, you can do all of the database setup graphically. Tools like SQL Server Management Studio help in reducing friction.
I literally just finished making some code changes minutes ago to very carefully sequence a set of changes to some related documents to make sure that if write failures occur they'll have the least impact on our system. I've been very happy with our decision to use MongoDB because in the vast majority of cases I just don't need transactions, but there's that one place where using them will be a big win.
I looked into MongoDB a couple of years back, because it was the hot thing at the time. About fifteen minutes in, I tried to find out how to do transactions. That was strange, I thought; the manual said nothing about transactions. I asked a popular search engine and was a little shocked to find out there were no transactions.
That was a dealbreaker for me. If MongoDB has now grown support for transactions, that changes things. I think I am going to look at it again sometime.
While doing so, I recommend you think some about the meaning of embedding relationships vs. referencing them in other collections. MongoDB can do both (the aggregation framework provides joins in the form of the $lookup stage).
If you have a “one-to-many(some)” kind of relationship, embedding is a good option, and gets you ACID semantics without even resorting to multi-statement transactions.
If you have a “one-to-many(thousands or more)” kind of relationship, objectID references (a la foreign keys) are more likely what you want.
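As document sketches (all field names invented for illustration), the two shapes look like this:

    # Embedded, "one-to-many(some)": the values live inside the person document,
    # so a single-document write updates the person and emails atomically.
    person = {
        "_id": 1,
        "name": "Alice",
        "emails": ["alice@example.com", "a.smith@example.com"],
    }

    # Referenced, "one-to-many(thousands or more)": each order carries the
    # person's _id, resolved at query time (e.g. via a $lookup stage).
    order = {
        "_id": 5001,
        "person_id": 1,
        "total": 34.49,
    }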
It will be interesting to learn about database design in a non-relational environment. I guess you can replicate a relational structure in MongoDB, but then why not use an RDBMS in the first place?
As far as I understand it, MongoDB's claim to fame is a) handling huge amounts of data and b) clustering. Neither of these apply to me, so the only reason to use Mongo would be data that does not match the relational model well. I am still looking for a use case, any use case, that might make a valid excuse to learn it, but so far I have come up empty.
I am a little worried that I am facing a situation somewhat like the one when I tried to learn Lisp. Learning Lisp was very hard for me because I had absorbed the structured programming approach so deeply that the functional part of Lisp programming seemed downright alien to me. At first, at least. So for the time being, I cannot tell with any certainty if problems that are a good match for a document store are just so rare, or if I am just incapable of modeling my data in ways other than the relational model.
Think about it less in terms of relational/non-relational and more in terms of tabular/document. Tables can only model things in terms of relations. Documents can model them either as embedding or as external references, depending on access patterns.
A decent example is a person record with their email addresses and phones. In a relational DB, you would always and only model those as three separate tables, and you would quite frequently need a three-way join to assemble that person again.
In MongoDB, you would definitely have an array of emails and an array of phone numbers embedded in that person document, sparing the join on those queries. But in an ecommerce context, you would likely not embed an array of all past orders with all the line items into that person, instead you’d have an array of previous order numbers.
But an order document would have an embedded array of line items, sparing the DB a bunch more joins (and the attendant indices you’d need for joining line items to orders efficiently).
Getting the 10 most recent orders from a customer would involve joining the customers collection with the orders collection (MongoDB’s join is the `$lookup` aggregation stage), but it wouldn’t involve joining the line items to the order.
Is that replicating a relational structure? A little column A, a little column B.
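A sketch of the "10 most recent orders" query described above, with pymongo (collection and field names invented):

    from pymongo import MongoClient

    db = MongoClient().shop  # hypothetical database

    recent = db.customers.aggregate([
        {"$match": {"_id": 1}},
        # Join the customer's order numbers against the orders collection.
        {"$lookup": {"from": "orders", "localField": "order_ids",
                     "foreignField": "_id", "as": "orders"}},
        {"$unwind": "$orders"},
        {"$sort": {"orders.placed_at": -1}},
        {"$limit": 10},
    ])
    # Line items are embedded in each order document, so no second join runs.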
I find it telling of MongoDB's sales strategy that this is being covered by Techcrunch. I don't think you'd get the same coverage on Postgres (granted MongoDB is a company, vs PG is a project).
Triggers?? Where you add some application logic to the database, 10 years go by, and no one has any idea how the triggers work? Or even how to test them? I've never seen triggers used successfully in any production application (maybe they work at first, but give them time, and a few code changes).
> "I've never seen triggers used successfully in any production application"
Triggers have one or two good use cases, and plenty of bad use cases.
I would suggest the strongest use case is data validation. Databases don't have very sophisticated type systems, and custom data types can cause headaches. Database triggers allow you to ensure that the data stored in fields matches a set of criteria. To give a basic example, if you had a customer table with an email address field, you could validate that the email address has an @ symbol using code in an insert and/or update trigger. Of course, you should have similar checks within the software that sits on top of the database, but by putting the validation at a low level using database triggers, you can be more confident that data integrity will be protected.
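A sketch of that email check as a Postgres trigger, created from Python with psycopg2 (the table name and DSN are invented):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS customer (id SERIAL PRIMARY KEY, email TEXT)")

    # Reject any insert/update whose email has no '@'.
    cur.execute("""
    CREATE OR REPLACE FUNCTION check_email() RETURNS trigger AS $$
    BEGIN
        IF position('@' IN NEW.email) = 0 THEN
            RAISE EXCEPTION 'invalid email: %', NEW.email;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS customer_email_check ON customer;
    CREATE TRIGGER customer_email_check
    BEFORE INSERT OR UPDATE ON customer
    FOR EACH ROW EXECUTE PROCEDURE check_email();
    """)
    conn.commit()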
As a second use case, if the triggers are being used to maintain data for audit or reporting purposes, that can be fine.
However, aside from the audit/reporting use case, I would recommend avoiding using triggers which span multiple tables (unless there are some exceptional circumstances where it's the best option). Things get messy when you have a chain of tables, each with their own triggers that can update other tables within that chain. If you spot lots of code like that, run to the hills!
We are using them for building aggregate tables from a raw table. We insert data into the raw table, and triggers propagate running min/max/avg into hourly/daily/monthly tables. After a couple of months we truncate the raw tables but keep the aggregate tables. Duplicates are easily removed on insert, and we get instantaneous results from our aggregate tables (no hourly batch job).
Third case (and most important IMO): schema migrations. When doing major shuffling of a schema, you can use triggers to "redirect" accesses to the old tables or columns into new ones. This way, the schema updates can be applied without bringing down the application.
(Such triggers of course are removed once the application is updated, along with any old schema bits.)
You set up your preconditions, you execute the action, you assert your postconditions. You can use something like http://pgtap.org/documentation.html to write your database tests.
Triggers have a place in app development, and sometimes are the best things to use.
Has anyone worked with MySQL JSON data types in production? For most projects, I prefer working in SQL via a query builder or ORM for abstraction, but find a few features that would benefit from denormalized JSON storage.
I've been using MySQL 5.7 in production since its release. I used to store JSON data in a blob, but when they released JSON support, I just couldn't wait to use it. It's working really, really well. JSON support solves a ton of problems we used to tackle with an EAV model (I won't get into details, the discussion will go the other way). I deal with a few hundred MySQL deployments ranging from 1GB dataset size to several terabytes, from multi-master and MySQL Cluster setups to in-app sharding. There are several notable pieces of software that have never caused any problems in our use scenario, and MySQL is one of them. The other is nginx. Can't really remember the third. I often wondered why anyone would use MongoDB or similar, but I always keep forgetting that developers are usually inexperienced hipsters who are looking for a magic unicorn to compensate for their lack of knowledge / experience.
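For anyone curious what that looks like in practice, a minimal sketch with mysql-connector-python (the table, column, and connection details are invented):

    import mysql.connector

    conn = mysql.connector.connect(user="app", database="app")  # hypothetical DSN
    cur = conn.cursor()

    # 5.7+: a native JSON column, validated on insert.
    cur.execute("CREATE TABLE IF NOT EXISTS events (id INT PRIMARY KEY, doc JSON)")
    cur.execute("""INSERT INTO events VALUES (1, '{"type": "click", "x": 10}')""")

    # Path operators query inside the document.
    cur.execute("SELECT doc->>'$.type' FROM events WHERE doc->'$.x' > 5")
    print(cur.fetchall())  # [('click',)]
    conn.commit()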
So what you're saying is that if I find something on Google, it's undoubtedly true and I should blindly believe it? For every one of those use cases, I can provide a counter-argument. And there are many more who can do the same.
I used RDBMSs for 15 years before ever using Mongo. I fell in love with Mongo the first time I used it with a pre-existing system. Being able to insert and retrieve my whole C# object graph with one statement was a thing of beauty.
I'm a Domain Driven Design true believer. The thought of just being able to store my entire domain model without the machinations of the object relational impedance mismatches was glorious.
I use C# so my document collections are strongly typed C# objects and thanks to the Mongo C# Linq provider, my queries are also strongly typed.
Thanks for the advice and vote of confidence in MySQL JSON support. I don't anticipate reaching that kind of scale, but it's great to hear about large, successful, production deployments. These days I try to stick with a very stable back-end stack and innovate more on the front-end.
We switched some non-essential data over to JSON. All of it was stuff that we don't index, join, or search via SQL (Elasticsearch still indexes it).
It is always just the display data on the details page.
Not sure if that's a good use case, but it cleaned up our tables, and given that the data fields can vary from customer to customer, it's nice flexibility. Previously these fields used an entity-attribute-value setup (which was very slow after we got into millions of rows), and we were considering a NoSQL option.
You're aware that the vendor will post gospel about their product on their own website? I'm not using Postgres, I use MySQL, but I can assert with 100% certainty that most of the "comparison" on that URL is just marketing bullshit. You're a prime example that this new-age bullshit works and that's what's worrying.
How's that relevant? You're neither the lead architect, nor does your employment at MySQL / Oracle mean anything. Many incapable developers score positions at big companies. Without facts, maths, and measurement, you could be Linus Torvalds for all I care. The article at the URL provided is simply a marketing ploy. I'm experienced enough to notice what's wrong with the comparison, but I won't take that article apart to prove anything. If you believe Mongo is a product that helps you - perfect. Now, if you're thinking you'll convince me that it's great by berating another great product - well, that raises a red flag. Who says I have to use a single software vendor anyway?
It was the reference to me as "You're a prime example that this new-age bullshit works and that's what's worrying". The same statements were leveled at anyone from MySQL 10-15 years ago. I'm glad MySQL works well for your use case.
Are you the Mat Keep who is Director of Product Marketing at MongoDB? If so then, FWIW, people don't often respond well to marketing links posted without identifying your connection to the company.
That changes nothing for ArangoDB (https://docs.arangodb.com/3.3/Manual/Transactions/), as they had this feature long before MongoDB. But MongoDB gets the headlines for the hype, despite being light years late to the party.
"Every NoSQL attempts to add features until it can do relational ops and then ends with a sql layer plugin on top. Those who can't are replaced by ones which can." - Rehash of Zawinski's Law by me
I think Percona's MongoDB distribution also supports RocksDB as a storage engine, but they are not keeping up with the latest releases, and hence the new feature offerings.
Correct me if my assumption is outdated as of today.
I never used MongoDB, but I have read many stories that a few years ago MongoDB's default configuration permitted data loss: the server returned a write-success response to the client without syncing data to a replica or to disk, and if your server died, your data was gone.
Additionally, some time ago MongoDB had single-threaded writes with a collection-level lock, which could cause poor write performance.
That's one thing I keep "hearing". The default configuration now is a WriteConcern of 1 - meaning at least the primary has to have written it successfully. You can choose 2 or majority among others.
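In pymongo terms (the collection name is invented), that choice looks like:

    from pymongo import MongoClient, WriteConcern

    db = MongoClient().app  # hypothetical database

    # Don't acknowledge a write until a majority of replica-set members
    # (and the journal, with j=True) have it.
    safe_orders = db.get_collection(
        "orders", write_concern=WriteConcern(w="majority", j=True))
    safe_orders.insert_one({"sku": "A-1", "qty": 2})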
The last analysis looks good, but note that it only holds with the strongest settings (meaning Mongo is configured with its slowest settings); also, that analysis did not cover node crashes or restarts.
I read it. From his analysis it seems like the loss of data will rarely happen in the real world if you have a WriteConcern of Majority and you don't have a large latency between nodes.
One of the comments said that, theoretically, this could happen with any system: if something happens between the time it writes to disk and the time it acknowledges the write, you may end up with extra data.
There are times, though, when you care more about speed than reliability - e.g. capturing string data from an IoT device, logging, etc.
There are even times when eventual consistency is good enough. But definitely choose the right tool for the job. I wouldn't trust Mongo with financial data where transactions are a must.
And in my preferred language, C#, if you write your code correctly you can switch your LINQ provider from Entity Framework to the Mongo driver, for instance, without any major code rewrite, so you aren't stuck with your choice.
From what I'm reading they were trying to do things with Mongo that it wasn't designed to do around transactions. If you have a schema that is well defined and doesn't change frequently, why use a nosql database?
I chose Mongo for a relatively large project that I was responsible for because I knew the schema would change frequently, we didn't have a need for transactions and honestly we hardly ever update documents. We insert and replace.
I also had enough sense to set my Write concerns appropriately.
My choice was also informed by a preexisting project that the company was using that had close to 150K new documents a day with basically the same semantics as the one we started.
No, it doesn’t work for that use-case. It would all need to happen within a single service in a single invocation.
To enable transactions across network boundaries, you need to enable the transactional safety at the application/API level.
And, for the record, one service per table is a terrible idea. It would have none of the benefits of a service-oriented architecture, while adding all the complexity and caveats of a distributed architecture.
Except I've implemented it that way in highly visible, high traffic internal web applications that were lauded as some of the most successful transitions in company history in a very large corporation. So your argument is really legacy opinion and not relevant to micro-service architectures.
None of the things you've just said make any difference to the fact that it doesn't make sense and is a terrible idea.
I fail to see in what way my statements are "legacy" or not relevant to micro-services. My company (Cuvva, insurance in the UK) runs a "micro" service architecture and deals with these problems every day.
It is not possible to use DB-level locking/transactions across network boundaries. It's just as simple as that. It's not an opinion - it's a fact.
I wasn't responding to the distributed transaction argument, though nothing in the MongoDB blog says it won't support it.
I was specifically responding to "one table, one service", which is the foundation of domain-driven microservice architecture: having a single domain siloed and tested within its own boundaries.
Whether you're firing events between services or just calling other services is a design decision, but the encapsulation of a specific domain is the point and the primary benefit.
I have no idea what you're doing, but this is what I worked on:
Just because something can be done, including the fashionable microservices "architecture", it does not mean that it should be done, or that it is a good idea.
Certainly, one can architect anything: use NoSQL for critical data storage, or use Kafka to synchronize two SQL databases with the same schema living on the same network (an actual proposal from a fan of microservices, btw). But whether it should be done is another question, and I have yet to see a single example of microservices being a good idea. Not "it works" -- obviously, you can get it to work -- but being better than the alternatives. Come to think of it, Mongo is the same...
The Accenture link I posted is a living example of a highly successful micro-services architecture in a critical internal application in a large corporation.
[0] http://www.defmacro.org/2017/01/18/why-rethinkdb-failed.html