There's a lot of anonymity going on here. A new HN account, an unknown company a...

angelbob · on Nov 6, 2011

And yet he makes some good points. Pretty much all of this is verifiable.

I don't agree with a lot of his conclusions, but mostly his data is correct.

rmc · on Nov 6, 2011

The original author should provide the verifiable evidence.

latch · on Nov 6, 2011

Look, I'm not the best person to do this..but...good points?

1 - Default writes are unsafe by default:

MongoDB supports a number of "write concerns":

* fire-and-forget or "unsafe"

* safe mode (only written to memory, but the data is checked for "correctness", like unique constraint violations)

* journal commit

* data-file commit

* replicate to N nodes

The last 4 can be mixed and matched. Most (all?) drivers allow this to be specified on a per-write basis. It's an incredible amount of flexibility. I don't know of any other store that lets you do that.

When a user registers, we do a journal commit ({j:true}), 'cuz you don't want to mess that up. When a user submits a score, we do a fire-and-forget, because, if we lose a few scores during the 100ms period between journal commit, it isn't the end of the world (for us, if it is for you, always use j:true)

The complaint is the default-behavior (which I think you can globally configure in most drivers) of the driver? Issue a pull request. Is the default table created in MySQL still MyISAM ?

2 and 6 - Lost Data

This is the most damning point. But what can I say? "No?" My word versus his? I haven't seen those issues in production, I hang out in their google groups and I don't recall seeing anyone bring that up - though I do tend to avoid anything complicated/serious and let the 10gens guys handle that. Maybe they did something wrong? Maybe they were running a development release? Maybe they did hit a really nasty MongoDB bug.

3 - Global Lock

MongoDB works best if your working set fits in memory. That should simply be an operation goal. Beyond that, three points. First, the global lock will yield, i believe (someone more informed can verify this). Second, the story gets better with every version and it's clearly high on 10gen's list.

Most importantly though, it's a constraint of the system. All systems have constraints. You need to test it out for your use-case. For a lot of people, the global lock isn't an issue, and MongoDB's performance tends to be higher than a lot of other systems. Yes it's a fact, but with respect to "don't use MongoDB", its FUD. It's an implementation detail, that you should be aware of, but it's the impact of that implementation details, if any, that we should be talking about.

3 and 4 - Sharding

Sharding is easy, rebalancing shards is hard. Sharding is something else which got better in 1.8 and 2.0, which the author thinks we ought to simply dismiss. I don't have enough experience with MongoDB shard management to comment more. I think the foursquare outage is somewhat relevant though (again, keeping in mind that things have improved a lot since then).

7 - "Things were shipped that should have never been shipped"

This is a good verifiable point? I remember using MySQL cluster when it first shipped. That was a disaster. I also remember using MySQL from a .NET project and opened up a good 3-4 separate bugs about concurrency issues where you could easily deadlock a thread trying to pull a connection from the connection pool.

I once had to use use clearcase. Talk about something that shouldn't have shipped.

This is essentially an attack on 10gen, that ISN'T verifiable. Again, it's his anonymous word versus no ones. Just talking about it is giving it unjust attention.

8 - Replication

It's unclear if this is replica sets or the older master-slave replication. Either way, again, I don't think this is verifiable. In fact, I can say that, relatively speaking, I see very few replica set questions in the groups. It works for me, but I have a very small data set, my data pieces themselves are small. Obviously some people are managing just fine (I'm not going to go through their who's who, I think we all know some of the big MongoDB installations).

9 - The "real" problem

We've all seen some pretty horrible things. I was using MySQL in 5.0 and there was some amazing bugs. There's a bug, which I think still exists, where SQL Server can return you the incorrect inserted id (no, not using @@identify, using scope_identity) when you use a multi-core system. MS spent years trying to fix it.

I guess I can say what 10gen never could...If you were using MongoDB prior to 1.8 on a single server, it's your own fault if you lost data. To me, replication as a means to provide durability never seemed crazy. It just means that you have to understand what's going on.

Look, I don't doubt that this guy really ran into problems. I just think they have a large data set with a heavy workload, they thought MongoDB was a silver bullet, and rather than being accountable for not doing proper testing, they want to try and burn 10gen.

They didn't act responsibly, and now they aren't being accountable.

angelbob · on Nov 6, 2011

If you were using MongoDB prior to 1.8 on a single server, it's your own fault if you lost data. To me, replication as a means to provide durability never seemed crazy. It just means that you have to understand what's going on.

Well, except for that thing where the replication decided that the empty set was the most recent and blew everything else away. And those cases where keys went away.

Losing data, particularly when the server goes down, is fine. Even not writing data isn't terrible, though his points about not knowing whether it has been written in case of failure are really good ones. But corrupting data and then replicating that corrupted data is really, really bad. Often unfixably bad.

They didn't act responsibly, and now they aren't being accountable.

For the complaints about the default write stuff, sure. For everything else... Dunno. He brought up a lot of real, actual issues which were not documented MongoDB behavior. Yes, there's also a fair bit of complaining about the documented bits, and sure, boo-hoo, whatever. But the idea that 10gen is shipping stuff with serious data integrity bugs, and doing so knowing, doesn't seem out of line here.

And while MySQL also has some bad stuff, sure, it has nothing like as many data integrity bugs as MongoDB.

And I say all of this as a serious fan of MongoDB.

einhverfr · on Nov 6, 2011

"This is a good verifiable point? I remember using MySQL cluster when it first shipped. That was a disaster. I also remember using MySQL from a .NET project and opened up a good 3-4 separate bugs about concurrency issues where you could easily deadlock a thread trying to pull a connection from the connection pool."

You can STILL deadlock a transaction against itself in MySQL w/Innodb. How do they let this happen? I do not know. I just know I have been bitten by deadlocks in multi-row inserts quite often there enough to get really really frustrated when I use that db. This is in fact documented in the MySQL manual.

For better or worse, projects which start out without a goal to offer highly reliable software from the start never seem to be able to offer it later.

latch · on Nov 6, 2011

I've also seen a lot of SQL Server developers write large stored procedures that manage to easily deadlock. It's been years since I dealt with it...had something to do with lock escalation, from a read lock to an update lock to an insert lock.

You could say "don't use SQL Server"..or you could say "it's important that you understand SQL Server's locking behavior"

einhverfr · on Nov 6, 2011

It's one thing for two transactions to deadlock against eachother. It takes special talent to allow a transaction to deadlock against itself, which InnoDB apparently allows.

I have NEVER had issues with PostgreSQL transactions deadlocking against themselves, even with monstrous stored procedures.

regularfry · on Nov 6, 2011

I honestly have no dog in this race, but an argument which boils down to "MySQL is just as bad" is not one I'd choose to pursue.

latch · on Nov 6, 2011

I spent the time to write all that, and all you got from it is "MySQL is just as bad"...I obviously did a bad job.

edit:

I brought up MySQL because I think we all know that companies, you, me knowingly ship products with bug. In fact, you can look at public bug tracking for a bunch of major software and see bug fixes scheduled for future releases.

However, if you are going to accuse a database vendor of knowingly shipping data-corruption bugs, I think you absolutely have to back that up. It's slanderous. Obviously, if you think that, you also shouldn't use their product. But you either know something the rest of us don't, or you're a complete ass, if you make those kinds of statements without evidence.

regularfry · on Nov 6, 2011

No, of course that's not all I got from it. I was making a point specifically about the comparison you seemed to be making: that because MySQL did something (shipping with stupid defaults, dataloss bugs, whatever), it doesn't count as a black mark against MongoDB if they do the same.

I didn't comment on the rest because I don't care, not because I don't get it.