No, you shouldn't be impressed. It only shows that some caching scheme can deliver data this quickly. Possibly one-byte reads, perhaps the same byte read one million times. It has almost nothing to do with MySQL performance, in fact.
Historically, the overhead of SQL put a low ceiling on the potential performance of simple queries, which is exactly why KV databases appeared in the first place: a scalar-value get in most RDBMSs barely hit 1,000 results per second, and that is being optimistic. That overhead was never optimized away because the classic use of the RDBMS was query conversations in which each interaction was very expensive (e.g., large, complex queries), so optimizing it wasn't important.
But, thanks to competition, most database systems have gotten fantastically more performant. Even over the TDS interface, I can get close to Redis performance with SQL Server now, whereas it was beaten by one if not two orders of magnitude a few versions ago. And in this case MySQL added the ability to use a much simpler, lighter memcached API in front of MySQL, for an obvious reason: the underlying database system is now fast enough for that to be valuable.
I should add that almost everyone who has commented thus far completely misunderstands what this is talking about. It is the memcached API atop MySQL, meaning you can talk to your database server through that API from any client programmed to talk to a memcached server. And the same data you access via memcached (reading or writing) can be accessed through your normal SQL. Which is pretty cool.
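To make the "same data, two APIs" point concrete, here is a toy sketch: a plain dict plays the role of the InnoDB table, and the class names and methods are purely illustrative, not the real plugin's interface.

```python
# Conceptual sketch only: one underlying store, reachable through a
# memcached-style API and a SQL-style API. A dict stands in for the
# InnoDB table; this is NOT the actual MySQL memcached plugin.

class DualAccessTable:
    """One table, visible through two access paths."""

    def __init__(self):
        self.rows = {}  # key column -> value column

    # --- memcached-style path: no SQL parsing, direct key lookup ---
    def get(self, key):
        return self.rows.get(key)

    def set(self, key, value):
        self.rows[key] = value

    # --- SQL-style path: same underlying rows ---
    def select_where_key(self, key):
        return [(k, v) for k, v in self.rows.items() if k == key]

t = DualAccessTable()
t.set("user:42", "alice")             # write via the memcached-style API
print(t.select_where_key("user:42"))  # read the same row "via SQL"
```

The point of the sketch is only that both paths hit the same rows, so a write through one is immediately visible through the other.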
When I see an explicit mention of an API, it implies there's no running memcached daemon.
I dug through the MySQL 5.7 manual several times, but I couldn't find any proof that this is just a memcached API atop MySQL.
But I did find evidence that there's a real memcached daemon atop MySQL.
> Running memcached in the same process space as the MySQL server avoids the network overhead of passing requests back and forth.
The MySQL 5.7 manual clarifies that they're running the memcached daemon inside the database engine, in-process. So an actual memcached instance is serving the data atop the engine.
So, if the dataset is read-only and small enough, it's likely that the whole dataset is being served by memcached. In that case, it cannot be slower than memcached.
That's why I couldn't be impressed. The only thing MySQL is doing is automatically managing the memcached daemon. That is convenient, but not impressive.
Maybe I am wrong. If you think I am wrong, please show me some reference that explicitly mentions this is just an API, and there's no running memcached instance.
Also, regardless of what is really happening under the hood, this benchmark is still meaningless. Databases exist to store data, and all the hard work happens when storing data. Write transactions usually degrade performance a lot. That's why write transactions must be included in a benchmark to show real-world performance.
> Maybe I am wrong. If you think I am wrong, please show me some reference that explicitly mentions this is just an API, and there's no running memcached instance.
Yes, there is a memcached "daemon" in the sense that they run a memcached host within MySQL, which they obviously have to do. That memcached "instance" reads and writes from InnoDB and the shared buffer pool also used by the SQL engine: it does not maintain its own separate memory, and values read or written are atomically consistent with the MySQL instance.
The one and only reason this product exists is that the memcached API is much simpler and more network- and parsing-efficient, allowing much better round-trip times for simple queries.
OK, you're right. MySQL does seem to use the InnoDB buffer pool. I found the mention:
> You do not need to manually load data into memcached at startup. As particular keys are requested by an application, the values are retrieved from the database automatically, and cached in memory using the InnoDB buffer pool.
So the memory cache is provided by the engine itself, not by an external daemon.
Could you elaborate on the SQL Server vs. Redis claim, or throw out some references?
I'm not doubting you, just honestly asking. I've been discussing a possible Redis solution to complement our current SQL Server.
Our problem isn't that big: we need to serve 10k req/sec of a simple query. Can SQL Server do that? Isn't the connection pool the bottleneck at loads of this scale?
> we need to serve 10k req/sec of a simple query, can SQL Server do that
Several years ago, with SQL Server 2008 R2, I achieved 200,000+ simple queries per second (http://bit.ly/IlH2id -- not a rigorous benchmark by any stretch of the imagination, just a "validate before assuming" data point, because you might find your install performs far better than you anticipate) using the standard TDS query interface. This was on pretty beefy hardware, and it is obviously enormously contingent on the data being in memory (which you can force with 2012), but it blew me away and completely undermined an initiative we had to implement AppFabric, Redis, or other solutions.
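In the "validate before assuming" spirit, a throughput check is a few lines of code. The sketch below uses Python's stdlib sqlite3 purely as a self-contained stand-in; to measure your own stack, point the same loop at SQL Server through whatever driver you actually use.

```python
# Rough single-row-lookup throughput harness. sqlite3 (in-memory) is
# only a stand-in so this runs anywhere; swap the connection for your
# real database driver to get a number that means something for you.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, f"val{i}") for i in range(10_000)])

n = 50_000
start = time.perf_counter()
for i in range(n):
    cur = conn.execute("SELECT v FROM kv WHERE k = ?", (i % 10_000,))
    row = cur.fetchone()
elapsed = time.perf_counter() - start
qps = n / elapsed
print(f"{qps:,.0f} simple queries/sec")
```

The absolute number is meaningless across engines; the point is that measuring is cheap enough that there's no excuse to guess.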
It's hard for me to tell what the MySQL benchmark was really doing, but the postgres one seems a bit more "normal" (ordinary SQL queries from postgres's ordinary simple benchmark tool).
I would have loved it if they had benchmarked it against doing the same thing in MySQL with SQL, so we could see how much overhead the SQL layer adds for simple selects.
PostgreSQL manages to reach a third of their QPS in Haas's benchmark.
Huh? First off, the memcached API talks to InnoDB directly, not the other way around. This means you can dramatically reduce the latency of queries that previously required you to maintain a bunch of cache, along with the complexity that comes with it, where you incur the overhead of hitting both the memcached server and the database on a cache miss. (For complicated queries you might have to hit many cache servers and then the database multiple times; here only a single connection is needed, even in the worst case where the data isn't in RAM.) It also has the advantage that a cache miss doesn't force the SQL layer to come into play just to fetch an item from disk and have the application stick it into the cache. Your cache is just the page cache in InnoDB, which is much more efficient than application-managed memcached.
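The round-trip argument can be sketched as simple accounting. This toy comparison assumes the classic cache-aside pattern on one side and the in-process memcached API on the other; the hop counts are illustrative, not measured.

```python
# Toy round-trip accounting. With a separate memcached tier
# (cache-aside), a miss costs extra network hops: ask the cache, ask
# the database, write the value back into the cache. With the
# in-process memcached API, a miss is still one round trip, because
# InnoDB's buffer pool *is* the cache and fetches from disk itself.

def cache_aside_round_trips(hit: bool) -> int:
    trips = 1          # ask memcached
    if not hit:
        trips += 1     # miss: ask the database
        trips += 1     # write the value back into memcached
    return trips

def integrated_round_trips(hit: bool) -> int:
    return 1           # one call either way; misses are handled inside InnoDB

print(cache_aside_round_trips(hit=False))   # 3
print(integrated_round_trips(hit=False))    # 1
```

It also removes the consistency gymnastics of keeping two systems in sync, which is the complexity the comment above alludes to.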
To be clear, it's the memcached API on top of InnoDB - not the other way around. This is cool because it allows fast access to data in InnoDB, while exposing the same underlying data through SQL, as well as for a bunch of other reasons.
Right, but if someone showed a similar improvement by adding enough RAM to prevent swapping, would you be impressed? That is what is going on here. They have very successfully proven that RAM is faster than HDD.
They have proven that SQL-based storage is faster than NOSQL databases.
The poster of the top comment doesn't understand what this topic is about.
This is actually a benchmark of InnoDB table access. You can query the table using the usual SQL commands, but if you want faster access, you can use the memcached API.
"Read-only" is also misleading here: this is just a benchmark of the get command.
"They have proven that SQL-based storage is faster than NOSQL databases."
Too broad a statement. 'NoSQL' is an umbrella term, so there are likely several contenders on a raw-QPS metric.
The additional step of parsing the query is obviated by statement caching. You're right, though: it's simply a benchmark of how fast the query path to the InnoDB buffer pool is. The work is in optimizing internal contention.
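The statement-caching point isn't MySQL-specific; any engine with prepared statements amortizes the parse. A sketch using Python's stdlib sqlite3, which keeps an internal statement cache keyed on the exact SQL text, so reusing one query string avoids re-parsing (sqlite3 is a stand-in here, not what the benchmark used):

```python
# Parameterized statements: the SQL text is parsed once and reused,
# so the per-query cost is the lookup itself, not parsing. sqlite3
# caches compiled statements by SQL string, so keeping the query as
# one constant string means subsequent executes skip the parser.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, str(i)) for i in range(100)])

QUERY = "SELECT v FROM t WHERE k = ?"   # one string -> one cached parse
results = [conn.execute(QUERY, (k,)).fetchone()[0] for k in (1, 2, 3)]
print(results)  # ['1', '2', '3']
```

The memcached API goes one step further: there is no statement at all, just a key, which is why it can undercut even a cached-statement SQL path.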
The MySQL team decided to run with a more standard memcached interface, and MySQL actually embeds memcached to do this (with MySQL as backend storage).
Despite being read-only, the over-1M number highlights a lot of work that's been done on multi-core scalability. This test was on a 48-core machine; that was a distant dream a few years ago on MySQL 5.1.
1M queries doesn't mean much if most of them actually use memcached as the backend. You could run more than 1M queries like "select 1" and it would show the same thing. The magic is memcached.
MySQL 5.7 includes a NoSQL interface, using an integrated memcached daemon that can automatically store data and retrieve it from InnoDB tables, turning the MySQL server into a fast “key-value store” for single-row insert, update, or delete operations. You can still also access the same tables through SQL for convenience, complex queries, bulk operations, application compatibility, and other strengths of traditional database software.
With this NoSQL interface, you use the familiar memcached API to speed up database operations, letting InnoDB handle memory caching using its buffer pool mechanism. Data modified through memcached operations such as ADD, SET, INCR are stored to disk, using the familiar InnoDB mechanisms such as change buffering, the doublewrite buffer, and crash recovery.
==== 8< ====
This is what is being benchmarked here, not pure memcached.
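Concretely, the embedded memcached is pointed at an InnoDB table through the `innodb_memcache.containers` mapping table. The row below mirrors the manual's demo_test example; the schema, table, and column names are illustrative and would be replaced with your own.

```sql
-- Tell the memcached plugin which InnoDB table backs which namespace.
-- Values follow the MySQL 5.7 manual's demo_test example; adapt
-- db_schema/db_table and the column names to your own table.
INSERT INTO innodb_memcache.containers
  (name, db_schema, db_table, key_columns, value_columns,
   flags, cas_column, expire_time_column, unique_idx_name_on_key)
VALUES
  ('aaa', 'test', 'demo_test', 'c1', 'c2', 'c3', 'c4', 'c5', 'PRIMARY');
```

After that, a memcached `get` on a key is a primary-key lookup on `demo_test`, and the same row remains visible to ordinary SQL.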
As far as I can tell, this is just a memcached interface to ordinary InnoDB tables, so yes. 1M queries per second also sounds plausible, since stock PostgreSQL can reach almost 400k queries per second with normal SQL queries.
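For context, the stock PostgreSQL tool behind numbers like that is pgbench; a select-only run (single-row primary-key lookups) looks roughly like the following, with the scale factor and client counts as illustrative placeholders.

```shell
# Initialize a pgbench database at scale factor 100, then run the
# select-only workload (-S): one-row primary-key SELECTs only.
pgbench -i -s 100 bench
pgbench -S -c 32 -j 8 -T 60 bench   # 32 clients, 8 worker threads, 60 s
```

The `-S` mode skips pgbench's default write transactions, which makes it the closest SQL analogue to the read-only memcached benchmark being discussed.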
The storage engines of relational databases really are this fast (for read-only queries on datasets that fit in memory).
Yes, but this doesn't change the fact that the queries use memcached as the backend. If all 1M queries hit the InnoDB tables, then I would be impressed. Also note that it's read-only performance; I think MySQL acts as a bridge here for most of the queries. (Correct me if I'm wrong.)
Not necessarily: InnoDB has optimizations for in-memory tables, and there isn't any write overhead that would bring the greater complexity of the transaction/disk layer into play.
That's only true when there are write operations. If the dataset is purely read-only, it's always consistent, and I believe that is the case in this article.
I think I can safely claim that (with the right team) I could scale any database to match that rate. The real problem with scaling is that until you hit a limit, you generally don't know where the limit is, and at the limit you need the system to do something reasonable (in this case, I'd say drop read and/or write requests that couldn't be served consistently). I also wonder whether you couldn't perform the same feat with memcached alone (wink wink).
Systems that behave erratically when you hit a limit should be avoided, and I often wonder whether people publish articles like this because they themselves are amazed it was possible. With a well-behaved system there's no need for fanfare as you grow; when you do hit a limit, you engineer a reasonable solution.
As an aside, it's pretty amazing (to someone whose first computer had 2K of RAM) how big you can scale an individual server these days. The competition I'd like to see is another team accomplishing the same feat with half the memory, or half the number of cores, or a result set twice as large, and then publishing how they did it. A little friendly competition might result in some amazing discoveries.
Hmmm… I don't see any meaningful information. Should I be impressed? Am I missing something?