Gh-ost: GitHub's online schema migration tool for MySQL (githubengineering.com)
226 points by samlambert on Aug 1, 2016 | 30 comments


This is neatly done. Thanks to GitHub for open-sourcing this.


A very impressive (ab)use of MySQL binlogs.

There is a lot to criticize about MySQL, but the master/slave (binlog) architecture is incredibly flexible and with RBR it's even consistent ;)


So this is cool...

But if you aren't GitHub-sized, try pt-online-schema-change for any and all schema changes to production.

I suspect you'll love it as much as I do.


If you love pt-online-schema-change, you may want to check out Square's tooling around it: https://github.com/square/shift

https://corner.squareup.com/2016/04/shift.html

It provides a job running service, a UI for setting up and monitoring migrations, and an authorization/peer review flow for them.

Maybe it could add support for gh-ost in the future.


The README on GitHub has a better explanation of how it works, and as I suspected, this could also be achieved with Tungsten Replicator.

Good job though. Anything that helps people move away from Percona and their "tooling" is a plus to me.


Why do you think people should move away from Percona?


Percona are flogging a dead horse called "Oracle" while basically acting as an Oracle subsidiary without being Oracle: it's Oracle wrapped in scary Perl scripts, with features backported from better forks.

If you were to meet me in person I would just say "Fuck Percona/Oracle and their shitty shit".


The Percona Toolkit is quite a messy codebase and the tools offer an overabundance of features, but they are still useful.


Does anyone know if this supports RBR with --binlog-row-image=minimal? I could only find mention of it wanting RBR, with no hint about whether it can handle the minimal format.


Given that this needs access to the binary log, am I out of luck if I want to use this with Amazon RDS? (I don't see a way to see or access the binlog on RDS.)


You can definitely access the binary log on RDS: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_L...


Thank you. I'm still not sure if this tool could use it the way it needs to (perhaps by 'streaming' the binlog?). I'm wrestling with making changes on a large RDS table, but I also don't have full permissions on the account, as it belongs to a client, so I have to keep trying things, seeing what breaks, asking for a change, and trying again.


Apparently gh-ost won't work, but there are other tools that work fine on RDS: I've been using https://github.com/soundcloud/lhm on some fairly large tables (30 million+ rows) without any problems.


Thanks, I will look at this if I can't get the pt stuff working (I have already used that in the past, so I would rather keep new tools to a minimum). This particular table is ... 200 million rows, give or take, and is a bugger to deal with.


I really want this to work on RDS. However, according to the docs[0]: Amazon RDS and Google Cloud SQL are probably not supported (due to SUPER requirement)

[0]: https://github.com/github/gh-ost/blob/master/doc/requirement...


Ah indeed it won't work on RDS as they explicitly test for SUPER or ALL privileges. But it's possible that the tool could be made to work without it in the future.
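
To see in advance whether an account would pass that kind of check, you can look at its SHOW GRANTS output. Below is a minimal Python sketch of the idea, not gh-ost's actual code: the helper name has_super_or_all and the sample grant strings are made up for illustration, and the real tool's test may differ in detail.

```python
import re

# Hypothetical helper, NOT gh-ost's actual check. It inspects lines in the
# format printed by MySQL's SHOW GRANTS and reports whether the account
# holds SUPER or ALL PRIVILEGES at the global (*.*) level.
GLOBAL_GRANT = re.compile(
    r"^GRANT\s+(?P<privs>.+?)\s+ON\s+\*\.\*\s+TO\s", re.IGNORECASE
)


def has_super_or_all(grants):
    for line in grants:
        match = GLOBAL_GRANT.match(line.strip())
        if not match:
            continue  # database- or table-level grant; SUPER is global only
        privs = {p.strip().upper() for p in match.group("privs").split(",")}
        if privs & {"SUPER", "ALL PRIVILEGES", "ALL"}:
            return True
    return False


# Illustrative grant strings (assumed, not copied from any real server).
managed_style = [
    "GRANT SELECT, INSERT, UPDATE, DELETE, REPLICATION SLAVE ON *.* TO 'admin'@'%'",
    "GRANT ALL PRIVILEGES ON `app`.* TO 'admin'@'%'",
]
self_hosted = [
    "GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION",
]

print(has_super_or_all(managed_style))
print(has_super_or_all(self_hosted))
```

Note that ALL PRIVILEGES on a single database does not count here, since SUPER can only be granted globally.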


Pretty surprised that GitHub is not using Ruby for this project.


They have been using Go for a few other projects, such as git-lfs (https://github.com/github/git-lfs) and hub (https://github.com/github/hub).


My understanding is that GitHub is a Ruby front-end / scripting shop that uses Go on the back end for data manipulation and database work.


So much work, when there are many other solutions out there that don't have that limitation: PostgreSQL, several NoSQL databases...


While MySQL isn't the best db, and does have many flaws, it's asinine to say that other solutions are wholly better. Everything has a set of limitations, so just because something like Postgres doesn't have this specific limitation doesn't mean it doesn't have others. It also may be more worth it to work around the limitations of MySQL than to migrate all data to another database.


Exactly this.


Keep in mind that PostgreSQL wasn't as in vogue when GitHub was first built, and most or all of the NoSQL solutions you're thinking of didn't even exist. Up until two years ago they were running Rails 2.3, and only then did they move to Rails 3 (Rails 4 had been out for a year at that point). I am quite sure that GitHub is very careful about their technology choices and not quick to chase trends, given the importance of their application.


Really?

To me, Postgres and MySQL have been in a similar spot for 10 years at least, MySQL with a few more users but not massively more.


Not by my recollection, and most ways of looking at trends seem to suggest the same (unfortunately many don't go back that far, but they do let you see some trends):

http://www.indeed.com/jobtrends/q-postgresql-q-mysql.html

http://db-engines.com/en/ranking_trend

http://readwrite.com/2013/09/10/postresql-hits-93-new-levels...


I guess I forget that 70% of the web is PHP and 95% of PHP is MySQL, so I tend to not get a fair picture of life.

It appears that MySQL has held the 10x-Postgres spot for a long time, at least in search trends.


Literally right at this moment there's a post on the front page from Postgres detailing the valid technical reasons they lost Uber to MySQL.


And if you read carefully through their post, and the comments, and the commentary on pgsql-hackers, you'd see that:
1) it was an engineering decision based on their particular situation and use case (as it should be)
2) that use case may be pretty specific and/or unique, and not well suited for Postgres (which is fine)
3) they don't explain what all their tradeoffs are, just the ones they're making arguments against (which makes the post much less useful than it could be)

I would not take that blog post as a general "MySQL is better than Postgres" argument. It really needed more info on what they're doing, why they're doing it that way, and what tradeoffs they were willing to make (speed vs. data integrity, etc.).


Other systems with their own sets of limitations...


Many people are tied to MySQL for other reasons; it might be too costly an investment to switch, etc.



