
Hadoop is insane. The elephant is fitting. Is it really the best choice, or has someone done something cleaner in golang or c++11?


> Is it really the best choice, or has someone done something cleaner in golang or c++11?

What does the language have to do with the program?

Hadoop is what it is because it's a complex problem with a fittingly complex solution. Simply re-writing it in your pet language won't somehow make it "better".


Go and modern C++ are both quite a bit more terse than Java. They also produce binaries which don't necessarily require a runtime to be available on every server (just ABI compatibility).

(I have no horse in this race, I am just writing what I think the grandparent comment was referring to)


> They also produce binaries which don't necessarily require a runtime to be available on every server

Just like Java[0]. It is just a matter of choosing the right compiler for the use case at hand.

[0] - http://www.excelsiorjet.com/ (one from many vendors)


Cool concept, I didn't realise this existed. Can you run Hadoop and friends under this? I've worked at companies with over 500 servers in a Hadoop cluster and literally never once heard about anything other than using Oracle's JRE aside from one proposal to use OpenJDK which was shot down pretty quickly.


I don't have experience with Hadoop.

Almost all commercial JVMs have some form of AOT or JIT caching, especially those that target embedded systems.

Sun never added support to the reference JVM for political reasons, as they would rather push for plain JIT.

Oracle is now finally thinking about adding support for it, with no official statement on whether it will make it into Java 9 or later.

JEP 197 is the start of those changes, http://openjdk.java.net/jeps/197

Oracle Labs also has SubstrateVM, which is an AOT compiler built with Graal and Truffle.


Way back in the day, GCC's gcj compiler would do AOT compilation of Java, however I believe it stopped being developed at jdk5 support.


If I am not mistaken, most of the developers abandoned the project to work on the Eclipse compiler and OpenJDK when those projects became available.

GCC only keeps gcj around due to its unit tests.


There are also things like exec4j, which bundles everything including a JVM into an executable that one can just run... and things like AdvancedInstaller and Install4j will also allow one to bundle a JVM.

So producing a binary which doesn't require a separate runtime really isn't a problem.


Since you mention it, Java 8 brings bundling and installer support into the reference JDK.


C++ does usually require a runtime.


C++'s runtime is small and ubiquitous. Depending on how the software is written (if it allows disabling exceptions and RTTI), it might be the same size as C's runtime, which is practically (but not totally) nonexistent.

I'm not an expert on Java, but my experience with it is that its runtime is fairly huge and requires custom installation.


C++'s runtime is worse than Java's in that sense. Most JVMs can run most Java bytecode, but your libstdc++ has to be from the same version of the same compiler that your application was compiled with.


It was quite surprising for me the first time I did a little embedded work and discovered I couldn't run binaries that were compiled against glibc on my musl-libc based system, and vice versa. I had initially thought they all just supported the same C89 spec, so they should work...


Yep. It's 99% ABI compatible, but that 1% will kill you.

For that matter, as you allude to, even C has a runtime.


Which C++ runtime is ubiquitous? I can think of at least 3 C++ runtimes (MS, libstdc++, libc++).


I spent an entire day last week attempting to build hadoop with LZO compression support. There are many outdated guides on the internet about how to do this, and I eventually gave up and spent a few hours getting the cloudera packages to install in a Dockerfile so I could reproduce my work later.

Figuring out which software packages I needed, how to modify my environment variables, which compiler to get, and which directory everything belonged in was the entire difficulty.

If it were written in Go instead of Java, I could have done `go get apache.org/hadoop` and it would have been done instead of giving up after hours of frustration.

Go has almost no new features that make it an interesting language from a programming language perspective. Go's win is that it makes the actual running of real software in production better. Hadoop's difficulty is exactly why InfluxDB exists at all.


> If it were written in Go instead of Java, I could have done `go get apache.org/hadoop`

This complaint is just about packaging, and not the language itself. Any project can have good or bad packaging scripts, and for Java there are plenty of ways to make it "good".

Not to mention, the BUILDING.txt document clearly states that they use Maven[1], and to build you just do: mvn compile

> Go's win is that it makes the actual running of real software in production better

This might just be a familiarity issue, because once you launch the program, all things are equal.

And yes, you can bundle a JVM with your Java app, which makes it exactly like Go's statically linked runtime and just as portable without any fuss.

[1] https://github.com/apache/hadoop/blob/trunk/BUILDING.txt


> no new features

Go gets us better performance and concurrency out of the box.


> Go gets us better performance

Than Java? At best, Go performs on par with Java, but is often measured 10-20% slower.[1][2][3]

This is usually attributed to the far more mature optimizing compiler in the JVM, which ultimately compiles bytecode down to native machine code, especially for hot paths. Java performance for long-running applications is on par with C (one of the reasons it's a primary choice for very high-performing applications such as HFT, stock exchanges, banking, etc).

> concurrency out of the box.

Java absolutely supports concurrency "out of the box"...[4]

[1] http://zhen.org/blog/go-vs-java-decoding-billions-of-integer...

[2] http://stackoverflow.com/questions/20875341/why-golang-is-sl...

[3] http://www.reddit.com/r/golang/comments/2r1ybd/speed_of_go_c...

[4] http://docs.oracle.com/javase/7/docs/api/java/util/concurren...
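To make the "out of the box" point concrete, here is a minimal sketch using only the standard java.util.concurrent package linked in [4]. The class name ConcurrencySketch and the method parallelSum are made up for illustration, and the lambda syntax assumes Java 8 or later; it is one simple way to split work across a thread pool, not a claim about how Hadoop does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; nothing here comes from Hadoop itself.
class ConcurrencySketch {

    // Sums the integers 1..n by splitting the range into `chunks` tasks
    // that run on a fixed-size thread pool from the standard library.
    static long parallelSum(int n, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int size = (n + chunks - 1) / chunks; // chunk length, rounded up
            for (int c = 0; c < chunks; c++) {
                final int lo = c * size + 1;
                final int hi = Math.min(n, (c + 1) * size);
                tasks.add(() -> {
                    long sum = 0;
                    for (int i = lo; i <= hi; i++) {
                        sum += i;
                    }
                    return sum;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000, 4)); // prints 500000500000
    }
}
```

Thread pools, futures, and task submission all ship with the JDK, so no third-party dependency is needed for this kind of fan-out and join.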


Hell, if we look at real-world-ish applications, the techempower benchmarks show go at easily 50% slower than a bunch of different Java options.


>What does the language have to do with the program?

I happen to agree with you wholeheartedly. If you spend enough time here, though, you'll see the inevitable comment about how anything made in PHP is worthless, insecure garbage and anyone who spends their time developing a PHP application is an amateur at best.

This isn't really a comment aimed at you; I just wanted to point out how often that view is challenged.


http://www.pachyderm.io is a modern alternative.


Apache Spark is a good replacement for Hadoop now. It's written in Scala.


Spark is a good replacement for MapReduce. MapReduce != Hadoop.


Fair enough, but the original article was about Hadoop MapReduce, wasn't it? It specifically says:

"without even using any of the HBaseGiraphFlumeCrunchPigHiveMahoutSolrSparkElasticsearch (or any other of the Apache chaos) mess yet."



