
fyi: re: "I don't really know if the medical field has any use for text mining, graph processing and the like."

It does. Although the graph processing use cases I've seen recently (and their developers) are better served by a faster query engine than the one Spark provides.

Most people, if they can get away with Excel, won't use an RDBMS. (Of course this means that eventually someone will have to come along and scrape all the damned spreadsheets into an RDBMS, but...)
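The scraping step is usually mundane: dump the spreadsheets to CSV and bulk-load them. A minimal sketch using only the Python standard library (the column names and values here are made up for illustration):

```python
import csv
import io
import sqlite3

# Pretend this is one of the damned spreadsheets, exported to CSV.
# (Hypothetical columns; real ones are rarely this tidy.)
csv_text = (
    "patient_id,visit_date,lab_value\n"
    "1,2015-01-02,4.2\n"
    "2,2015-01-03,5.1\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labs (patient_id INTEGER, visit_date TEXT, lab_value REAL)"
)

# Parse each CSV row into a tuple and bulk-insert.
rows = [
    (r["patient_id"], r["visit_date"], r["lab_value"])
    for r in csv.DictReader(io.StringIO(csv_text))
]
conn.executemany("INSERT INTO labs VALUES (?, ?, ?)", rows)
conn.commit()

# Once the data is in an RDBMS, ad-hoc questions become one-liners.
avg = conn.execute("SELECT AVG(lab_value) FROM labs").fetchone()[0]
print(avg)  # 4.65
```

In real life the painful part isn't the load, it's reconciling the dozens of slightly different column layouts the spreadsheets have accumulated.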

Most people, if they can get away with an RDBMS, won't use Hadoop. (Of course, sometimes you end up with something like DB2 file pointers, which really just say "hey look here's a pile of unstructured data that we couldn't figure out how to handle, and this is where we left it", and then of course someone has to put it somewhere useful, but...)

Now if you're moving around copies of the Internet, or trillion-row "databases" that need nearly instantaneous OLAP, then yeah, you'll be needing a proper distributed infrastructure. However, sometimes you can just rent that proper infrastructure from a vendor with that problem (e.g. Google or Amazon) and then you don't have to support it.

Things get really interesting (as in bleeding edge research interesting) when none of the above solve your problem. But they also tend to push the time horizon for results way out.

JMHO. Eventually software eats everything. It's a question of time scales. If you need results next week, don't rebuild TensorFlow or Redshift from scratch.


