ThriftDB: a new service from the Octopart Guys (thriftdb.com)
73 points by staunch on May 24, 2011 | hide | past | favorite | 25 comments


Sounds an awful lot like the old http://code.google.com/p/thrudb/ ?

"Thrudb is a set of simple services built on top of the Apache Thrift framework that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast and easy-to-use services that can enhance or replace traditional data storage and access layers."

No longer under development, though.


I actually stumbled into thrudb a few months ago after we had already written v1. We talked to some people at Disrupt that are doing similar things with search and Thrift as well.


The thrudb author (https://github.com/tjake) is working on Cassandra now.


It seems like a strange thing to provide as a SaaS. Unless you're hosted in the same datacenter, the web latency would surely make the speed of it irrelevant. They mention they're "working on a way for developers to run ThriftDB locally" which might make it worth looking into. I could see it being useful for some things, certainly, but it wouldn't provide enough benefit as a SaaS to make calls over the web.


Good point. We decided to make it available as a (free) cloud service just so we could get hacker feedback as quickly as possible. If we had waited until we had a version that was easy to install, it would have taken us a lot more time. If people like it, the plan is to open source it.


I agree getting it ready for "easy" install takes a lot more work than completing the coding. I wonder what feature, or combination of features, is the differentiator here? Maybe it's the speed of search, or the loose document schemas combined with freetext search, the REST API? Some more examples and benchmarks would be interesting, when you find the time :)


The thing that excites us the most is the flexible schema because that can cut down on development time dramatically. The REST API is also optimized for developer happiness.
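To illustrate what a flexible schema buys you, here's a hypothetical sketch (the field names are made up, not ThriftDB's actual API): two documents in the same collection can carry completely different fields, with no migration step.

```python
import json

# Hypothetical sketch: with a flexible schema, two documents in the
# same collection can carry different fields (field names are made up).
doc_a = {"id": "part-1", "manufacturer": "Acme", "resistance_ohms": 470}
doc_b = {"id": "part-2", "manufacturer": "Acme",
         "capacitance_uf": 10, "tolerance": "5%"}

# Both serialize into the same store without declaring a schema up front.
payloads = [json.dumps(d) for d in (doc_a, doc_b)]
for p in payloads:
    print(p)
```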

We're working on some examples and should have them ready soon.

Benchmarking is a good idea but tricky because speed depends on the complexity of the data and the query itself. If you have any recommendations for benchmarks please let me know.
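As a starting point, a minimal timing harness might look like this (the query here is a stand-in; a real benchmark would hit the service with queries of varying complexity):

```python
import time

def time_query(run_query, repeats=100):
    """Time a query callable over several runs; return the best in seconds.
    Taking the minimum reduces noise from GC pauses and scheduling."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in for a real search call (hypothetical data, hypothetical query).
fake_index = {f"part-{i}": i for i in range(10_000)}
elapsed = time_query(lambda: [k for k in fake_index if k.endswith("99")])
print(f"best of 100 runs: {elapsed * 1e6:.1f} microseconds")
```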

In the future we have plans to add machine learning features to optimize relevancy algorithms automatically but that's still a ways off.


Doesn't this do the same thing as Solandra?


ThriftDB uses Facebook's Thrift serialization internally so the schema is completely flexible.


Sort of, except that this is a hosted web service, which Solandra is not. At least as far as I know.


Déjà vu :-)

I did almost exactly this last fall, except I used JSON and JSON Schema instead of Thrift. Called it hummingbird db. I submitted it to YC but all I got was an email that it wasn't that interesting.


Would love to hear more about hummingbird db. Email me at andres@octopart.com. We chose Thrift because the schema was flexible and that was the biggest problem for us at Octopart.


Did you consider Solr, in "schema free mode" using <dynamicField />?

Edit: I see you are using Solr internally for implementation.
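For anyone curious, catch-all dynamicField declarations in Solr's schema.xml look roughly like this (the suffix naming convention is just illustrative):

```xml
<!-- schema.xml: any field ending in _txt is indexed as general text,
     any field ending in _i as an integer, without declaring it up front -->
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_i"   type="int"          indexed="true" stored="true"/>
```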


This actually sounds similar to ElasticSearch itself.


It sounds like heaps of people are trying to do very similar things. Just this week I've been hacking on a Rails engine/plugin that lets you define your models with mongo_mapper and then request them as JSON Schema. It's not quite complete, but JSON Schema gives you interfaces for your RESTful APIs.


1. You say your solution is extremely fast.

2. You don't provide benchmarks.

If your solution is really so fast, then you must be benchmarking continuously. How else would you know whether you are improving, and whether you are actually fast or just faster than [a tree | clouds | a grain of rice]? So either you are lying about your performance, or you are purposefully hiding your incredibly well-performing benchmarks.

Which explanation do you prefer?


I know what you were trying to say, and I agree, but this comment would have been a lot better without its mean and condescending tone.


Sounds a lot like Solr, but can't imagine the search is nearly as powerful.


We actually use Solr on our backend - we love its open source community and rich feature set.


Sheer curiosity - why did you decide to go with Solr instead of ElasticSearch, which seems easier to scale out with much the same feature set?


That is a great question. We actually didn't consider ElasticSearch; we went with Solr because we use it for Octopart and are experienced with it, which made developing ThriftDB easier. We're evaluating other options now and will have a look.


Did you consider google protocol buffers? If so, why thrift?


We use Python server-side and the Python implementation of Google protocol buffers is extremely slow.


Here are two alternative Python protobuf implementations. They're each about 15x faster than the pure Python implementation from Google.

fast-python-pb - codegen wrapping Google's C++ protobuf implementation. https://github.com/Greplin/fast-python-pb

lwpb - non-codegen using a protobuf implementation in C. (disclaimer: I'm an author) https://github.com/acg/lwpb


Oh cool, so now I can host my app in one data center and have it make DB calls across the open internet to another DB server! But wait, there's more! It's over a stateless protocol: HTTP, with really poor multiplexing/pipelining support.

Latency is a feature, right? Like "slow your roll, cowboy, let's not have a heart attack here".



