ThriftDB: a new service from the Octopart Guys (thriftdb.com)
73 points by staunch on May 24, 2011 | hide | past | favorite | 25 comments


Sounds an awful lot like the old http://code.google.com/p/thrudb/ ?

"Thrudb is a set of simple services built on top of the Apache Thrift framework that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast and easy-to-use services that can enhance or replace traditional data storage and access layers."

No longer under development, though.


I actually stumbled into thrudb a few months ago after we had already written v1. We talked to some people at Disrupt that are doing similar things with search and Thrift as well.


The thrudb author (https://github.com/tjake) is working on Cassandra now.


It seems like a strange thing to provide as a SaaS. Unless you're hosted in the same datacenter, the web latency would surely make the speed of it irrelevant. They mention they're "working on a way for developers to run ThriftDB locally" which might make it worth looking into. I could see it being useful for some things, certainly, but it wouldn't provide enough benefit as a SaaS to make calls over the web.


Good point. We decided to make it available as a (free) cloud service just so we could get hacker feedback as quickly as possible. If we had waited until we had a version that was easy to install, it would have taken us a lot more time. If people like it, the plan is to open source it.


I agree getting it ready for "easy" install takes a lot more work than completing the coding. I wonder what feature, or combination of features, is the differentiator here? Maybe it's the speed of search, or the loose document schemas combined with freetext search, the REST API? Some more examples and benchmarks would be interesting, when you find the time :)


The thing that excites us the most is the flexible schema because that can cut down on development time dramatically. The REST API is also optimized for developer happiness.
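To illustrate what a flexible schema buys you, here's a hypothetical sketch (the field names are made up, not ThriftDB's actual API): two documents in the same collection can carry completely different fields, with no migration step.

```python
import json

# Hypothetical sketch: with a flexible schema, two documents in the
# same collection can carry different fields (field names are made up).
doc_a = {"id": "part-1", "manufacturer": "Acme", "resistance_ohms": 470}
doc_b = {"id": "part-2", "manufacturer": "Acme",
         "capacitance_uf": 10, "tolerance": "5%"}

# Both serialize into the same store without declaring a schema up front.
payloads = [json.dumps(d) for d in (doc_a, doc_b)]
for p in payloads:
    print(p)
```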

We're working on some examples and should have them ready soon.

Benchmarking is a good idea but tricky because speed depends on the complexity of the data and the query itself. If you have any recommendations for benchmarks please let me know.
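As a starting point, a minimal timing harness might look like this (the query here is a stand-in; a real benchmark would hit the service with queries of varying complexity):

```python
import time

def time_query(run_query, repeats=100):
    """Time a query callable over several runs; return the best in seconds.
    Taking the minimum reduces noise from GC pauses and scheduling."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in for a real search call (hypothetical data, hypothetical query).
fake_index = {f"part-{i}": i for i in range(10_000)}
elapsed = time_query(lambda: [k for k in fake_index if k.endswith("99")])
print(f"best of 100 runs: {elapsed * 1e6:.1f} microseconds")
```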

In the future we have plans to add machine learning features to optimize relevancy algorithms automatically but that's still a ways off.


Doesn't this do the same thing as Solandra?


ThriftDB uses Facebook's Thrift serialization internally so the schema is completely flexible.


Sort of, except that this is a hosted web service, which Solandra is not. At least as far as I know.


Déjà vu :-)

I did almost exactly this last fall, except I used JSON and JSON Schema instead of Thrift. Called it hummingbird db. I submitted it to YC but all I got was an email that it wasn't that interesting.


Would love to hear more about hummingbird db. Email me at andres@octopart.com. We chose Thrift because the schema was flexible and that was the biggest problem for us at Octopart.


Did you consider Solr, in "schema free mode" using <dynamicField />?

Edit: I see you are using Solr internally for implementation.
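For anyone curious, catch-all dynamicField declarations in Solr's schema.xml look roughly like this (the suffix naming convention is just illustrative):

```xml
<!-- schema.xml: any field ending in _txt is indexed as general text,
     any field ending in _i as an integer, without declaring it up front -->
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_i"   type="int"          indexed="true" stored="true"/>
```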


This actually sounds similar to ElasticSearch itself.


It sounds like heaps of people are trying to do very similar things. Just this week I've been hacking on a Rails engine/plugin that lets you define your models with mongo_mapper and then request them as JSON Schema. It's not quite complete, but JSON Schema gives you interfaces for your RESTful APIs.


1. You say your solution is extremely fast.

2. You don't provide benchmarks.

If your solution is really so fast, then you must be benchmarking continuously. How else would you know whether you are improving, and whether you are actually fast or just faster than [a tree | clouds | a grain of rice]? So either you are lying about your performance, or you are purposefully hiding your incredibly well-performing benchmarks.

Which explanation do you prefer?


I know what you were trying to say, and I agree, but this comment would have been a lot better without its mean and condescending tone.


Sounds a lot like Solr, but can't imagine the search is nearly as powerful.


We actually use Solr on our backend - we love its open source community and rich feature set.


Sheer curiosity - why did you decide to go with Solr instead of ElasticSearch, which seems easier to scale out with much the same feature set?


That is a great question. We actually didn't consider ElasticSearch; we went with Solr because we use it for Octopart and are experienced with it, which made developing ThriftDB easier. We're evaluating other options now and will have a look.


Did you consider google protocol buffers? If so, why thrift?


We use Python server-side and the Python implementation of Google protocol buffers is extremely slow.


Here are two alternative Python protobuf implementations. They're each about 15x faster than the pure Python implementation from Google.

fast-python-pb - codegen wrapping Google's C++ protobuf implementation. https://github.com/Greplin/fast-python-pb

lwpb - non-codegen using a protobuf implementation in C. (disclaimer: I'm an author) https://github.com/acg/lwpb


Oh cool, so now I can host my app in one data center and have it make DB calls across the open internet to another DB server! But wait, there's more! It's over a stateless protocol: HTTP, with really poor multiplexing/pipelining support.

Latency is a feature, right? Like "slow your roll, cowboy, let's not have a heart attack here".



