Cassandra Internals – Reading (mikeperham.com)
35 points by r11t on March 18, 2010 | 6 comments


Those interested in the topic may also want to read this:

"Why we’re using HBase (at Adobe)": http://hstack.org/why-were-using-hbase-part-1/

It is a fine "war-story" of picking new technology and making it work without losing data.

(It was submitted yesterday by the author here: http://news.ycombinator.com/item?id=1196382, but got killed with 5 points, which baffles me. I found it when puzzling out why my submission today was instantly killed, with a different item id ...)

[P.S. minor bug report: my 'dead' item has a working link to the article, which it perhaps shouldn't. http://news.ycombinator.com/item?id=1200833]


Uncached reads are slower in Cassandra not because the sstable is inherently I/O-intensive (it's actually better than B-tree based storage on a 1:1 basis). They are slower because sstables are not updated in place, so in the average case you have to merge row fragments from 2-4 sstables to complete the request.
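
To make the merge step concrete, here is a minimal Python sketch. It is an illustration only, not Cassandra's actual code: the dict-based sstable model, the `read_row` name, and the integer timestamps are all assumptions made for the example. Each sstable holds a fragment of the row, and the newest timestamp wins per column.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an sstable maps row key -> {column name: (timestamp, value)}.
Column = Tuple[int, str]
Fragment = Dict[str, Column]
SSTable = Dict[str, Fragment]


def read_row(key: str, sstables: List[SSTable]) -> Dict[str, str]:
    """Merge the fragments of one row that are spread across several sstables."""
    merged: Fragment = {}
    for table in sstables:
        fragment = table.get(key)
        if fragment is None:
            continue  # this sstable holds nothing for the row
        for name, (ts, value) in fragment.items():
            # Keep the column version with the newest timestamp.
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    return {name: value for name, (ts, value) in merged.items()}


# Two sstables: the newer one only rewrote the "city" column.
older = {"user1": {"name": (1, "Ann"), "city": (1, "Oslo")}}
newer = {"user1": {"city": (2, "Rome")}}
row = read_row("user1", [older, newer])
```

Every sstable that might contain the row has to be consulted, which is where the extra read cost comes from.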


A little misleading, IMO. While Cassandra is eventually consistent, reads are not necessarily slow. With the cache settings (KeysCached/RowsCached) tuned correctly and enough available memory, Cassandra actually performs quite well. Cassandra is virtually worthless without those cache features, much as MySQL is without indexes. Reads are slower than writes, but I think it would have been more accurate, and more interesting, to talk about how the cache works.

As with any database (MySQL, Postgres, etc.), understanding how to make it work well is a dark art.
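
As a rough illustration of why a row cache helps, here is a small Python sketch of an LRU cache sitting in front of a slow read path. This is not how Cassandra implements RowsCached; the `RowCache` class, its `capacity` parameter, and the `load_row` callback are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class RowCache:
    """Tiny LRU cache of whole rows, keyed by row key (illustrative only)."""

    def __init__(self, capacity, load_row):
        self.capacity = capacity      # assumed stand-in for a rows-cached limit
        self.load_row = load_row      # slow path, e.g. reading from sstables
        self.rows = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.rows[key]
        self.misses += 1
        row = self.load_row(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return row


# A plain dict stands in for the on-disk data.
disk = {"a": {"n": "1"}, "b": {"n": "2"}, "c": {"n": "3"}}
cache = RowCache(capacity=2, load_row=lambda k: disk[k])
for k in ["a", "b", "a", "c", "b"]:
    cache.get(k)
```

The sketch ignores writes entirely; a real cache also has to update or invalidate a row when it changes.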


Right. Digg dropped memcached entirely from their architecture when we added RowsCached to Cassandra.



Cassandra isn't as easy to learn as, say, CouchDB. But Couch uses JSON (an awesome choice, BTW), and Cassandra uses Thrift.

Cassandra is kinda difficult to pick up because there is no SQL equivalent: no relationships, no joins, no "where" clauses.

So, basically, it's an engine without user-friendly controls. But it's probably the most awesomely powerful storage engine yet available as open source.

Imagine if Google released a server image of one of their storage nodes... ostensibly, that's what Facebook did with Cassandra.



