Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm not sure why the author is dismissing space-filling curves so quickly, just based on the age of the idea. I've worked with them extensively, in the context of spatial indexing in a database system, in both research and product settings. They address a number of problems described for interval data types. The techniques I've been working with "decompose" a spatial object into a number of regions by recursively partitioning the space, ending up with a small number of "z-values" for each. A z-value is represented by an integer, which is then used as a key in a conventional index, e.g. a sorted array, balanced binary tree, B-tree, skiplist, etc.

Getting a uniform distribution for sharding is easy in one dimension. I'm not sure what kind of scaling the author has in mind, but if you put the z-values in a B-tree, it scales exactly like a B-tree, i.e., extremely well. I agree there is no exploitable ordering of the spatial objects, but the trick is to order the z-values. If a given spatial object gives rise to 4 z-values, then they -- and the spatial object -- appear in 4 places in the ordering. That's fine. Run your search (in one dimension), get qualifying records, and then get rid of duplicates later.



The point was really that space-filling curve indexes do not solve the underlying issues with polygon indexing, their oldness was not a factor. That type of indexing is so old that the patents (yes, they were patented by the big database companies) have long since expired. And yet they are rarely used commercially. There is good reason for that. There is a resurgence in their usage by people new to spatial indexing but it is pretty obvious that few implementors are aware of the extent of the computer science, which goes back into the 1960s.

The assumption of size was "large-scale", which in this day and age usually means massively distributed. A B-tree is not a solution to that problem domain.


> The assumption of size was "large-scale", which in this day and age usually means massively distributed. A B-tree is not a solution to that problem domain.

Another thought on this one: Space-filling curves decomposes the problem nicely: Scale out your favorite, ordered key/value stored, and then layer spatial indexing on top. No need to develop a new data structure that both scales out and handles spatial data.


This is my thought as well. It seems like a perfectly feasible solution to me, but I might be speaking out of the naivete mentioned in the article.


I'm still not sure what problems of polygon indexing are not solved by space-filling curves.

About commercial usage:

- Oracle uses R-trees now, I think. Before that, they had what I thought was a bad implementation of something sort of grid-like. Some notes I wrote from a few years ago on their "hh-encoded" index:

    Oracle 9 Spatial offered both R-tree and quadtree indexes 
    (http://download-uk.oracle.com/docs/cd/B10501_01/appdev.920/a96630/sdo_intro.htm#i877656). 
    The generation of index terms for a spatial object is called "tesselation". They have 
    two approaches to tesselation, 1) fixed-size tiles, in which the space is partitioned to 
    a fixed resolution, and covering cells give rise to index terms; 2) variable-size tiles, 
    in which spatial decomposition stops when a configurable number of tiles are generated for 
    the spatial object being indexed.

    Now here's where it gets strange. An Oracle quadtree index is either pure-fixed, or 
    it is hybrid. Hybrid is a fixed index AND a variable index. The docs recommend not 
    using hybrid. I could not find an explanation of why they even offer a hybrid option
    (as opposed to pure-variable).

    This Oracle 9 doc is from 2002. I also found a paper doing a performance comparison 
    of R-tree and fixed from around that time 
    (http://www.dpi.inpe.br/cursos/ser303/oracle_r_tree.pdf). This paper concludes 
    that R-trees are usually best, although they identify situations in which quadtrees win.

    Oracle 11 appears to have dropped quadtrees altogether
    (http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28400/sdo_intro.htm#i877656).

    The pure variable-sized tile approach option was never offered in their product, 
    and not examined in the 2002 performance comparison. As that paper points out, 
    tuning fixed-size is tricky. If the tile size is too small, you get a large number 
    of index entries per spatial object. They take space, io bandwidth to read, 
    cpu time to process, etc. If the tile size is too large, you run the risk of 
    poor accuracy, retrieving more objects than necessary, again generating more io and 
    cpu. And if your data set has objects of various sizes, there is no single good size.
- Microsoft is less clear to me. From some of the tuning options on their spatial indexes, it sounds to me like a fixed-size tile approach, with tunable resolution. But I could be wrong, and I haven't looked into it in years.

- PostGIS uses R-trees. The problem with this approach is that it's a separate implementation from their main B-tree, so it always lags behind. With space-filling curves, you just use your B-tree and improvements to it then apply to your spatial indexes.

- MySQL also uses R-trees, and only in conjunction with MyISAM. So that's pretty bad.

- I know less about distributed spatial indexes, but I would think that space-filling curves are more amenable to distributed structures because they don't rely on a tree to organize the space. You can partition the 1-d space of z-values to load balance the shards.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: