
Hmm, do you have to index intervals directly?

Algorithms for spatially indexing points are tractable - see k-d trees etc. R-trees seem pretty lame to be honest, like an ad-hoc data structure cobbled together to partly meet a number of requirements.

Of course, some kinds of indexing and searching are inherently harder than others. But I don't see why one couldn't program any needed spatial algorithm directly from just an index of all points in the system.

For example, suppose you have an index of triangles. If you want to find all triangles intersecting triangle (x,y,z), you could iteratively expand a near-neighbor search from each vertex. Every point you find that is closer than the furthest vertex of the query triangle is a potential neighbor and can be checked reasonably quickly. This gives the set S of triangles intersecting a given triangle in log^n(size(S)), making the process relatively scalable.
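A minimal sketch of that candidate search in Python, assuming scipy is available and triangles stored as rows of vertex indices. The radius heuristic mirrors the "furthest point" idea above, and it only filters candidates - a large triangle whose vertices all lie far away can still intersect, so an exact triangle-triangle test has to follow:

    import numpy as np
    from scipy.spatial import cKDTree

    def candidate_triangles(vertices, triangles, query):
        # vertices: (N, 2) float array; triangles: (M, 3) rows of vertex
        # indices; query: 3 vertex indices forming the query triangle.
        tree = cKDTree(vertices)
        qpts = vertices[list(query)]
        # "Furthest point" radius: the query triangle's diameter.
        radius = max(np.linalg.norm(a - b) for a in qpts for b in qpts)
        # Map each vertex index back to the triangles that use it.
        vert_to_tris = {}
        for t, tri in enumerate(triangles):
            for v in tri:
                vert_to_tris.setdefault(int(v), set()).add(t)
        candidates = set()
        for p in qpts:
            for v in tree.query_ball_point(p, radius):
                candidates.update(vert_to_tris.get(v, ()))
        return candidates  # candidates only; run an exact test on each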

K-d trees seem more sensible: http://en.wikipedia.org/wiki/K-d_tree

And there's more: "Storing objects in a space-partitioning data structure (kd-tree or BSP for example) makes it easy and fast to perform certain kinds of geometry queries – for example in determining whether a ray intersects an object, space partitioning can reduce the number of intersection tests to just a few per primary ray, yielding a logarithmic time complexity with respect to the number of polygons."

This lends credence to my argument that polygon-polygon intersection should be obtainable in log^n or maybe even log time.
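For what it's worth, the pruning those traversals rely on bottoms out in a cheap box test at each node of the partition. A minimal sketch of that primitive, the classic ray/AABB slab test (pure Python; the function name is mine):

    def ray_hits_aabb(origin, direction, box_min, box_max):
        # Slab test: does the ray hit an axis-aligned box?  Partition
        # traversal uses this to skip entire subtrees of geometry.
        tmin, tmax = 0.0, float("inf")
        for o, d, lo, hi in zip(origin, direction, box_min, box_max):
            if d == 0.0:
                if not lo <= o <= hi:
                    return False      # parallel to the slab and outside it
                continue
            t1, t2 = (lo - o) / d, (hi - o) / d
            if t1 > t2:
                t1, t2 = t2, t1
            tmin, tmax = max(tmin, t1), min(tmax, t2)
            if tmin > tmax:
                return False          # entry/exit intervals don't overlap
        return True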

Space partitioning in general does cool stuff: http://en.wikipedia.org/wiki/Space_partitioning



The problem is less the mathematics than efficient storage and retrieval of a large number of points (i.e. when the index won't fit in memory) from disk. Searching a tree laid out on disk is really slow.

Even with mathematically sound storage trees as indexes, locations which are close in space could easily be split between two arms of the tree, making proximity searches quite expensive - you would have to traverse the entire tree to ensure that you do not miss data separated by a poor splitting plane.
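To make the failure mode concrete, here is a sketch of kd-tree range search, with nodes assumed to be (point, left, right) tuples for illustration. Whenever the query box straddles a splitting plane, both children have to be visited - and on disk, every extra subtree descent is another round of seeks:

    def range_search(node, lo, hi, depth=0, out=None):
        # Nodes are (point, left, right) tuples; lo/hi bound the query box.
        if out is None:
            out = []
        if node is None:
            return out
        point, left, right = node
        axis = depth % len(point)
        if all(l <= c <= h for c, l, h in zip(point, lo, hi)):
            out.append(point)
        # If the box straddles the splitting plane at this node, BOTH
        # recursions fire - on disk, that is twice the seeking.
        if lo[axis] <= point[axis]:
            range_search(left, lo, hi, depth + 1, out)
        if hi[axis] >= point[axis]:
            range_search(right, lo, hi, depth + 1, out)
        return out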

There are a number of theoretically efficient solutions to this, but they all become remarkably inefficient when you add in a hard drive.


I think if you're going to be doing sufficiently 'biggish data' with those kinds of retrieval characteristics, you've got to just give up on disk and use flash only.


Flash solves the "added overhead for random access" portion of the problem, but the overhead for any access at all is still remarkably high, particularly when compared to main memory.

Perhaps when we get more "flash connected to the main bus" options (there are a few commercial options which sit in PCI Express, AGP, and even DRAM slots), we can begin discounting the overhead more and more, but we're not there quite yet.


Well, supposedly, cache-oblivious algorithms exist to deal with this kind of thing - by grouping related data together regardless of which cache scale you are at.
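For instance, the van Emde Boas layout stores each recursive half-height subtree contiguously, so nearby nodes share a block at every scale without any block-size parameter appearing in the code. A sketch of the layout permutation for a complete binary tree with BFS indexing (my own illustration; the exact split convention varies):

    def veb_order(height):
        # BFS indices (root = 1, children 2i and 2i+1) of a complete
        # binary tree, permuted so that every recursive subtree is
        # contiguous. No block size appears anywhere: locality holds
        # whether a "block" is a cache line or a disk page.
        def rec(root, h):
            if h == 1:
                return [root]
            top = h // 2                  # split roughly in half by height
            order = rec(root, top)        # lay out the top subtree first
            first = root << top           # leftmost node `top` levels down
            for k in range(1 << top):     # then each bottom subtree in turn
                order += rec(first + k, h - top)
            return order
        return rec(1, height)

    # veb_order(4) -> [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]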


My experience playing around with KD trees is that they are super effective for multidimensional indexing (including spatial) and range search so long as the data is relatively static. The difficulty comes when trying to update and re-balance them dynamically, which is where other structures perform better.
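One common workaround is the static-to-dynamic trick: keep the tree itself static and absorb inserts into a small side buffer that is scanned by brute force, rebuilding once the buffer grows. A sketch using scipy's cKDTree (the class and the threshold are my own choices, not a standard API):

    import numpy as np
    from scipy.spatial import cKDTree

    class BufferedKDTree:
        def __init__(self, points, rebuild_at=1024):
            self.points = [tuple(p) for p in points]
            self.tree = cKDTree(self.points)
            self.buffer = []              # recent inserts, scanned linearly
            self.rebuild_at = rebuild_at

        def insert(self, p):
            self.buffer.append(tuple(p))
            if len(self.buffer) >= self.rebuild_at:
                # Amortize: one O(n log n) rebuild per `rebuild_at` inserts.
                self.points += self.buffer
                self.buffer = []
                self.tree = cKDTree(self.points)

        def neighbors_within(self, p, r):
            hits = [self.points[i] for i in self.tree.query_ball_point(p, r)]
            q0 = np.asarray(p, dtype=float)
            hits += [q for q in self.buffer
                     if np.linalg.norm(np.asarray(q) - q0) <= r]
            return hits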


Oh dear,

I was remembering my previous research wrong.

It's quad-trees and related Z-order-based curves that give log(n) search and inserts (see the sketch below).

With those, "everything" becomes log^n.

- Given that, my previous argument concerning log time for triangle/polygon search should stand.

http://en.wikipedia.org/wiki/Quadtree
http://en.wikipedia.org/wiki/Z-order_curve
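A minimal sketch of the Z-order trick, assuming unsigned integer coordinates and an arbitrary 16-bit width: interleaving the coordinate bits yields a single key whose sort order groups spatial neighbors, so any ordinary ordered index over the keys gives log(n) search and insert:

    def morton2d(x, y, bits=16):
        # Interleave the bits of two coordinates into one Z-order key.
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)       # x takes even bits
            code |= ((y >> i) & 1) << (2 * i + 1)   # y takes odd bits
        return code

    # Sorting by the key groups spatial neighbors:
    # sorted(points, key=lambda p: morton2d(*p))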



