
Hmm, do you have to index intervals directly?

Algorithms for spatially indexing points are tractable - see k-d trees etc. R-trees seem pretty lame to be honest, like an ad-hoc data structure cobbled together to partly meet a number of requirements.

Of course, some kinds of indexing and searching are inherently harder than others. But I don't see why one couldn't program any needed spatial algorithm directly from just an index of all points in the system.

For example, suppose you have an index of triangles. If you want to find all triangles intersecting triangle (x,y,z), you could iteratively expand a near-neighbor search from each vertex. Every point you find that is closer than the furthest vertex of the query triangle is a potential neighbor and can be checked reasonably quickly. This gives the set S of triangles intersecting a given triangle in log^n(size(S)), making the process relatively scalable.
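A minimal sketch of that candidate search in Python, assuming scipy is available and triangles stored as rows of vertex indices. The radius heuristic mirrors the "furthest point" idea above, and it only filters candidates - a large triangle whose vertices all lie far away can still intersect, so an exact triangle-triangle test has to follow:

    import numpy as np
    from scipy.spatial import cKDTree

    def candidate_triangles(vertices, triangles, query):
        # vertices: (N, 2) float array; triangles: (M, 3) rows of vertex
        # indices; query: 3 vertex indices forming the query triangle.
        tree = cKDTree(vertices)
        qpts = vertices[list(query)]
        # "Furthest point" radius: the query triangle's diameter.
        radius = max(np.linalg.norm(a - b) for a in qpts for b in qpts)
        # Map each vertex index back to the triangles that use it.
        vert_to_tris = {}
        for t, tri in enumerate(triangles):
            for v in tri:
                vert_to_tris.setdefault(int(v), set()).add(t)
        candidates = set()
        for p in qpts:
            for v in tree.query_ball_point(p, radius):
                candidates.update(vert_to_tris.get(v, ()))
        return candidates  # candidates only; run an exact test on each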

K-d trees seem more sensible: http://en.wikipedia.org/wiki/K-d_tree

And there's more: "Storing objects in a space-partitioning data structure (kd-tree or BSP for example) makes it easy and fast to perform certain kinds of geometry queries – for example in determining whether a ray intersects an object, space partitioning can reduce the number of intersection tests to just a few per primary ray, yielding a logarithmic time complexity with respect to the number of polygons."

This lends credence to my argument that polygon-polygon intersection should be obtainable in log^n or maybe even log time.
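For what it's worth, the pruning those traversals rely on bottoms out in a cheap box test at each node of the partition. A minimal sketch of that primitive, the classic ray/AABB slab test (pure Python; the function name is mine):

    def ray_hits_aabb(origin, direction, box_min, box_max):
        # Slab test: does the ray hit an axis-aligned box?  Partition
        # traversal uses this to skip entire subtrees of geometry.
        tmin, tmax = 0.0, float("inf")
        for o, d, lo, hi in zip(origin, direction, box_min, box_max):
            if d == 0.0:
                if not lo <= o <= hi:
                    return False      # parallel to the slab and outside it
                continue
            t1, t2 = (lo - o) / d, (hi - o) / d
            if t1 > t2:
                t1, t2 = t2, t1
            tmin, tmax = max(tmin, t1), min(tmax, t2)
            if tmin > tmax:
                return False          # entry/exit intervals don't overlap
        return True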

Space partitioning in general does cool stuff: http://en.wikipedia.org/wiki/Space_partitioning



The problem is less the mathematics than efficient storage and retrieval of a large number of points (i.e. when the index won't fit in memory) from disk. Searching a tree laid out on disk is really slow.

Even with mathematically sound storage trees as indexes, locations which are close in space could easily be split between two arms of the tree, making proximity searches quite expensive - you would have to traverse the entire tree to ensure that you do not miss data separated by a poor splitting plane.
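To make the failure mode concrete, here is a sketch of kd-tree range search, with nodes assumed to be (point, left, right) tuples for illustration. Whenever the query box straddles a splitting plane, both children have to be visited - and on disk, every extra subtree descent is another round of seeks:

    def range_search(node, lo, hi, depth=0, out=None):
        # Nodes are (point, left, right) tuples; lo/hi bound the query box.
        if out is None:
            out = []
        if node is None:
            return out
        point, left, right = node
        axis = depth % len(point)
        if all(l <= c <= h for c, l, h in zip(point, lo, hi)):
            out.append(point)
        # If the box straddles the splitting plane at this node, BOTH
        # recursions fire - on disk, that is twice the seeking.
        if lo[axis] <= point[axis]:
            range_search(left, lo, hi, depth + 1, out)
        if hi[axis] >= point[axis]:
            range_search(right, lo, hi, depth + 1, out)
        return out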

There are a number of theoretically efficient solutions to this, but they all become remarkably inefficient when you add in a hard drive.


I think if you're going to be doing sufficiently 'biggish data' with those kinds of retrieval characteristics, you've got to just give up on disk and use flash only.


Flash solves the "added overhead for random access" portion of the problem, but the overhead for any access at all is still remarkably high, particularly when compared to main memory.

Perhaps when we get more "flash connected to the main bus" options (there are a few commercial options which sit in PCI Express, AGP, and even DRAM slots), we can begin discounting the overhead more and more, but we're not there quite yet.


Well, supposedly, cache-oblivious algorithms exist to deal with this kind of thing - by grouping related data together regardless of which cache scale you are at.
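For instance, the van Emde Boas layout stores each recursive half-height subtree contiguously, so nearby nodes share a block at every scale without any block-size parameter appearing in the code. A sketch of the layout permutation for a complete binary tree with BFS indexing (my own illustration; the exact split convention varies):

    def veb_order(height):
        # BFS indices (root = 1, children 2i and 2i+1) of a complete
        # binary tree, permuted so that every recursive subtree is
        # contiguous. No block size appears anywhere: locality holds
        # whether a "block" is a cache line or a disk page.
        def rec(root, h):
            if h == 1:
                return [root]
            top = h // 2                  # split roughly in half by height
            order = rec(root, top)        # lay out the top subtree first
            first = root << top           # leftmost node `top` levels down
            for k in range(1 << top):     # then each bottom subtree in turn
                order += rec(first + k, h - top)
            return order
        return rec(1, height)

    # veb_order(4) -> [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]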


My experience playing around with KD trees is that they are super effective for multidimensional indexing (including spatial) and range search so long as the data is relatively static. The difficulty comes when trying to update and re-balance them dynamically, which is where other structures perform better.
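One common workaround is the static-to-dynamic trick: keep the tree itself static and absorb inserts into a small side buffer that is scanned by brute force, rebuilding once the buffer grows. A sketch using scipy's cKDTree (the class and the threshold are my own choices, not a standard API):

    import numpy as np
    from scipy.spatial import cKDTree

    class BufferedKDTree:
        def __init__(self, points, rebuild_at=1024):
            self.points = [tuple(p) for p in points]
            self.tree = cKDTree(self.points)
            self.buffer = []              # recent inserts, scanned linearly
            self.rebuild_at = rebuild_at

        def insert(self, p):
            self.buffer.append(tuple(p))
            if len(self.buffer) >= self.rebuild_at:
                # Amortize: one O(n log n) rebuild per `rebuild_at` inserts.
                self.points += self.buffer
                self.buffer = []
                self.tree = cKDTree(self.points)

        def neighbors_within(self, p, r):
            hits = [self.points[i] for i in self.tree.query_ball_point(p, r)]
            q0 = np.asarray(p, dtype=float)
            hits += [q for q in self.buffer
                     if np.linalg.norm(np.asarray(q) - q0) <= r]
            return hits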


Oh dear,

I was remembering my previous research wrong.

It's quad-trees and related Z-order-based curves that give log(n) search and inserts (see the sketch below).

With those, "everything" becomes log^n.

- Given that, my previous argument concerning log time for triangle/polygon search should stand.

http://en.wikipedia.org/wiki/Quadtree
http://en.wikipedia.org/wiki/Z-order_curve
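A minimal sketch of the Z-order trick, assuming unsigned integer coordinates and an arbitrary 16-bit width: interleaving the coordinate bits yields a single key whose sort order groups spatial neighbors, so any ordinary ordered index over the keys gives log(n) search and insert:

    def morton2d(x, y, bits=16):
        # Interleave the bits of two coordinates into one Z-order key.
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)       # x takes even bits
            code |= ((y >> i) & 1) << (2 * i + 1)   # y takes odd bits
        return code

    # Sorting by the key groups spatial neighbors:
    # sorted(points, key=lambda p: morton2d(*p))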



