My example of real world data is data in which the first 90% is already sorted and the next 10% is random. Whether this is real enough is questionable, but it does demonstrate how Timsort automatically takes advantage of runs of sorted data.
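As a rough sketch of how that kind of input can be generated in Python (the function name and sizes here are mine, just for illustration), note that Python's built-in `sorted()` is itself Timsort, so it detects the long ascending run up front:

```python
import random

def make_partially_sorted(n, sorted_fraction=0.9):
    """Return a list whose first 90% is already sorted and last 10% is random."""
    split = int(n * sorted_fraction)
    head = sorted(random.random() for _ in range(split))
    tail = [random.random() for _ in range(n - split)]
    return head + tail

data = make_partially_sorted(1_000_000)
# sorted() (Timsort) merges the one long run with the short random tail,
# doing far fewer comparisons than on fully shuffled input.
result = sorted(data)
```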
It would be nice to see a comparison working on truly real-world data. Perhaps from an open dataset.
I have been experimenting with lidar datasets lately. In lidar you work with huge point clouds, millions to billions of points. Some algorithms, such as convex hull, require the points to be sorted, for example by x-coordinate. I ran some quick tests with quicksort, merge sort, and Timsort. In my experiments quicksort was the slowest at 5.5 seconds on the chosen dataset, merge sort took 5 seconds, and Timsort around 4 seconds.
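For what it's worth, sorting by x-coordinate in Python is just a sort key (the point values below are made up for illustration; a real point cloud of millions of points would typically live in a NumPy array or be loaded with something like laspy, but the idea is the same):

```python
# A tiny, made-up stand-in for a lidar point cloud: (x, y) tuples.
points = [(3.1, 2.0), (1.5, 4.2), (2.7, 0.9)]

# Python's list.sort() is Timsort; sorting by x-coordinate only needs a key.
points.sort(key=lambda p: p[0])
print(points)  # x values now ascend: 1.5, 2.7, 3.1
```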
Yeah, seeing more datasets would definitely be nice. In the YouTube video I linked my GitHub repo, where I made the changes necessary for this example. Using the last commit (the only changes from upstream) as a guide, other datasets could be supported similarly.
That said, I don't think my choice of data is actually that bad. Starting with a sorted dataset, appending some new items, and then sorting the result again is apparently something programmers do very commonly. That is essentially what my example represents.