My example of real world data is data in which the first 90% is already sorted and the next 10% is random. Whether this is real enough is questionable, but it does demonstrate how Timsort automatically takes advantage of runs of sorted data.
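As a rough sketch of how that kind of input can be generated in Python (the function name and sizes here are mine, just for illustration), note that Python's built-in `sorted()` is itself Timsort, so it detects the long ascending run up front:

```python
import random

def make_partially_sorted(n, sorted_fraction=0.9):
    """Return a list whose first 90% is already sorted and last 10% is random."""
    split = int(n * sorted_fraction)
    head = sorted(random.random() for _ in range(split))
    tail = [random.random() for _ in range(n - split)]
    return head + tail

data = make_partially_sorted(1_000_000)
# sorted() (Timsort) merges the one long run with the short random tail,
# doing far fewer comparisons than on fully shuffled input.
result = sorted(data)
```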
It would be nice to see a comparison working on truly real-world data. Perhaps from an open dataset.
I have been experimenting with lidar datasets lately. In lidar you work with huge point clouds, millions to billions of points. Some algorithms, such as convex hull, require the points to be sorted, for example by x-coordinate. I ran some quick tests with quicksort, merge sort, and Timsort. In my experiments quicksort was the slowest at 5.5 seconds on the chosen dataset, merge sort took 5 seconds, and Timsort around 4 seconds.
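For what it's worth, sorting by x-coordinate in Python is just a sort key (the point values below are made up for illustration; a real point cloud of millions of points would typically live in a NumPy array or be loaded with something like laspy, but the idea is the same):

```python
# A tiny, made-up stand-in for a lidar point cloud: (x, y) tuples.
points = [(3.1, 2.0), (1.5, 4.2), (2.7, 0.9)]

# Python's list.sort() is Timsort; sorting by x-coordinate only needs a key.
points.sort(key=lambda p: p[0])
print(points)  # x values now ascend: 1.5, 2.7, 3.1
```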
Yeah, seeing more datasets would definitely be nice. In the YouTube video I linked my GitHub repo, where I made the changes necessary for this example. Using the last commit (the only changes from upstream) as a guide, other datasets could be supported similarly.
That said, I don't think my choice of data is actually that bad. Starting with a sorted dataset, appending some new items, and then sorting the result again is apparently something programmers do very commonly. That is essentially what my example represents.