Very cool to see another active project that draws upon Prolog for inspiration.
> The benchmarks were run on a Sun SPARCserver 1000 with four 50 MHz TI SuperSPARC processors and 256 megabytes of memory running SunOS 5.3 (Solaris 2.3). Each processor is rated at 60.3 SPECint92, and has a 4-way associative 16 Kb I-cache and a 5-way associative 20 Kb D-cache, backed by 1 Mb of unified secondary cache.
What's the rationale behind such an antiquated benchmarking system? Interesting.
Could it be meant to limit variance and the unpredictable effects of clever hardware? I can imagine that benchmarking on a new high-end desktop machine would take a lot of work to make sure the result isn't skewed because you ran it on Friday at 11:44 and the cron job that runs at 11:45 evicted a few key caches, heated your CPU up by that extra 1 degree Celsius so that throttling kicked in, caused your PRNG to be seeded with a value that triggers a few worst-case performance scenarios, and so on.
There are a lot of factors that can influence a modern system, and most of them seem hard to control. For an 8086 you could probably account for all of them if you put in the hours and ran it in a highly controlled environment, but I doubt it's even possible for a modern CPU, let alone the whole system.
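To make the variance point concrete, here is a minimal sketch of what people usually do on a modern machine instead of trying to control everything: fix the PRNG seed, warm up, repeat the run, and report the spread rather than a single number. The `workload` and `bench` functions are hypothetical names made up for illustration; they have nothing to do with the benchmarks quoted above.

```python
import random
import statistics
import time


def workload(seed: int, n: int = 20_000) -> int:
    """Hypothetical workload: sort pseudo-random integers.

    The seed is fixed by the caller so every run sees identical input,
    which removes the "unlucky PRNG seed" source of variance.
    """
    rng = random.Random(seed)
    data = [rng.randrange(1_000_000) for _ in range(n)]
    data.sort()
    return data[0]


def bench(fn, runs: int = 15, warmup: int = 3) -> dict:
    """Time fn() several times and summarize the samples."""
    # Warm-up runs are discarded; they mostly pay for cold caches.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "runs": len(samples),
        "min": min(samples),  # least disturbed run
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),  # how noisy the machine was
    }


if __name__ == "__main__":
    print(bench(lambda: workload(seed=42)))
```

The minimum is often taken as the closest thing to an undisturbed run, while a large standard deviation hints that something like that 11:45 cron job got in the way. None of this removes the noise; it only makes it visible.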
Of course, this isn't a problem for most benchmarks. Most of them are only meant to demonstrate real-world use cases, where variance is expected and the goal is to measure the perceived performance you can expect in practice, not the performance of a single element of the system.
Or it's simply designed by someone who loves this kind of system and found a good excuse to put one to use.