I can appreciate the fun to be had with so many CPU cores, but did you actually measure the performance? According to Passmark, both the single-thread and multithread performance of EPYC is very poor [1][2]. The Passmark database has been years in the making and is very informative, but I think the EPYC result is erroneous. Could you run the Passmark benchmark on your rig to get another public data point?
Will give that a try tonight. I have to warn you that one of my DIMMs is bad right now, so the system is running with only 15 DIMMs (an unsupported config), and the results might be suboptimal until I receive a replacement DIMM from the seller.
That's great, looking forward to the results. But there's no rush. If you'd like to measure today anyway, please just make sure the result doesn't get reported to the Passmark database, so as not to spoil the small dataset with a biased result. Good luck with the replacement; I hope it stays solid from now on.
Ah, that's a shame. I'd love to know how much faster the Epyc is than the E5-2670. Could you try to run sysbench to get some numbers? Here are mine:
# sysbench --test=cpu run --max-requests=20000
Test execution summary:
total time: 25.9014s
total number of events: 20000
total time taken by event execution: 25.8983
per-request statistics:
min: 1.27ms
avg: 1.29ms
max: 3.19ms
approx. 95 percentile: 1.29ms
# sysbench --test=cpu run --max-requests=20000 --num-threads=16
Test execution summary:
total time: 1.6859s
total number of events: 20000
total time taken by event execution: 26.9264
per-request statistics:
min: 1.26ms
avg: 1.35ms
max: 3.59ms
approx. 95 percentile: 1.51ms
[dman@epyc ~]$ sysbench --test=cpu run --max-requests=20000
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --max-requests is deprecated, use --events instead
sysbench 1.0.14 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 1461.82
General statistics:
total time: 10.0004s
total number of events: 14621
Threads fairness:
events (avg/stddev): 14621.0000/0.00
execution time (avg/stddev): 9.9977/0.00
[dman@epyc ~]$ sysbench --test=cpu run --max-requests=20000 --num-threads=128
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
WARNING: --max-requests is deprecated, use --events instead
sysbench 1.0.14 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 128
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 47980.46
General statistics:
total time: 0.4152s
total number of events: 20000
Thanks. I just ran v1.0.14 the same way and got 793 events/s single-threaded and 10264 events/s multithreaded (16 threads) on the E5-2670. So your single-thread result is almost 2x faster and your multithread result almost 5x faster. The single-thread number is much better than I expected from low-frequency Epycs. You've got a nice snappy machine there :) Interestingly, the multithread performance per thread is 641 on the E5-2670 but only 374 on the Epyc. Probably there's some massive thermal throttling going on with the Epyc. With cores that fast, one should get at least 1461x64=93504 without throttling.
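The arithmetic above can be reproduced with a few lines of Python. The figures are the ones quoted in this thread; the names `E5_2670`, `EPYC`, `per_thread` and `summarize` are just illustrative. The "ceiling" multiplies the single-thread rate by 64 physical cores and ignores any SMT gain, which is the same rough assumption made above.

```python
# Figures quoted in this thread (sysbench 1.0.14 cpu test, events per second).
E5_2670 = {"single": 793.0, "multi": 10264.0, "multi_threads": 16}
EPYC = {"single": 1461.82, "multi": 47980.46, "multi_threads": 128}


def per_thread(result: dict) -> float:
    """Multithread events/s divided by the number of sysbench threads."""
    return result["multi"] / result["multi_threads"]


def summarize() -> dict:
    """Speedups and per-thread throughput derived from the quoted runs."""
    return {
        "single_speedup": EPYC["single"] / E5_2670["single"],
        "multi_speedup": EPYC["multi"] / E5_2670["multi"],
        "e5_per_thread": per_thread(E5_2670),
        "epyc_per_thread": per_thread(EPYC),
        # Naive ceiling: single-thread rate times 64 physical cores, no SMT gain.
        "epyc_naive_ceiling": EPYC["single"] * 64,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```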
I think the problem size is too small. Dialling up the max requests and setting the thread count to 64 raises the per-thread multithread figure to 1104. (I am guessing the scheduler takes a while to bring all 64 threads up, so on small problems one might not see the full benefit of the available parallelism, but this is just an armchair hypothesis.)
[dman@epyc ~]$ sysbench --test=cpu run --max-requests=2000000 --num-threads=64
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
WARNING: --max-requests is deprecated, use --events instead
sysbench 1.0.14 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 64
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 70704.31
General statistics:
total time: 10.0014s
total number of events: 707263
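One detail in the run above: it stopped after 10.0014 s with 707,263 events, well short of the 2,000,000 requested. That looks like sysbench 1.0's default run-time limit (`--time`, 10 s by default) ending the run rather than the event count, so raising `--max-requests` mainly lengthened the run from about 0.4 s to 10 s. Below is a minimal Python sketch of a repeatable sweep using the non-deprecated options that the warnings above point to (`--threads`, `--events`). It assumes that `--time=0` removes the time limit, and the helper names (`sysbench_cpu_cmd`, `parse_events_per_second`, `run_cpu_test`) are hypothetical.

```python
import re
import shutil
import subprocess

# Matches the "events per second:" line printed by sysbench 1.0 (see output above).
_EPS_RE = re.compile(r"events per second:\s*([0-9.]+)")


def sysbench_cpu_cmd(threads: int, events: int, time_limit: int = 0) -> list:
    """Build a sysbench 1.0 CPU-test command using the non-deprecated options.

    Assumption: --time=0 removes the default 10 s run-time limit, so the run
    ends only after `events` events have completed.
    """
    return [
        "sysbench",
        "cpu",
        f"--threads={threads}",
        f"--events={events}",
        f"--time={time_limit}",
        "run",
    ]


def parse_events_per_second(output: str) -> float:
    """Extract the events/s figure from sysbench 1.0 output."""
    match = _EPS_RE.search(output)
    if match is None:
        raise ValueError("no 'events per second' line in sysbench output")
    return float(match.group(1))


def run_cpu_test(threads: int, events: int) -> float:
    """Run one sysbench CPU test and return its events/s."""
    result = subprocess.run(
        sysbench_cpu_cmd(threads, events),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_events_per_second(result.stdout)


if __name__ == "__main__":
    if shutil.which("sysbench") is None:
        print("sysbench not found on PATH; skipping")
    else:
        for threads in (1, 8, 16, 64, 128):
            # Scale the event count with the thread count so each run lasts a while.
            eps = run_cpu_test(threads, 20_000 * threads)
            print(f"{threads:>4} threads: {eps:10.2f} events/s, {eps / threads:8.2f} per thread")
```

The parser only understands the 1.0 output format shown above. The older sysbench output earlier in the thread has no "events per second" line.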
That's interesting; the same increase in requests makes little difference on my machine: I get 10432 for multithread (16). I noticed above that you used 64 threads on a 64-core system. Does that give you a better result than using 128 threads? It shouldn't; SMT should give you some increase in speed. At least on my 8-core system, using 16 threads gives 10432, while using 8 threads gives only 7311.
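Put as a ratio, the 8-core figures above give an SMT gain of roughly 1.43x. Per-thread throughput falls at the same time (about 914 events/s at 8 threads versus 652 at 16), so per-thread numbers from runs with and without SMT are not directly comparable. A small Python sketch of that calculation follows. The function names are illustrative, and only the E5-2670 numbers quoted above are used.

```python
def smt_gain(all_threads_rate: float, one_per_core_rate: float) -> float:
    """Throughput with one thread per logical CPU divided by throughput
    with one thread per physical core (1.0 would mean SMT adds nothing)."""
    return all_threads_rate / one_per_core_rate


def per_thread(rate: float, threads: int) -> float:
    """Events/s per sysbench thread."""
    return rate / threads


# E5-2670 figures quoted above: 8 cores, 16 hardware threads.
E5_8_THREADS = 7311.0
E5_16_THREADS = 10432.0

if __name__ == "__main__":
    print(f"SMT gain: {smt_gain(E5_16_THREADS, E5_8_THREADS):.2f}x")
    print(f"per thread at 8 threads:  {per_thread(E5_8_THREADS, 8):.1f}")
    print(f"per thread at 16 threads: {per_thread(E5_16_THREADS, 16):.1f}")
```

Running the same pair of tests on the Epyc (64 and 128 threads, with runs of equal length) would give the corresponding figure for that machine.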