Thanks. I just ran v1.0.14 the same way and got 793 events/s single-threaded and 10264 events/s multithreaded (16 threads) on the E5-2670. So your single-thread result is almost 2x faster and your multithread result almost 5x faster. Single-threaded, that's much better than I expected from low-frequency Epycs. You've got a nice snappy machine there :) Interestingly, the per-thread performance in multithread is 641 on the E5-2670, but only 374 on the Epyc. There's probably some massive thermal throttling going on on the Epyc. With cores that fast, one should get at least 1491 x 64 = 95424 without throttling.
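For anyone who wants to redo the per-thread arithmetic, here is a rough Python sketch using only the E5-2670 figures quoted above. The function names are made up for illustration. Note that 16 threads on this CPU include SMT siblings, so a per-thread figure below the single-thread one is expected and does not by itself indicate throttling.

```python
# E5-2670 figures quoted in the post above (sysbench cpu, events/s).
single_thread = 793.0
multi_thread = 10264.0
threads = 16


def per_thread(total_events_per_s, n_threads):
    """Average throughput contributed by each thread."""
    return total_events_per_s / n_threads


def scaling_efficiency(total_events_per_s, n_threads, single_events_per_s):
    """Measured throughput as a fraction of perfect linear scaling."""
    return total_events_per_s / (n_threads * single_events_per_s)


e5_per_thread = per_thread(multi_thread, threads)
e5_efficiency = scaling_efficiency(multi_thread, threads, single_thread)

print(f"per thread: {e5_per_thread:.1f} events/s")
print(f"scaling efficiency: {e5_efficiency:.1%}")
```

The same two functions can be fed the Epyc numbers to compare how far each machine falls short of linear scaling.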
I think the problem size is too small. Dialling up the max requests and setting the thread count to 64 brings the per-thread figure in multithread up to 1104. (I am guessing the scheduler takes a while to bring all 64 threads up, so on small problems one might not see the full benefit of the available parallelism, but this is just an armchair hypothesis.)
[dman@epyc ~]$ sysbench --test=cpu run --max-requests=2000000 --num-threads=64
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
WARNING: --max-requests is deprecated, use --events instead
sysbench 1.0.14 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 64
Initializing random number generator from current time
Prime numbers limit: 10000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 70704.31
General statistics:
total time: 10.0014s
total number of events: 707263
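As a cross-check of the 1104 per-thread figure, the relevant lines of the output above can be parsed with a few lines of Python. This is only a sketch: the `field` helper is hypothetical, and it assumes the simple `name: value` layout shown in this transcript. It also compares the completed events with the 2000000 requested, which suggests this run stopped on a time limit rather than on the event limit.

```python
import re

# Relevant lines copied from the sysbench output above.
output = """\
Number of threads: 64
events per second: 70704.31
total time: 10.0014s
total number of events: 707263
"""

requested_events = 2_000_000  # value passed via --max-requests above


def field(name, text):
    """Return the number that follows 'name:' in sysbench-style output."""
    match = re.search(rf"{re.escape(name)}:\s*([0-9.]+)", text)
    if match is None:
        raise ValueError(f"field not found: {name}")
    return float(match.group(1))


threads = int(field("Number of threads", output))
rate = field("events per second", output)
completed = int(field("total number of events", output))

per_thread_rate = rate / threads
hit_event_limit = completed >= requested_events

print(f"per thread: {per_thread_rate:.2f} events/s")
print(f"stopped on event limit: {hit_event_limit}")
```

If the run really is time-bound, raising the request count further would not lengthen it; only a longer time limit would.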
That's interesting, the same increase in requests makes little difference on my machine: I get 10432 for multithread (16). I noticed above that you used 64 threads on a 64-core system. Does that give you a better result than using 128 threads? It shouldn't; SMT should give you some increase in speed. At least on my 8-core system, using 16 threads gives 10432, while using 8 threads gives only 7311.
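To put a number on the SMT benefit from the two 8-core results just quoted, here is a small Python sketch. The variable names are illustrative, and it assumes the 8-thread run put one thread on each physical core, which the post does not confirm.

```python
# Results quoted above for the 8-core machine (sysbench cpu, events/s).
rate_8_threads = 7311.0    # assumed: one thread per physical core
rate_16_threads = 10432.0  # both SMT siblings of every core in use

smt_speedup = rate_16_threads / rate_8_threads
smt_gain_percent = (smt_speedup - 1.0) * 100.0

# Per-thread throughput drops once two threads share a core, even
# though the total goes up.
per_thread_8 = rate_8_threads / 8
per_thread_16 = rate_16_threads / 16

print(f"SMT speedup: {smt_speedup:.2f}x ({smt_gain_percent:.0f}% gain)")
print(f"per thread: {per_thread_8:.1f} at 8 threads, {per_thread_16:.1f} at 16")
```

Running the same comparison with 64 versus 128 threads on the Epyc would show whether SMT helps there by a similar margin.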