Found a 2014 thread where 10-18% of the test harness time was spent booting the interpreter for the 13,000 required instances. The deeper thread was showing some 500-700 seconds of test time was just the interpreter overhead. The original point of the article was how much worse the overhead was in Python3 vs Python2.
Did they have like 100k tests?