Overflow checks can be very expensive without hardware support. Even on platforms with lightweight support (e.g. x86 'INTO'), you're replacing one of the fastest instructions out there -- think of how many execution units can handle a basic add -- with a sequence of two dependent instructions.
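To make the comparison concrete, here is a minimal Rust sketch of the two flavors of add. The function names are made up for illustration. The codegen notes in the comments describe what an optimizing compiler typically emits on x86-64, not a guarantee.

```rust
/// Plain add that wraps on overflow. Typically a single add instruction.
pub fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Checked add. Typically the add plus a second instruction that
/// depends on the overflow flag the add just produced.
pub fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}
```

With these, `add_wrapping(i32::MAX, 1)` wraps around to `i32::MIN`, while `add_checked(i32::MAX, 1)` returns `None`.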
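The vast majority of the cost is missed optimization: every operation that can raise an overflow error forces the compiler to keep the exact partial state at that point, which limits how freely it can reorder or combine operations. The checks themselves are trivially predicted, and that's only for the ones the compiler can't optimize out.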
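A rough Rust sketch of where the partial-state problem shows up, using hypothetical helper names. The wrapping sum only cares about the final value, so the compiler is free to reorder or vectorize the additions. The checked sum has to report an overflow in any intermediate total, which ties it to the original order of operations.

```rust
/// Wrapping sum: only the final value matters, so the additions can be
/// reordered or vectorized without changing the result.
pub fn sum_wrapping(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Checked sum: stops at the first intermediate total that overflows,
/// so each running total is observable and the order must be preserved.
pub fn sum_checked(xs: &[i32]) -> Option<i32> {
    xs.iter().try_fold(0i32, |acc, &x| acc.checked_add(x))
}
```

For the input `[i32::MAX, 1, -1]` the wrapping version returns `i32::MAX`, because the intermediate overflow cancels out, while the checked version returns `None`, because the second addition overflows.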