Wide integer multiplication works fine on a GPU:

Your number is broken up into 32-bit coefficients (limbs) of powers of 2^32, i.e. x = sum over i of a_i * 2^(32*i). The multiply can then be done as tensor(ish) operations, as long as you fiddle with carry/mod steps in between to keep the coefficients the right size and the partial results properly aligned.
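
To make that concrete, here is a rough CPU-side sketch in plain Python of the limb scheme described above. It is only an illustration, not a GPU kernel: the names (to_limbs, from_limbs, mul_limbs) are made up for this example, and the nested loop stands in for what would be an outer product plus a per-column reduction on real hardware. Each 64-bit partial product is split into low and high 32-bit halves so the column sums stay small, and a final carry pass brings every coefficient back to 32 bits.

```python
MASK = (1 << 32) - 1  # keeps the low 32 bits of a value


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 32-bit limbs."""
    return [(x >> (32 * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 32-bit limbs: sum of a_i * 2^(32*i)."""
    return sum(a << (32 * i) for i, a in enumerate(limbs))


def mul_limbs(a, b):
    """Multiply two limb vectors; returns len(a) + len(b) limbs."""
    n, m = len(a), len(b)
    cols = [0] * (n + m)
    # Partial products a[i] * b[j] are independent of each other, which is
    # the part that maps onto parallel (tensor-like) hardware.
    for i in range(n):
        for j in range(m):
            p = a[i] * b[j]              # fits in 64 bits
            cols[i + j] += p & MASK      # low half aligns with 2^(32*(i+j))
            cols[i + j + 1] += p >> 32   # high half belongs one limb up
    # Carry pass: reduce each column to 32 bits and push the excess upward.
    out, carry = [], 0
    for c in cols:
        t = c + carry
        out.append(t & MASK)
        carry = t >> 32
    return out


x = 2**100 + 12345
y = 2**90 + 678
product = from_limbs(mul_limbs(to_limbs(x, 4), to_limbs(y, 3)))
```

Here product should equal x * y computed with Python's built-in big integers. The carry pass is written sequentially for clarity; how you would parallelize it on a GPU is a separate design choice that this sketch does not cover.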