This is an instance of the "Maximally Powerful, Minimally Useful" principle from a few days ago [0]. Because the language being described is sufficiently minimal, it admits multiple valid, meaningful interpretations: the direct "just execute it" interpretation, a theoretical "compile this to some other language and interpret it there", and, most interestingly, interpreting these relations as a linear algebra problem.
It should be quite clear that a more "powerful" language would not admit this last meaning/interpretation/denotation.
Arithmetic confined to addition, subtraction, and multiplication by constants is Presburger arithmetic. It's decidable; there are automatic theorem provers for statements in it. The classic Nelson-Oppen prover converts the theorems to be proven into matrices, then uses linear programming over the rationals as its solution mechanism. That may be isomorphic to this approach.
The problem of interest to the OP is not determining the truth value of sentences in Presburger arithmetic (that is, sentences using addition, equations/inequalities, Boolean operations, and the quantifiers "There exists an integer such that" and "For all integers, it is the case that"), which, although algorithmically decidable, is nonetheless in general very computationally taxing (the time to compute whether such a sentence is true or false is doubly exponential in the size of the sentence).
Rather, the OP is simply interested in carrying out arithmetic calculations of a sort which are trivially computable (after all, the inputs the OP is dealing with are the very programs which would compute the answer!), but notes that one can in a principled way transform naive specifications of these computations (including loops like "Do this a million times") into faster methods of attaining the same answer (by thinking of such loops as matrix exponentiations and then using quick, rather than naive, exponentiation).
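To make that concrete, here's a minimal Python sketch (my own illustration, not the OP's interpreter): a loop whose body only applies a fixed linear update to its variables is just repeated multiplication by one matrix, so running it a million times can be replaced by a single fast matrix power.

    def mat_mul(a, b):
        # 2x2 integer matrix product
        return [[a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
                [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

    def mat_pow(m, n):
        # m**n by repeated squaring: O(log n) matrix multiplications
        result = [[1, 0], [0, 1]]
        while n:
            if n & 1:
                result = mat_mul(result, m)
            m = mat_mul(m, m)
            n >>= 1
        return result

    def run_loop(n):
        a, b = 1, 0
        for _ in range(n):          # naive: n iterations
            a, b = a + b, a         # a fixed linear update of (a, b)
        return a, b

    def run_loop_fast(n):
        # the same update, written as the matrix [[1, 1], [1, 0]] acting on (a, b)
        p = mat_pow([[1, 1], [1, 0]], n)
        a, b = 1, 0
        return (p[0][0]*a + p[0][1]*b, p[1][0]*a + p[1][1]*b)

    assert run_loop(20) == run_loop_fast(20)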
Statements in Presburger arithmetic are equivalent to systems of linear equations/inequalities, which are representable as matrices. That's why both the provers and this exponentiation scheme work. If you have two variables multiplied together, you're beyond Presburger arithmetic and linear equations. The simple matrix representation no longer applies.
If the power is zero, we will return the identity matrix.
If the power is even (2N), we can recursively compute M^N, and then return the square of the obtained matrix.
If the power is odd (2N+1), it's enough to recursively compute M^2N and return the obtained matrix multiplied by M.
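Translated into Python, that recursion looks roughly like this (my sketch, not the article's code; matrices are plain lists of rows):

    def mat_mul(a, b):
        # product of two square matrices
        n = len(a)
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    def mat_pow(m, power):
        if power == 0:                    # zero: the identity matrix
            return [[int(i == j) for j in range(len(m))] for i in range(len(m))]
        half = mat_pow(m, power // 2)     # M^N, where the power is 2N or 2N+1
        square = mat_mul(half, half)      # M^(2N)
        if power % 2 == 0:                # even: the square is the answer
            return square
        return mat_mul(square, m)         # odd: one extra multiplication by M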
Tidbit: that is likely the best approach in practice, but perhaps surprisingly, it is not optimal in the number of matrix multiplications. See http://en.m.wikipedia.org/wiki/Addition-chain_exponentiation and https://oeis.org/A003313. The Wikipedia article also mentions addition-subtraction chains, so it may even be beneficial to compute the inverse of the matrix in some cases (M^1023 is likely such a case, if inverting M isn't too hard).
Well -- the thing about using addition chains is, which addition chain do you use? If we're talking about a general algorithm for any n, rather than for a specific n, then we need to algorithmically generate our addition chain. So going with the shortest addition chain is a bad idea, because it will take you too long to find it. Any method of finding short addition chains based on factoring is similarly a bad idea.
(It's worth noting that while finding short addition chains isn't known to be NP-complete, a slight variant of the problem is.)
I think there are methods of finding short addition chains that reliably do better than the binary method, though; in TAOCP volume 2, "Evaluation of powers" section (which is a very good reference on addition chains in general!), Knuth talks about the power tree method, which from a brief skim seems like it might be better even when you include the time to find the chain. I think there might also be better methods based on continued fractions, going from memory.
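As a small concrete illustration (mine, not Knuth's): for the exponent 15 the binary method spends 6 multiplications, while the addition chain 1, 2, 3, 6, 12, 15 needs only 5.

    def binary_method_mults(n):
        # multiplications used by left-to-right square-and-multiply for x^n:
        # one squaring per bit after the leading one, plus one multiply per set bit
        bits = bin(n)[2:]
        return (len(bits) - 1) + bits[1:].count("1")

    def pow15_via_chain(x):
        # x^15 via the addition chain 1, 2, 3, 6, 12, 15: five multiplications
        x2 = x * x
        x3 = x2 * x
        x6 = x3 * x3
        x12 = x6 * x6
        return x12 * x3

    assert pow15_via_chain(3) == 3 ** 15
    print(binary_method_mults(15))   # 6, vs. 5 for the chain above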
I'm not certain I understand what's being asked. I'm going to assume you mean, "Could an optimizing compiler for a language with a built-in exponentiation operator use short addition chains to speed up exponentiation at runtime?" In which case, again, the answer depends on whether the exponent is fixed or is allowed to vary.
For a fixed exponent, certainly; you could just spend additional time in the compiler finding short addition chains for all the exponents that are used. When exponents are allowed to vary, then we're in the case I described in my post above -- maybe using the power-tree method, or certain other methods, would improve on the binary method; but finding the shortest addition chain at runtime would not. (By which I mean, not with existing methods, and it seems unlikely that there's any fast way.)
Regarding the caveat, computing ϕ^N is really equivalent to computing a matrix power M^N. Recall that ϕ=(1+√5) (I dropped the 1/2 for convenience.) Suppose we represent numbers like x+y√5 as [x y]. Then multiplication by ϕ corresponds to multiplication by the matrix
ϕ_mult = [[1 5] [1 1]]
This is because ϕ(x+y√5)=(x+5y)+(x+y)√5, and
[[1 5] [1 1]] [x y] = [x+5y x+y]
Now, to compute ϕ^N, we just have to compute [x y] = ϕ_mult^N[1 0] (since [1 0] is the representation of 1=1+0√5). Then ϕ^N=x+y√5. This is all exact, no floating point arithmetic!
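A quick Python sketch of this (my own; it multiplies the (x, y) pairs directly, which is the same thing as applying ϕ_mult repeatedly, and the 1/2 is still dropped):

    def mul(p, q):
        # (a + b*sqrt5) * (c + d*sqrt5), with numbers stored as integer pairs
        a, b = p
        c, d = q
        return (a * c + 5 * b * d, a * d + b * c)

    def phi_pow(n):
        # exact (1 + sqrt5)**n as a pair (x, y) meaning x + y*sqrt5
        result = (1, 0)        # the number 1 = 1 + 0*sqrt5
        base = (1, 1)          # phi = 1 + sqrt5 (without the 1/2)
        while n:
            if n & 1:
                result = mul(result, base)
            base = mul(base, base)
            n >>= 1
        return result

    assert phi_pow(2) == (6, 2)   # (1 + sqrt5)**2 = 6 + 2*sqrt5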
It's actually just an example of an algebraic number field [1] -- the idea is that "algebraic numbers" (i.e., roots of polynomials) can be equivalently thought of as vectors and matrices. The simplest example, which everyone is familiar with, is the complex numbers, where i=√(-1). In this case, [x y] represents x+y*i, and multiplication by i corresponds (with the same column-vector convention as above) to the matrix i_mult=[[0 -1] [1 0]]. Then i_mult^2= -[[1 0] [0 1]], in other words, multiplication by -1!
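A tiny check of that last claim in Python (column vectors, matrices as lists of rows):

    def mat_mul(a, b):
        # 2x2 matrix product
        return [[a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
                [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

    i_mult = [[0, -1],
              [1,  0]]          # multiplication by i, acting on column vectors [x, y]

    assert mat_mul(i_mult, i_mult) == [[-1, 0], [0, -1]]   # i^2 = -1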
The author needs to be careful about which cost model they're using when stating asymptotic runtimes. Sure, this reduces to O(log n) matrix multiplications, but matrix multiplication is not a constant-time operation (it's somewhere between quadratic and cubic in the dimension of the matrix, and that's assuming the arithmetic operations themselves take constant time, which is also not always a great assumption).
More interesting than link bait: http://kukuruku.com is a site doing translations from Russian to English of articles originally appearing on http://habrahabr.ru. Most of the articles are solid technical content that would otherwise be inaccessible to non-Russian speakers. Here's some previous discussion on HN: https://news.ycombinator.com/item?id=7770887
On its own, this would be great, but the issue is that these translations are not necessarily authorized in advance, and the attribution was not originally present. But there is now an 'original source' link at the top of each translation, and the translators feel they are adhering to the use policies of Habrahabr.
I was dubious initially, but at this point I'm mostly convinced that their intents are good, and I appreciate having access to these articles that I otherwise would not be able to enjoy. I think they are open to suggestions of how to improve their approach, and hope they eventually find a solution that satisfies all parties.
Articles on habrahabr are published under CC-BY-3.0, a very permissive license that allows articles to be shared and adapted (in particular, translated), so these guys aren't really doing anything that the authors (in this particular case, myself) didn't already allow by publishing the material on habrahabr.
> […] will calculate the 1 000 001st and 1 000 002nd numbers in 4 seconds. Classic version would take much more time, as the program has to operate with multi-thousand-digit numbers.
I'm not sure which classical algorithm for computing Fibonacci numbers the OP means, but Mathematica on my machine takes about 4 ms to compute the 1,000,001st Fibonacci number exactly…
Maybe by classical he means naive; in that case, doesn't it ever get tiring to pick as an example an algorithm that any half-decent compiler will optimize away?
> I have recently read an article about calculating the Nth Fibonacci number in O(log N) arithmetic operations.
Either you assume fixed-precision integers, in which case you just use a lookup table and call it O(1), or you assume arbitrary-precision integers, in which case you cannot have anything less than an O(n) algorithm, as the Nth Fibonacci number has Θ(n) digits (the Fibonacci sequence grows exponentially) and just writing the answer down already takes that long.
I see this all over the place, and it's frustrating every time.
I believe OP is aware of this as he specifically uses the phrase "arithmetic operations" instead of "time", so adding two very large numbers only counts once. It's true that you can't have an O(log N) time algorithm, as arithmetic operations cannot be performed in constant time.
A similar situation arises in, say, mergesort when we say that the algorithm takes O(n log n) comparisons. That's not the same as an O(n log n) time algorithm; for instance if you think of sorting distinct integers, each integer will take at least O(log n) bits to represent, and to compare two integers you could potentially need to read all the bits.
In which case, as I said, just use a lookup table, which is constant-time under those same constraints, and has the bonus of automatically caching frequently used values for you.
And as the Fibonacci sequence is exponential, any lookup table will be relatively small. (The maximum number of elements in your lookup table is linear w.r.t. the number of digits of your (fixed-precision) integers. I mean, even a 128-bit return value only requires 185 elements in the table, for a total of 2960 bytes.)
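For what it's worth, a sketch of that lookup-table approach in Python (assuming unsigned 64-bit results here rather than 128-bit; F(0) through F(93) are exactly the values that fit):

    # Precompute every Fibonacci number that fits in an unsigned 64-bit word.
    LIMIT = 2 ** 64
    FIB = [0, 1]
    while FIB[-1] + FIB[-2] < LIMIT:
        FIB.append(FIB[-1] + FIB[-2])
    # len(FIB) == 94, i.e. F(0) .. F(93); anything larger would overflow.

    def fib(n):
        # O(1) lookup; raises IndexError if F(n) doesn't fit in 64 bits
        return FIB[n]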
> A similar situation arises in, say, mergesort
This is why you have to specify exactly what you mean by the big-O notation you use. Do you mean comparisons? Time? Time assuming everything but the variables specified are constant? Etc.
>It's true that you can't have an O(log N) time algorithm, as arithmetic operations cannot be performed in constant time.
Yep. You can perform some operations in sublinear time w.r.t. the number of bits, however, by exploiting hardware-level parallelization. It would be interesting to see what the best (in terms of big-O) possible implementation of fib(n) would be, given constant-time operations for a fixed-length machine word.
> This is why you have to specify exactly what you mean by the big-O notation you specify.
He did specify it: it's O(log N) arithmetic operations.
Note: I don't mean "assuming arithmetic operations take constant time, we can compute F(N) in O(log N) time", in which case you can correctly say that, since arithmetic operations don't take constant time in arbitrary-precision arithmetic, we must be using fixed precision (or something essentially equivalent), in which case a lookup table takes O(1) time and O(precision) space. I mean "F(N) can be computed with O(log N) arithmetic operations". If I perform those arithmetic operations over arbitrary-precision integers with schoolbook algorithms, the operands grow to Θ(N) bits, so the whole computation costs on the order of N^2 bit operations even though it still uses only O(log N) arithmetic operations.
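To illustrate the distinction, here's a sketch using the fast-doubling identities F(2k) = F(k)*(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2 instead of an explicit matrix (same O(log N) operation count); printing the operand size shows why operation count and running time diverge:

    def fib_pair(n):
        # return (F(n), F(n+1)) using O(log n) big-integer multiplications
        if n == 0:
            return (0, 1)
        a, b = fib_pair(n // 2)      # F(k), F(k+1) for k = n // 2
        c = a * (2 * b - a)          # F(2k)
        d = a * a + b * b            # F(2k+1)
        return (d, c + d) if n % 2 else (c, d)

    f = fib_pair(10 ** 6)[0]
    # several hundred thousand bits: the last few "arithmetic operations"
    # are multiplications of enormous integers
    print(f.bit_length())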
So basically this interpreter checks if a loop contains only linear transformations of the variables in the lexical scope, and if so it builds the corresponding matrix and computes the appropriate power of it.
If so, that's fine, but it's probably a rare use case, since loops usually have conditional branches, don't they? And if they don't, the algorithm was poorly written in the first place, since the author should have noticed that a linear transformation was involved.
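For what it's worth, here's a hedged sketch (my guess at the approach, not the OP's actual code) of the kind of branch-free loop such an interpreter could recognize; a constant term in the update is handled by carrying an extra component that is always 1:

    # Loop body the interpreter could recognize (affine update, no branches):
    #     x = x + 2*y + 3
    #     y = y + 1
    # State vector [x, y, 1]; the constant 3 rides on the always-1 component.
    STEP = [[1, 2, 3],
            [0, 1, 1],
            [0, 0, 1]]

    def apply(m, v):
        return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

    x, y = 4, 7
    assert apply(STEP, [x, y, 1]) == [x + 2*y + 3, y + 1, 1]
    # Running the loop n times is then just STEP**n (by fast exponentiation)
    # applied to [x0, y0, 1].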
[0] http://blog.higher-order.com/blog/2014/12/21/maximally-power...