The FBHHRBNRSSSHK-Algorithm for Multiplication is still not the end (arxiv.org)
168 points by zitterbewegung on Oct 12, 2022 | 87 comments


Abstract: "In response to a recent Nature article which announced an algorithm for multiplying 5×5-matrices over ℤ2 with only 96 multiplications, two fewer than the previous record, we present an algorithm that does the job with only 95 multiplications."

The "recent Nature article" is, of course, Deepmind's reinforcement learning-guided search for better matrix multiplication algorithms: https://news.ycombinator.com/item?id=33096580.

They improved upon Deepmind's algorithms by applying a method of their own, which is not described in this paper, but which they will publish in an upcoming paper: "Manuel Kauers and Jakob Moosbauer. Flip graphs for matrix multiplication. in preparation."


Deep-learning-hype-hater-quick-take: Based on that title (and a glance at the paper), my guess is that rather than brute force and megawatts of GPU power, they found a first-principles solution.


Their result is "We found a slightly better method after someone told us where to look."

It's very common in mathematics for simpler proofs and extensions of results to follow an initial ground-breaking paper. People looking at this preprint as a refutation of the ML techniques have it exactly backwards.


This seems to be a typical case of the following scenario:

You have been working on a topic for a while. Then, one day, you learn that someone has managed to get a paper on the topic into Nature/Science/whatever. And not because their results are that impressive but because they are from a famous institute and are using fashionable methods. Of course, the first thing you do is check whether you can improve the results with the tools you already have. If you can, you post the results online and start working on a paper.

That happens all the time.


On that Google Nature paper, I'm not sure whether the 5x5 matrix improvement is the more significant result, or the fact that they figured out a way to optimize directly to SIMD/AVX instructions, resulting in faster execution even if not necessarily fewer operations.


The result is not interesting for ordinary numbers, where the classical ("slower") method is faster. When each element of the 5x5 matrix is itself a matrix, it could be important.

So AVX ops for the method are not interesting.


No... using divide-and-conquer matrix multiplication with this gives n^2.83. Strassen's algorithm is n^2.81. This is good for 5 by 5 matrices... which are not terribly interesting.
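A quick sanity check on those exponents (plain Python, not from either paper): applying an n x n scheme with m multiplications recursively costs T(N) = m*T(N/n) + O(N^2), i.e. roughly N^(log_n m).

    import math

    # Exponent from applying an n x n scheme with m block multiplications recursively:
    # T(N) = m * T(N/n) + O(N^2)  =>  O(N^(log_n m))  whenever log_n m > 2.
    print(math.log(95, 5))  # this paper's 5x5 scheme over Z_2 -> ~2.829
    print(math.log(96, 5))  # AlphaTensor's 5x5 scheme         -> ~2.836
    print(math.log(7, 2))   # Strassen's 2x2 scheme            -> ~2.807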


It uses many more additions. When the elements are themselves matrices, additions are a lot cheaper than multiplications, so that is OK. But with ordinary numbers, additions cost about the same as multiplications, so trading off a few multiplications for a lot more additions loses.


I thought it was for multiplying two 5x5 matrices whose elements are plain numbers, rather than a 5x5 matrix whose elements are themselves matrices of arbitrary size, no?


The technique works for either


But usefully for only one.


A machine, or any algorithm, by definition cannot work from first principles. This term was coined by Richard Feynman, and ever since then, in all instances, it has been misused.

"No official code found."

There are known faster methods, as yet unproven, that will save gigawatts of power.


Hope they get a Nature article as well.


So it's a teaser paper?


I wouldn't exactly call it a teaser. The formulas they published stand on their own, and putting them out there in the wake of all the press around the AlphaTensor paper is just a good idea.

Of course, these formulas are not super interesting on their own, and so the forthcoming paper on HOW they developed these formulas is in some sense the "real" paper.


I think the point is to publish before anyone else does. If they wait until their full paper is ready, they might be scooped.


I'm not sure how to feel about this. If the purpose of science research is to build the understanding we have, teaser articles seem more like a way to put a name on a creation without actually contributing to our understanding of something.

I guess this is an artifact of how cutthroat academia is and how vital it is to have credit for being first.


> If the purpose of science research is to build the understanding we have.

It's not. It never was. It's always been "to build the understanding that I or this small group I am part of has, that gives us an edge for a while until we share it with the world because it looks like someone else might otherwise pretend to have discovered it".

The idea of a noble science, diligently working towards the betterment of mankind, has always been a romantic fantasy at best.


"The Double Helix" by James Watson is still a surprisingly accessible read that's very relevant here.


I think I'm alright with it in this case. Here, they've given the result and a full proof that a keen undergrad could follow. The future reference is about the method they used to find it. People can hit the ground running with their accelerated matrix multiplication today, should they want to. But if they want to apply that method to find 17x17 multiplication formulae, they'll have to wait.


> I guess this is an artifact of how cutthroat academia is and how vital it is to have credit for being first.

In fairness, academia has improved much in this regard since the days of Newton. That man was such an arse... even the Wikipedia article about his feud with Leibniz glosses over... everything.

Basically, not only was he a donkey about things, he used all sorts of underhanded tactics to win and reveled in devastating his rival's good standing.


But it is a long-standing practice in academia to avoid being scooped by putting out priority-proving work. Consider US patents; or, further back, I read here yesterday that Galileo published anagrams of his results and Kepler took great delight in reverse-engineering them, accidentally finding true facts other than Galileo's intended discovery.


So long as it doesn't slow or otherwise impede the dissemination of the full paper I don't mind personally.


If they were able to complete it in a week then it is a big deal… especially given worries that they would be scooped.


Three days! The DeepMind paper came out on the 5th of October; the first version of this on the 8th.


Are we calling dibs on research now? Is this a kindergarten?


Now?!

Sorry, but someone called dibs on the idea of “priority” long, long ago. The oldest dispute on this list is in the 1500s: https://en.wikipedia.org/wiki/List_of_scientific_priority_di...


Science/academia has been like this for as long as I can remember.


Read some of the history around natural philosophy some few hundred years ago. It’s a real hoot, how petty and ego-driven some of these people were. Leibniz’s attempts at taking credit for Newton’s hard work come to mind. (Tongue somewhat planted in cheek here.)

Research has always been about getting credit for first discovering something, to some extent.


But now that people know the solution exists, other people can find a justification for it.


I love how advances in modern approaches (deep learning) push for advances in classical approaches. Something similar happened in physics, where advances in quantum algorithms led researchers to find better classical algorithms.

Science is not a competition!


My admittedly less charitable take is that they may have been sitting on this one-operation optimization for a while, noticed it works on the new method, and only decided to go public now that they have seen the positive reception and potential of such collaborations.


More likely, I think it's that the first principles approach they have used to find this optimisation has applications in domains far outside of optimising matrix multiplications.

Only because the Google paper crossed their desk did they even think to try it on matrix multiplications.

And then more or less by luck, it happened to turn out to be slightly better. If it had produced exactly the machine learning algorithm, it would have been less interesting.

The speed of the response makes me feel that the authors are extremely good at their domain, and that finding this algorithm was a reasonably trivial problem for them. The slowness of the follow-up paper, I believe, is because the authors understand that their mathematical domain is wildly far outside of what most people can easily understand, so they're going to take their time to write it up well, instead of dropping a whole bunch of maths definitions that nobody else in the world understands.


The authors were already working on other small-matrix optimizations. They have a 2020 preprint about choosing good pivots for up to 8x8 matrices.

https://arxiv.org/abs/2006.01623


That seems fine to me. What's so sinister about that?


Well, there could be an argument that computation-saving info should be shared to reduce energy usage, but I also think it's fine to sit on it for a while.


"Science is not a competition!" - it kind of is though


I'm really interested to know when we will be able to prove that we have the fastest algorithm for some matrix multiplication problem. Is there a theoretical lower bound? (well in this case, for 5x5, there's gonna be an actual lower bound. but for asymptotic cases..?) Is figuring out the theoretical lower bound some kind of undecidable/NP problem? what do we know about it?


A lower bound of Ω(n^2 log n) has been known for quite some time. Good luck going any tighter than that... https://eccc.weizmann.ac.il/report/2002/012/download/ [pdf]


that's for real and complex fields, and it's not exactly that. I don't think there's anything better than Ω(n^2) with no conditions


A simple proof that you can't get better than Ω(n^2): the output is an n x n matrix with n^2 entries that can in general all be distinct, so you need at least Ω(n^2) time just to print the result.


Not necessarily: assuming the two nxn matrices are functionally defined as all 0's, printing them takes O(n)


The problem is defined to be over all (real-valued?) nxn matrices. It's fine if your method uses a representation other than the typical row-major or column-major form, but its performance is going to be measured over the worst case.


No, it doesn’t. You still need to produce n² zeroes for the result. If you multiply two n × n matrices, the result isn’t a vector; it’s an n × n matrix.


Any reasonable description of the matrix will do. The string "An n by n matrix of all zeroes." can be a good output if correctly defined.


By that logic, every algorithm is O(1). You first limit it to a single input, then you “correctly define” the output for that input using some fixed length string, and you’re golden.


Thus the importance of a good definition of 'reasonable'. I can't remember the one from that one course 10 years ago.

However, yes, there are big differences based on the encoding. Nobody would accept using unary input encoding to artificially inflate the size of the input. It's also why the time complexity of prime-testing is sometimes confusing.


ah, excellent point


This is a little bit apples and oranges, however. An asymptotic bound on the complexity doesn't say a lot about the number of multiplications needed for a fixed value of n=5 say. A lot of those complexity proofs say "for large enough n".


Not so! The 5x5 multiplication can be applied recursively to NxN matrices -- at the top level the "5x5 multiplication" carves the matrix into a 5x5 grid of "blocks" each of size approximately (N/5 x N/5). It's been a while since I've done this sort of complexity analysis, but I found some nice slides here explaining how to use the so-called Master Theorem for recursive algorithms: https://www.inf.ed.ac.uk/teaching/courses/ads/Lects/lecture3... [pdf]
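Here's a tiny sketch of that recursion (my own illustration with a hypothetical mult_count helper, counting only scalar multiplications for N a power of 5; the O(N^2) additions per level don't change the exponent):

    import math

    def mult_count(N, base=5, products=95):
        """Scalar multiplications when an N x N product is split into a
        base x base grid of blocks and each level uses `products` block products."""
        if N <= base:
            return products  # one application of the 5x5 scheme
        return products * mult_count(N // base, base, products)

    for k in range(1, 5):
        N = 5 ** k
        # the count matches N^(log_5 95) exactly on powers of 5
        print(N, mult_count(N), round(N ** math.log(95, 5)))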


Oh yeah, great point. Like plugging numbers into the master theorem, if 5x5 could be done in 25 multiplications, say, it would imply multiplication in O(n^2), and even 45 multiplications would give a new asymptotically fast multiplication. I would have to dig out a textbook to figure out what an Ω(n^2 log n) lower bound would imply for n=5.

Edit: of course Wikipedia links to an explicit lower bound which is kind of useful for small cases. https://www.sciencedirect.com/science/article/pii/S0885064X0... has it as 2*n*m+2*n-m-2, which is 53 for m=n=5; not super tight, but tight enough to say that no 5x5 formula is ever going to lead to an asymptotically faster MM algorithm.
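Plugging the numbers in (a one-off check of the arithmetic above, not from either paper):

    import math

    def lower_bound(n, m):
        # the bound quoted above: 2*n*m + 2*n - m - 2
        return 2 * n * m + 2 * n - m - 2

    print(lower_bound(5, 5))      # 53: no 5x5 scheme can use fewer products than this
    for r in (45, 53, 95):        # exponent a recursive 5x5 scheme with r products gives
        print(r, math.log(r, 5))  # 45 -> ~2.365 (below today's ~2.37), 53 -> ~2.47, 95 -> ~2.83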


I think there is some educated belief that it can be done in n^(2+epsilon) but not in n^2. No proof yet, of course.



Obviously a lower bound is Ω(N*M), but it's not very useful. But I'm not aware of any tighter lower bound.


This is a bit like the way that Bannister broke a decade-old record with his 4-minute mile, then someone beat his record 6 weeks later.


Even more impressive considering that the universe is expanding. So everyone today has to run farther in even less time.


Matter is not expanding with space; the forces between atoms are much stronger, so planets stay as-is over time. Instead, think of the space between galaxies expanding.


I think he was joking.


How can both of those be true?


Gravitational binding at a local level is stronger than the metric expansion of space. In the vast voids between galaxies it has a pronounced effect.


If you inflate a balloon with a water droplet on it, it's possible that the balloon becomes 20x its initial size, but the water droplet stays "compact" (and does not increase) due to its surface tension.


You can imagine that the space between matter particles is expanding, but since the binding forces between them are much stronger, they come back to the same relative positions. Only at scales where gravitational/electromagnetic forces have little to no influence will you be able to see things drifting apart.


I can't even think of the space between galaxies as they are now.


I don't know that the stretch affects us that way, or in that short of time, in any appreciable manner.


Besides the reduction from 96 to 95 multiplications, it would be interesting to know how they derived these results.

I think it's more interesting and has more impact to know the methodology they followed than just to know the algorithm.


Aside: what's up with the name of the algorithm? -- does it stand for something?


It's the initials of the authors of the Nature paper


Ah gotcha, thanks!


Pretty terrible name for an algorithm to be honest. Besides being unpronounceable, if the purpose is to acknowledge the discoverers of the algorithm it doesn’t even do a good job of that (who recognizes initials anyway?)


How would one go about finding new ways to multiply matrices?

Trial and error? Can the process be automated? Genius gets hit on the head by an apple moment?


For small n, you can reasonably brute-force all (n x n) multiplication formulae in characteristic 2 because there's a naive bound on the number of element multiplications and the coefficients are binary. As n gets large (say, n>3?), the cost of brute-force becomes prohibitive.

Beyond brute-force, I imagine that this problem could be phrased as a max-sat problem, mixed-integer linear program or similar. There are generic solvers that can get solutions for problems of moderate size. But unless it's a fairly rote translation to the solver's native representation, it's often better to write a custom solver and port heuristics into the notation native to the problem. As far as I understand it, that's the approach that Fawzi et al. and TFA took.
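For concreteness, this is roughly the constraint system such a solver would target: a scheme with r products is a choice of binary coefficient tensors (alpha, beta, gamma), and it multiplies matrices over Z_2 iff it satisfies the Brent equations. The sketch below (my own illustration, not code from either paper) writes those equations down for n=2 and checks that Strassen's classic 7-product scheme satisfies them; a SAT/MIP encoding would keep the same equations but leave the coefficients as unknowns.

    import itertools

    n, r = 2, 7
    # Strassen's 2x2 scheme with every minus sign read as plus (we are over Z_2).
    # alpha[k][i][j]: does A[i][j] feed product k; beta[k][l][m] likewise for B;
    # gamma[k][p][q]: does product k contribute to C[p][q].
    alpha = [[[1,0],[0,1]], [[0,0],[1,1]], [[1,0],[0,0]], [[0,0],[0,1]],
             [[1,1],[0,0]], [[1,0],[1,0]], [[0,1],[0,1]]]
    beta  = [[[1,0],[0,1]], [[1,0],[0,0]], [[0,1],[0,1]], [[1,0],[1,0]],
             [[0,0],[0,1]], [[1,1],[0,0]], [[0,0],[1,1]]]
    gamma = [[[1,0],[0,1]], [[0,0],[1,1]], [[0,1],[0,1]], [[1,0],[1,0]],
             [[1,1],[0,0]], [[0,0],[0,1]], [[1,0],[0,0]]]

    # Brent equations over Z_2: for every index combination,
    #   sum_k alpha[k][i][j] * beta[k][l][m] * gamma[k][p][q] == [j==l][i==p][m==q]  (mod 2)
    ok = all(
        sum(alpha[k][i][j] * beta[k][l][m] * gamma[k][p][q] for k in range(r)) % 2
        == int(j == l and i == p and m == q)
        for i, j, l, m, p, q in itertools.product(range(n), repeat=6)
    )
    print(f"{n**6} equations in {3*r*n*n} binary unknowns; Strassen satisfies them: {ok}")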


I was wondering: why use DeepMind when you can just use a SAT solver? I am not saying SAT solvers cannot be improved with AI; maybe they can.


Actually, this has been done: https://arxiv.org/abs/1903.11391


Note my uncertainty about the suitability of a max-sat formulation. But also note who did the Fawzi et al result: the DeepMind team at Google. I think that dogfooding their own stuff is an understandable default.


Virginia Williams teaches an excellent course:

https://people.csail.mit.edu/virgi/6.890/

Lecture notes are open.


Thank you for sharing this! Do you know of similar gems?



Where's the proof that those algorithms work for all 4x4 and 5x5 matrices?

An induction proof or anything else would suffice


The cryptic "5x52" in the title should be $\mathbb{Z}_2^{5\times5}$. Not sure how best to phrase that in plain text, but at least it shouldn't be "5×52".


Z_2^(5x5) might be a better description (or, better, 5x5 matrices of binaries)


Or "5x5 bit matrices".


Something like ℤ₂⁵×⁵?


The multiplication sign should be in superscript, but unfortunately there's no suitable Unicode character… unless we abuse IPA:

ℤ₂⁵ˣ⁵


Might be abuse of IPA, but 100% stealing this trick for platforms that don’t let me use LaTeX


How?!?


%u2124%u2082%u2075%u02E3%u2075

https://en.wikipedia.org/wiki/Spacing_Modifier_Letters

"Modifier Letter Small X"

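In Python terms (just the code points listed above):

    # U+2124 DOUBLE-STRUCK CAPITAL Z, U+2082 SUBSCRIPT TWO, U+2075 SUPERSCRIPT FIVE,
    # U+02E3 MODIFIER LETTER SMALL X (the IPA modifier letter standing in for a superscript x)
    print("\u2124\u2082\u2075\u02E3\u2075")  # -> ℤ₂⁵ˣ⁵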

Let's try to avoid this by cutting that bit.

Someone will soon point out why that doesn't work, but in the meantime we may gain a few minutes...



