The FBHHRBNRSSSHK-Algorithm for Multiplication is still not the end (arxiv.org)
168 points by zitterbewegung on Oct 12, 2022 | 87 comments


Abstract: "In response to a recent Nature article which announced an algorithm for multiplying 5×5-matrices over ℤ2 with only 96 multiplications, two fewer than the previous record, we present an algorithm that does the job with only 95 multiplications."

The "recent Nature article" is, of course, Deepmind's reinforcement learning-guided search for better matrix multiplication algorithms: https://news.ycombinator.com/item?id=33096580.

They improved upon Deepmind's algorithms by applying a method of their own, which is not described in this paper, but which they will publish in an upcoming paper: "Manuel Kauers and Jakob Moosbauer. Flip graphs for matrix multiplication. in preparation."


Deep-learning-hype-hater-quick-take: Based on that title (and a glance at the paper), my guess is that rather than brute force and megawatts of GPU power, they found a first-principles solution.


Their result is "We found a slightly better method after someone told us where to look."

It's very common in mathematics for simpler proofs and extensions of results to follow an initial ground-breaking paper. People looking at this preprint as a refutation of the ML techniques have it exactly backwards.


This seems to be a typical case of the following scenario:

You have been working on a topic for a while. Then, one day, you learn that someone has managed to get a paper on the topic into Nature/Science/whatever. And not because their results are that impressive but because they are from a famous institute and are using fashionable methods. Of course, the first thing you do is check whether you can improve the results with the tools you already have. If you can, you post the results online and start working on a paper.

That happens all the time.


On that Google Nature paper, I'm not sure whether the 5x5 matrix improvement is the more significant result, or the fact that they figured out a way to optimize directly to SIMD/AVX instructions, resulting in faster execution even if not necessarily fewer operations.


The result is not interesting for ordinary numbers, where the classical ("slower") method is faster. When each element of the 5x5 matrix is itself a matrix, it could be important.

So AVX ops for the method are not interesting.


No... using divide-and-conquer matrix multiplication with this gives n^2.83. Strassen's algorithm is n^2.81. This is good for 5 by 5 matrices... which are not terribly interesting.
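A quick sanity check on those exponents (plain Python, not from either paper): applying an n x n scheme with m multiplications recursively costs T(N) = m*T(N/n) + O(N^2), i.e. roughly N^(log_n m).

    import math

    # Exponent from applying an n x n scheme with m block multiplications recursively:
    # T(N) = m * T(N/n) + O(N^2)  =>  O(N^(log_n m))  whenever log_n m > 2.
    print(math.log(95, 5))  # this paper's 5x5 scheme over Z_2 -> ~2.829
    print(math.log(96, 5))  # AlphaTensor's 5x5 scheme         -> ~2.836
    print(math.log(7, 2))   # Strassen's 2x2 scheme            -> ~2.807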


It uses many more additions. When the elements are themselves matrices, additions are a lot cheaper than multiplications, so that is OK. But with ordinary numbers, additions cost about the same as multiplications, so trading off a few multiplications for a lot more additions loses.


I thought it was for multiplying two 5x5 matrices whose elements are plain numbers, rather than a 5x5 matrix whose elements are themselves matrices of arbitrary size, no?


The technique works for either


But usefully for only one.


A machine, or any algorithm, by definition cannot work from first principles. This term was coined by Richard Feynman, and ever since then, in all instances, it has been misused.

"No official code found."

There are known faster methods, as yet unproven, that will save gigawatts of power.


Hope they get a Nature article as well.


So it's a teaser paper?


I wouldn't exactly call it a teaser. The formulas they published stand on their own, and putting them out there in the wake of all the press around the AlphaTensor paper is just a good idea.

Of course, these formulas are not super interesting on their own, and so the forthcoming paper on HOW they developed these formulas is in some sense the "real" paper.


I think the point is to publish before anyone else does. If they wait until their full paper is ready, they might be scooped.


I'm not sure how to feel about this. If the purpose of science research is to build the understanding we have, teaser articles seem more like a way to put a name on a creation without actually contributing to our understanding of something.

I guess this is an artifact of how cutthroat academia is and how vital it is to have credit for being first.


> If the purpose of science research is to build the understanding we have.

It's not. It never was. It's always been "to build the understanding that I or this small group I am part of has, that gives us an edge for a while until we share it with the world because it looks like someone else might otherwise pretend to have discovered it".

The idea of a noble science, diligently working towards the betterment of mankind, has always been a romantic fantasy at best.


"The Double Helix" by James Watson is still a surprisingly accessible read that's very relevant here.


I think I'm alright with it in this case. Here, they've given the result and a full proof that a keen undergrad could follow. The future reference is about the method they used to find it. People can hit the ground running with their accelerated matrix multiplication today, should they want to. But if they want to apply that method to find 17x17 multiplication formulae, they'll have to wait.


> I guess this is an artifact of how cutthroat academia is and how vital it is to have credit for being first.

In fairness, academia has improved much in this regard since the days of Newton. That man was such an arse... even the Wikipedia article about his feud with Leibniz glosses over... everything.

Basically, not only was he a donkey about things, he used all sorts of underhanded tactics to win and reveled in devastating his rival's good standing.


But it is a long-standing practice in academia to avoid being scooped by putting out priority-proving work. Consider US patents; or, further back, I read here yesterday that Galileo published anagrams of his results and Kepler took great delight in reverse-engineering them, accidentally finding true facts other than Galileo's intended discovery.


So long as it doesn't slow or otherwise impede the dissemination of the full paper I don't mind personally.


If they were able to complete it in a week then it is a big deal… especially given worries that they would be scooped.


Three days! The DeepMind paper came out on the 5th of October; the first version of this on the 8th.


Are we calling dibs on research now? Is this a kindergarten?


Now?!

Sorry, but someone called dibs on the idea of “priority” long, long ago. The oldest dispute on this list is in the 1500s: https://en.wikipedia.org/wiki/List_of_scientific_priority_di...


Science/academia has been like this for as long as I can remember.


Read some of the history around natural philosophy some few hundred years ago. It’s a real hoot, how petty and ego-driven some of these people were. Leibniz’s attempts at taking credit for Newton’s hard work come to mind. (Tongue somewhat planted in cheek here.)

Research has always been about getting credit for first discovering something, to some extent.


But now that people know the solution exists, other people can find a justification for it.


I love how advances in modern approaches (deep learning) push for advances in classical approaches. Something similar happened in physics, where advances in quantum algorithms led researchers to find better classical algorithms.

Science is not a competition!


My admittedly less charitable take is that they may have been sitting on this one-operation optimization for a while, noticed it works on the new method, and only decided to go public now that they have seen the positive reception and potential of such collaborations.


More likely, I think it's that the first principles approach they have used to find this optimisation has applications in domains far outside of optimising matrix multiplications.

Only because the Google paper crossed their desk did they even think to try it on matrix multiplications.

And then more or less by luck, it happened to turn out to be slightly better. If it had produced exactly the machine learning algorithm, it would have been less interesting.

The speed of the response makes me feel that the authors are extremely good at their domain, and that finding this algorithm was a reasonably trivial problem for them. The slowness of the follow-up paper, I believe, is because the authors understand that their mathematical domain is wildly far outside of what most people can easily understand, so they're going to take their time to write it up well, instead of dropping a whole bunch of maths definitions that nobody else in the world understands.


The authors were already working on other small-matrix optimizations. They have a 2020 preprint about choosing good pivots for up to 8x8 matrices.

https://arxiv.org/abs/2006.01623


That seems fine to me. What's so sinister about that?


Well, there could be an argument that computation-saving info should be shared to reduce energy usage, but I also think it's fine to sit on it for a while.


"Science is not a competition!" - it kind of is though


I'm really interested to know when we will be able to prove that we have the fastest algorithm for some matrix multiplication problem. Is there a theoretical lower bound? (well in this case, for 5x5, there's gonna be an actual lower bound. but for asymptotic cases..?) Is figuring out the theoretical lower bound some kind of undecidable/NP problem? what do we know about it?


A lower bound of Ω(n^2 log n) has been known for quite some time. Good luck going any tighter than that... https://eccc.weizmann.ac.il/report/2002/012/download/ [pdf]


that's for real and complex fields, and it's not exactly that. I don't think there's anything better than Ω(n^2) with no conditions


A simple proof that you can't get better than Ω(n^2): the output is an n x n matrix with n^2 entries that can in general all be distinct, so you need at least Ω(n^2) time just to print the result.


Not necessarily: assuming the two nxn matrices are functionally defined as all 0's, printing them takes O(n)


The problem is defined to be over all (real-valued?) nxn matrices. It's fine if your method uses a representation other than the typical row-major or column-major form, but its performance is going to be measured over the worst case.


No, it doesn’t. You still need to produce n² zeroes for the result. If you multiply two n × n matrices, the result isn’t a vector; it’s an n × n matrix.


Any reasonable description of the matrix will do. The string "An n by n matrix of all zeroes." can be a good output if correctly defined.


By that logic, every algorithm is O(1). You first limit it to a single input, then you “correctly define” the output for that input using some fixed length string, and you’re golden.


Thus the importance of a good definition of 'reasonable'. I can't remember the one from that one course 10 years ago.

However, yes, there are big differences based on the encoding. Nobody would accept using unary input encoding to artificially inflate the size of the input. It's also why the time complexity of prime-testing is sometimes confusing.


ah, excellent point


This is a little bit apples and oranges, however. An asymptotic bound on the complexity doesn't say a lot about the number of multiplications needed for a fixed value of n=5 say. A lot of those complexity proofs say "for large enough n".


Not so! The 5x5 multiplication can be applied recursively to NxN matrices -- at the top level the "5x5 multiplication" carves the matrix into a 5x5 grid of "blocks" each of size approximately (N/5 x N/5). It's been a while since I've done this sort of complexity analysis, but I found some nice slides here explaining how to use the so-called Master Theorem for recursive algorithms: https://www.inf.ed.ac.uk/teaching/courses/ads/Lects/lecture3... [pdf]
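Here's a tiny sketch of that recursion (my own illustration with a hypothetical mult_count helper, counting only scalar multiplications for N a power of 5; the O(N^2) additions per level don't change the exponent):

    import math

    def mult_count(N, base=5, products=95):
        """Scalar multiplications when an N x N product is split into a
        base x base grid of blocks and each level uses `products` block products."""
        if N <= base:
            return products  # one application of the 5x5 scheme
        return products * mult_count(N // base, base, products)

    for k in range(1, 5):
        N = 5 ** k
        # the count matches N^(log_5 95) exactly on powers of 5
        print(N, mult_count(N), round(N ** math.log(95, 5)))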


Oh yeah, great point. Like plugging numbers into the master theorem, if 5x5 could be done in 25 multiplications, say, it would imply multiplication in O(n^2), and even 45 multiplications would give a new asymptotically fast multiplication. I would have to dig out a textbook to figure out what an Ω(n^2 log n) lower bound would imply for n=5.

Edit: of course Wikipedia links to an explicit lower bound which is kind of useful for small cases. https://www.sciencedirect.com/science/article/pii/S0885064X0... has it as 2*n*m+2*n-m-2, which is 53 for m=n=5; not super tight, but tight enough to say that no 5x5 formula is ever going to lead to an asymptotically faster MM algorithm.
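Plugging the numbers in (a one-off check of the arithmetic above, not from either paper):

    import math

    def lower_bound(n, m):
        # the bound quoted above: 2*n*m + 2*n - m - 2
        return 2 * n * m + 2 * n - m - 2

    print(lower_bound(5, 5))      # 53: no 5x5 scheme can use fewer products than this
    for r in (45, 53, 95):        # exponent a recursive 5x5 scheme with r products gives
        print(r, math.log(r, 5))  # 45 -> ~2.365 (below today's ~2.37), 53 -> ~2.47, 95 -> ~2.83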


I think there is some educated belief that it can be done in n^(2+epsilon) but not in n^2. No proof yet, of course.



Obviously a lower bound is Ω(N*M), but it's not very useful. But I'm not aware of any tighter lower bound.


This is a bit like the way that Bannister broke a decade-old record with his 4-minute mile, then someone beat his record 6 weeks later.


Even more impressive considering that the universe is expanding. So everyone today has to run farther in even less time.


Matter is not expanding with space; the forces between atoms are much stronger, so planets stay as-is over time. Instead, think of the space between galaxies expanding.


I think he was joking.


How can both of those be true?


Gravitational binding at a local level is stronger than the metric expansion of space. In the vast voids between galaxies it has a pronounced effect.


If you inflate a balloon with a water droplet on it, it's possible that the balloon becomes 20x its initial size, but the water droplet stays "compact" (and does not increase) due to its surface tension.


You can imagine that the space between matter particles is expanding, but since the binding forces between them are much stronger, they come back to the same relative positions. Only at scales where gravitational/electromagnetic forces have little to no influence will you be able to see things drifting apart.


I can't even think of the space between galaxies as they are now.


I don't know that the stretch affects us that way, or in that short of time, in any appreciable manner.


Besides the reduction from 96 to 95 multiplications, it would be interesting to know how they derived these results.

I think it's more interesting and has more impact to know the methodology they followed than just to know the algorithm.


Aside: what's up with the name of the algorithm? -- does it stand for something?


It's the initials of the authors of the Nature paper


Ah gotcha, thanks!


Pretty terrible name for an algorithm to be honest. Besides being unpronounceable, if the purpose is to acknowledge the discoverers of the algorithm it doesn’t even do a good job of that (who recognizes initials anyway?)


How would one go about finding new ways to multiply matrices?

Trial and error? Can the process be automated? Genius gets hit on the head by an apple moment?


For small n, you can reasonably brute-force all (n x n) multiplication formulae in characteristic 2 because there's a naive bound on the number of element multiplications and the coefficients are binary. As n gets large (say, n>3?), the cost of brute-force becomes prohibitive.

Beyond brute-force, I imagine that this problem could be phrased as a max-sat problem, mixed-integer linear program or similar. There are generic solvers that can get solutions for problems of moderate size. But unless it's a fairly rote translation to the solver's native representation, it's often better to write a custom solver and port heuristics into the notation native to the problem. As far as I understand it, that's the approach that Fawzi et al. and TFA took.
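For concreteness, this is roughly the constraint system such a solver would target: a scheme with r products is a choice of binary coefficient tensors (alpha, beta, gamma), and it multiplies matrices over Z_2 iff it satisfies the Brent equations. The sketch below (my own illustration, not code from either paper) writes those equations down for n=2 and checks that Strassen's classic 7-product scheme satisfies them; a SAT/MIP encoding would keep the same equations but leave the coefficients as unknowns.

    import itertools

    n, r = 2, 7
    # Strassen's 2x2 scheme with every minus sign read as plus (we are over Z_2).
    # alpha[k][i][j]: does A[i][j] feed product k; beta[k][l][m] likewise for B;
    # gamma[k][p][q]: does product k contribute to C[p][q].
    alpha = [[[1,0],[0,1]], [[0,0],[1,1]], [[1,0],[0,0]], [[0,0],[0,1]],
             [[1,1],[0,0]], [[1,0],[1,0]], [[0,1],[0,1]]]
    beta  = [[[1,0],[0,1]], [[1,0],[0,0]], [[0,1],[0,1]], [[1,0],[1,0]],
             [[0,0],[0,1]], [[1,1],[0,0]], [[0,0],[1,1]]]
    gamma = [[[1,0],[0,1]], [[0,0],[1,1]], [[0,1],[0,1]], [[1,0],[1,0]],
             [[1,1],[0,0]], [[0,0],[0,1]], [[1,0],[0,0]]]

    # Brent equations over Z_2: for every index combination,
    #   sum_k alpha[k][i][j] * beta[k][l][m] * gamma[k][p][q] == [j==l][i==p][m==q]  (mod 2)
    ok = all(
        sum(alpha[k][i][j] * beta[k][l][m] * gamma[k][p][q] for k in range(r)) % 2
        == int(j == l and i == p and m == q)
        for i, j, l, m, p, q in itertools.product(range(n), repeat=6)
    )
    print(f"{n**6} equations in {3*r*n*n} binary unknowns; Strassen satisfies them: {ok}")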


I was wondering: why use DeepMind when you can just use a SAT solver? I am not saying SAT solvers cannot be improved with AI; maybe they can.


Actually, this has been done: https://arxiv.org/abs/1903.11391


Note my uncertainty about the suitability of a max-sat formulation. But also note who did the Fawzi et al result: the DeepMind team at Google. I think that dogfooding their own stuff is an understandable default.


Virginia Williams teaches an excellent course:

https://people.csail.mit.edu/virgi/6.890/

Lecture notes are open.


Thank you for sharing this! Do you know of similar gems?



Where's the proof that those algorithms work for all 4x4 and 5x5 matrices?

An induction proof or anything else would suffice


The cryptic "5x52" in the title should be $\mathbb{Z}_2^{5\times5}$. Not sure how best to phrase that in plain text, but at least it shouldn't be "5×52".


Z_2^(5x5) might be a better description (or, better, 5x5 matrices of binaries)


Or "5x5 bit matrices".


Something like ℤ₂⁵×⁵?


The multiplication sign should be in superscript, but unfortunately there's no suitable Unicode character… unless we abuse IPA:

ℤ₂⁵ˣ⁵


Might be abuse of IPA, but 100% stealing this trick for platforms that don’t let me use LaTeX


How?!?


%u2124%u2082%u2075%u02E3%u2075

https://en.wikipedia.org/wiki/Spacing_Modifier_Letters

"Modifier Letter Small X"

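In Python terms (just the code points listed above):

    # U+2124 DOUBLE-STRUCK CAPITAL Z, U+2082 SUBSCRIPT TWO, U+2075 SUPERSCRIPT FIVE,
    # U+02E3 MODIFIER LETTER SMALL X (the IPA modifier letter standing in for a superscript x)
    print("\u2124\u2082\u2075\u02E3\u2075")  # -> ℤ₂⁵ˣ⁵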

Let's try to avoid this by cutting that bit.

Someone will soon point out why that doesn't work, but in the meantime we may gain a few minutes...



