How AI Discovered a Faster Matrix Multiplication Algorithm

Apr 17, 2024
There is an enigmatic and powerful mathematical operation at work inside everything from computer graphics and neural networks to quantum physics. It's simple enough for high school students to understand, yet complex enough that even experienced mathematicians haven't fully mastered it. This operation is called matrix multiplication. Matrix multiplication is a very fundamental operation in mathematics that appears in many calculations in engineering and physics. A matrix is a two-dimensional array of numbers on which operations such as addition and multiplication can be performed. Researchers have long searched for more efficient ways to multiply matrices. So if you can do it a little faster, bigger problems come within reach.
For now, we would say such problems are too large to be computed in a reasonable time. Still, finding faster matrix multiplication methods is a big challenge. But thanks to a new tool, researchers have finally broken a record for matrix multiplication that stood for more than 50 years. What was their secret weapon? Linear algebra students are taught a method for multiplying matrices based on a centuries-old algorithm. It goes like this: multiply the elements of the first row of matrix A with the first column of matrix B and add them up to get the first element of matrix C. Then repeat for the first row of matrix A and the second column of matrix B, and add those up to get the second element of matrix C.
And so on. Multiplying two two-by-two matrices in this way requires eight multiplication steps. Multiplying any two N-by-N matrices with the standard algorithm requires N cubed multiplication steps, which is why applying this method to larger pairs of matrices quickly becomes unwieldy. If you take a matrix that is twice as large, the calculation takes eight times as long. As you can imagine, if you double the size a couple of times, the cost is multiplied by eight a couple of times, and pretty soon you reach the limits of what a computer can do. Enter Volker Strassen, a German mathematician known for his work analyzing algorithms.
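To make that scaling concrete, here is a minimal Python sketch of the standard row-times-column method (an illustration, not code from the video): it performs exactly n cubed scalar multiplications for n-by-n inputs, so eight for a pair of two-by-two matrices.

```python
# A minimal sketch of the standard (schoolbook) matrix multiplication method.
# For n-by-n inputs it performs exactly n**3 scalar multiplications,
# e.g. 8 for a pair of 2-by-2 matrices.

def standard_matmul(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):            # row of A
        for j in range(n):        # column of B
            for k in range(n):    # walk along the row/column and accumulate
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, mults = standard_matmul(A, B)
print(C)      # [[19, 22], [43, 50]]
print(mults)  # 8 multiplications for 2x2; n**3 in general
```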
In 1969, he discovered a new algorithm for multiplying two-by-two matrices that required only seven multiplication steps. Going from eight to seven multiplications may seem trivial, and the new addition steps look more complicated. But Strassen's algorithm offers spectacular computational savings for larger matrices. This is because when multiplying large matrices, they can be divided into smaller ones. For example, an eight-by-eight matrix can be reduced to a series of nested two-by-two matrices. Strassen's savings, applied to each of these smaller matrix multiplications, therefore propagate again and again. Applying Strassen's method to an eight-by-eight matrix results in one-third fewer multiplication steps compared to the standard algorithm.
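Below is a sketch of one common textbook form of Strassen's two-by-two scheme (the video's diagrams may write the linear combinations slightly differently, but any valid seven-multiplication scheme works the same way). Applied recursively to matrices split into two-by-two blocks, it lowers the multiplication count from n cubed to roughly n to the power 2.81.

```python
# A sketch of Strassen's 2x2 scheme: seven multiplications plus extra
# additions and subtractions. Applied recursively to 2x2 blocks, it yields
# roughly n**2.81 multiplications instead of n**3.

def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)   # the seven products
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```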
For very large matrices, these savings far exceed the computational cost of the extra additions. A year after Strassen published his algorithm, IBM researcher Shmuel Winograd showed that it is impossible to multiply two-by-two matrices with six or fewer multiplications, thereby proving that Strassen's algorithm, with its seven multiplications, is optimal. For half a century, the most efficient known method for multiplying two matrices of any reasonable size was to decompose them and apply Strassen's algorithm. That was until a new algorithm was revealed in October 2022 that surpassed Strassen's, specifically for multiplying two four-by-four matrices whose elements are only zero or one.
This new algorithm made it possible to multiply large matrices even faster by dividing them into four-by-four rather than two-by-two blocks. So who, or what, was behind this breakthrough? This new algorithm was discovered by Google's artificial intelligence research laboratory, DeepMind. For more than a decade, DeepMind has attracted attention for training artificial intelligence systems to master a host of games, from Atari Pong to chess. Then, in 2016, DeepMind's AlphaGo achieved what was considered impossible at the time: it defeated the top-ranked human Go player, Lee Sedol, in a best-of-five match. This victory shattered the limited notion of what computers can achieve.
Then, DeepMind set its sights on a problem even more challenging than Go. I was very surprised that even in very small cases, we don't know what the optimal way to do matrix multiplication is. And at some point, we realized that this is actually a very good fit for machine learning techniques. DeepMind started with a descendant of the AlphaGo algorithm called AlphaTensor. AlphaTensor is based on a reinforcement learning algorithm called AlphaZero. So what you need to do is really go beyond the AlphaZero reinforcement learning algorithm to address this huge search space and develop techniques to find these needles in a very, very big haystack.
AlphaTensor is not the first computer program to help with mathematical research. In 1976, two mathematicians used a computer to prove what is called the four-color theorem. The theorem says that you only need four colors to fill in any map so that no two neighboring regions share a color. The pair verified their proof by checking the 1,936 required cases, which took more than 1,000 hours of computing time. At the time, the mathematical community at large was unwilling to cede logical reasoning to a machine. Since then, however, the field has come a long way. AlphaTensor was trained with a technique called reinforcement learning, which is like playing a game.
Reinforcement learning strategically penalizes and rewards an AI system as it experiments with different ways to accomplish a given task, driving the program toward an optimal solution. But what kind of game should AlphaTensor play in pursuit of more efficient matrix multiplication algorithms? This is where the term tensor in AlphaTensor comes into play. A tensor is simply an array of numbers with any number of dimensions. Vectors are 1D tensors and matrices are 2D tensors. The process of multiplying any two matrices of a given size can be described by a single 3D tensor. For example, for the multiplication of two two-by-two matrices, we can construct the corresponding 3D tensor.
Each dimension of this cube represents one of the matrices, and each element of the cube is either a zero or a one. The product matrix C is created by combining elements of matrices A and B, like this, and so on, until you have the complete matrix multiplication tensor. Now, you can use a process called tensor decomposition to break this 3D tensor into building blocks, similar to taking apart a cube puzzle. A natural way to decompose tensors is into what are called rank 1 tensors, which are just products of vectors. The trick is that each rank 1 tensor here describes a multiplication step in a matrix multiplication algorithm.
For example, this rank 1 tensor represents the first multiplication step in the standard algorithm: A1 times B1. The next rank 1 tensor represents A2 times B3. The sum of these two rank 1 tensors produces the first element of the product, C1. Here are the next two rank 1 tensors, representing the multiplications A1 times B2 and A2 times B4, which form C2. Finally, the entire standard algorithm with its eight multiplication steps is represented by decomposing the 3D tensor into eight rank 1 tensors. All of these add back up to the original 3D tensor. But it is possible to decompose a 3D tensor in different ways.
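As a concrete sketch (using NumPy, with an index ordering chosen here purely for illustration): the 3D tensor for two-by-two matrix multiplication is a four-by-four-by-four cube, since A, B, and C each flatten to four entries, and the standard algorithm's eight rank 1 tensors really do sum back to it.

```python
import numpy as np

# 3D tensor for 2x2 matrix multiplication. Entries of A, B and C are flattened
# to indices 1..4 (a1 = A11, a2 = A12, a3 = A21, a4 = A22, likewise for B, C).
# T[a, b, c] = 1 exactly when the product A_a * B_b contributes to C_c.
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[2 * i + k, 2 * k + j, 2 * i + j] = 1

def rank1(u, v, w):
    """Outer product of three vectors: one rank 1 tensor."""
    return np.einsum('a,b,c->abc', u, v, w)

def e(i):
    """Basis vector picking out a single entry (1-indexed, like a1..a4)."""
    v = np.zeros(4, dtype=int)
    v[i - 1] = 1
    return v

# The standard algorithm as eight rank 1 tensors: the first is "a1 times b1
# goes into c1", the second "a2 times b3 goes into c1", and so on.
standard_terms = [
    rank1(e(1), e(1), e(1)), rank1(e(2), e(3), e(1)),   # c1
    rank1(e(1), e(2), e(2)), rank1(e(2), e(4), e(2)),   # c2
    rank1(e(3), e(1), e(3)), rank1(e(4), e(3), e(3)),   # c3
    rank1(e(3), e(2), e(4)), rank1(e(4), e(4), e(4)),   # c4
]
print(np.array_equal(sum(standard_terms), T))  # True: the 8 terms rebuild T
```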
Strassen's seven multiplication steps for the same two-by-two matrix multiplication look like this. These rank 1 tensors are more complex, but this is still a complete decomposition in fewer steps: it adds back up to the original tensor. So the fewer rank 1 tensors you use to completely decompose a 3D tensor, the fewer multiplication steps there are in the corresponding matrix multiplication algorithm. The key was that DeepMind built a single-player game for AlphaTensor to play and learn from. Finding an algorithm for this matrix multiplication that requires as few multiplication steps as possible is a vague request. But it becomes a clearly defined computational task once it is formulated as: decompose this 3D tensor using as few rank 1 tensors as possible.
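This correspondence between decompositions and algorithms is easy to check in code. In the sketch below (an illustration using Strassen's textbook factors, not necessarily the exact ones drawn in the video), each rank 1 tensor is a triple of vectors u, v, w: u and v say which combinations of A and B entries to multiply, and w says where that single product gets added into C.

```python
import numpy as np

# Strassen's scheme as 7 rank 1 tensors, each given by a triple (u_r, v_r, w_r).
# Each triple encodes one multiplication: m_r = (u_r . vec(A)) * (v_r . vec(B)),
# and that single product is scattered into the result as vec(C) += m_r * w_r.
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

def multiply_from_decomposition(A, B, U, V, W):
    """Run the bilinear algorithm defined by a rank-1 decomposition (U, V, W)."""
    a, b = A.reshape(4), B.reshape(4)
    c = np.zeros(4)
    for u, v, w in zip(U, V, W):      # one scalar multiplication per rank 1 term
        c += (u @ a) * (v @ b) * w
    return c.reshape(2, 2)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
print(multiply_from_decomposition(A, B, U, V, W))                      # [[19. 22.] [43. 50.]]
print(np.allclose(multiply_from_decomposition(A, B, U, V, W), A @ B))  # True
```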
It's really hard to describe what the search space looks like. In the particular case of matrix multiplication, it is quite convenient to formulate it in that space, because then we can apply our search techniques and our machine learning techniques to search in that very, very large, but formalizable, search space. AlphaTensor's gameplay was simple. It was programmed to guess rank 1 tensors and subtract them from the original 3D tensor until it was decomposed down to zero. The fewer rank 1 tensors it used, the greater the reward. Every rank 1 tensor that you remove from the 3D tensor has a cost; it has a cost of, let's say, one.
So we want to discover the way to achieve the objective with the fewest penalties. And that's what the system is trying to learn to do: learn to estimate, well, when I'm in this kind of situation, approximately how many penalties do I think I'm going to incur before I reach the goal? But tensor decomposition is not an easy game to master. Even for a three-by-three matrix multiplication with only zero or one elements, the number of possible tensor decompositions exceeds the number of atoms in the universe. Still, over the course of its training, AlphaTensor began to identify patterns for decomposing the tensor efficiently.
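A toy sketch of that game (a hypothetical environment written for illustration, not DeepMind's actual code) makes the setup concrete: the state is the residual tensor still to be explained, each move subtracts a guessed rank 1 tensor and costs one point, and the episode ends when the residual hits zero. Here the "player" simply replays the standard algorithm's eight moves; AlphaTensor's neural network learns to choose far better ones.

```python
import numpy as np

class TensorGame:
    """Toy single-player decomposition game (a hypothetical sketch, not
    DeepMind's code): keep subtracting rank 1 tensors from the matrix
    multiplication tensor until nothing is left. Fewer moves, fewer penalties."""

    def __init__(self, target):
        self.state = target.copy()    # residual tensor still to be explained
        self.penalty = 0

    def step(self, u, v, w):
        self.state -= np.einsum('a,b,c->abc', u, v, w)
        self.penalty += 1             # each rank 1 tensor costs one multiplication
        return not self.state.any()   # done once the residual is all zeros

# Target: the 4x4x4 tensor for 2x2 matrix multiplication, built as before.
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[2 * i + k, 2 * k + j, 2 * i + j] = 1

def e(i):
    """Basis vector picking out entry i (1-indexed, like a1..a4)."""
    v = np.zeros(4, dtype=int)
    v[i - 1] = 1
    return v

game = TensorGame(T)
# Replay the standard algorithm's eight moves as (A-entry, B-entry, C-entry) triples.
moves = [(1, 1, 1), (2, 3, 1), (1, 2, 2), (2, 4, 2),
         (3, 1, 3), (4, 3, 3), (3, 2, 4), (4, 4, 4)]
for a, b, c in moves:
    done = game.step(e(a), e(b), e(c))
print(done, game.penalty)  # True 8 -- a better player finishes the same game in 7
```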
Within minutes, it rediscovered Strassen's algorithm. Then the program went even further. It surpassed Strassen's algorithm for multiplying two four-by-four matrices modulo 2, where the elements are only zero or one, breaking the 50-year-old record. Instead of the standard algorithm's 64 multiplication steps or Strassen's 49, AlphaTensor's algorithm used only 47 multiplications. AlphaTensor also discovered thousands of other new fast algorithms, including ones for five-by-five matrices modulo 2. So will programs like AlphaTensor, crunching away in server rooms and extracting new mathematical discoveries from lines of code, make mathematicians obsolete? Ultimately, I think this won't replace mathematicians or anything like that. I think this provides a good tool that can help mathematicians find new results and guide their intuition.
That's exactly what happened just days after the AlphaTensor results were first published in the journal Nature. Two mathematicians from Austria, Manuel Kauers and Jakob Moosbauer, used AlphaTensor's 96-step five-by-five matrix multiplication algorithm as inspiration to go even further. And then Jakob suggested that we just take the algorithm that the AlphaTensor people found as a starting point and see if there is anything close to it that drops one more multiplication. And indeed, that was the case. So it was a very short calculation for our new method, which was able to reduce 96 to 95. And we later published this in an arXiv article.
The right way to look at this is as a fantastic example of collaboration between a particular type of technology and mathematicians. The true potential of human-AI collaboration is a frontier that is only now being fully explored. I don't think this kind of work can make people irrelevant. I think it allows them to do more.