
Fast Inverse Square Root — A Quake III Algorithm

Jun 08, 2021
In 2005, the game company id Software open-sourced the engine of its video game Quake 3 Arena. In that source code, fans of the game discovered an algorithm that was so ingenious that it quickly became famous, and the only thing this algorithm does is calculate the inverse of a square root.

If I were to write a piece of code that computes the inverse of a square root, this is how I would do it. I'm using the C programming language, the same programming language used for Quake 3. But to be fair, I wouldn't actually write the square root part myself: people who work more closely with the C language than you or me have long since figured out how to calculate a square root, and the algorithm is provided for us in the math.h file, which we programmers can then just include in our program.
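A minimal sketch of that straightforward version (the function name naive_rsqrt is mine, just for illustration; the square root itself comes from math.h):

    #include <math.h>

    /* The obvious way: one library square root and one division. */
    float naive_rsqrt(float x)
    {
        return 1.0f / sqrtf(x);
    }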

So what could be so interesting about the Quake 3 algorithm?
How does id Software calculate inverse square roots? At first glance, the code doesn't seem to make any sense. Where does this number 0x5f3759df come from? What does it have to do with extracting square roots, and why is there a disgusting curse word in the second comment of this code? In this video I'll show you how, with some interesting bit manipulation, you can extract square roots, and the algorithm that does this is called fast inverse square root.

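For reference, here is the function essentially as it appears in the released Quake III Arena source (reproduced from the GPL release of q_math.c; treat minor formatting details as approximate):

    float Q_rsqrt( float number )
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = * ( long * ) &y;                       // evil floating point bit level hacking
        i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
        y  = * ( float * ) &i;
        y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    //  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

        return y;
    }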
First of all, why would the game engine want to calculate 1 divided by the square root of x? If you want to implement physics, lighting or reflections in a game engine, it helps if the vectors you're calculating with are normalized to have a length of 1, because otherwise your vectors may be too short or too long, and when you do physics with them, things can go wrong. As you all know, the length of a vector is the square root of x squared plus y squared plus z squared; if you don't recognize it, I claim you have seen it for two dimensions, where it's just the Pythagorean theorem. So if we want to normalize the length of the vector to one, we have to scale everything down by the length of the vector.

I mean, obviously, if we divide the length of the vector by the length of the vector we get one, so all that's left to do is divide x, y and z by the length, or equivalently multiply them by one divided by the length. You may already see where this is going: calculating x squared plus y squared plus z squared is easy and, more importantly, very fast. In code I would implement it as x times x plus y times y plus z times z, and all it boils down to is three multiplications and two additions; additions and multiplications are everyday operations that have been engineered to be very fast.
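As a rough sketch of what the engine needs for every surface (the vec3 type and the function name are mine, not taken from the Quake 3 source):

    #include <math.h>

    typedef struct { float x, y, z; } vec3;

    /* Scale v down to length 1. The squared length costs three multiplications
       and two additions; the square root and the division are the slow part
       that the fast inverse square root will replace with an approximation.  */
    void normalize(vec3 *v)
    {
        float len_sq  = v->x * v->x + v->y * v->y + v->z * v->z;
        float inv_len = 1.0f / sqrtf(len_sq);   /* the expensive step */
        v->x *= inv_len;
        v->y *= inv_len;
        v->z *= inv_len;
    }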
Square root, on the other hand, is a painfully slow operation, and division isn't much better. This is not good if we have several thousand surfaces, each with a vector that needs to be normalized. But it also means that there is an opportunity for a speed improvement here: if we can find even an approximation of one divided by the square root of x, as long as it's fast, we can save precious time. Fast inverse square root is such an approximation, with an error of at most one percent while being roughly three times as fast. Looking at the code again, we can see that the beginning is quite harmless.
We are given a number called number as input, the number of which we are supposed to take the inverse square root. First, with the variable i, we declare a 32-bit integer, then we declare two 32-bit decimal numbers, x2 and y, and store 1.5 in the constant with the obvious name threehalfs. In the next two lines we simply copy half the input into x2 and the whole input into y, but it's after that where the magic happens. Take a moment to look at it again: the more you look at it, the less sense it makes, and the comments on the right aren't really helpful either, but they do hint that there are three steps to this algorithm. Putting these three steps together will show us the brilliance of this algorithm, but before we start with them, let's first take a look at binary numbers.
We said that in the first line we declare a 32-bit integer, in the C programming language called a long. That means we are given 32 bits and can represent a number with them, and I think everyone knows how that works: this is one, two, three, four and so on, up to about two billion. But in the next line we declare two decimal numbers, in C called floats. Again we are given 32 bits, and we have to represent a decimal number with them. How would you do that? If you and I were designing decimal numbers, this is probably how we would do it: just put a decimal point in the middle. In front of the decimal point we count in the usual way, 1, 2, 3, 4 and so on, and after the decimal point there are no surprises either; just remember this is binary, so instead of tenths, hundredths and thousandths we have halves, quarters, eighths, sixteenths, and any combination of them, such as a half plus a quarter giving you three quarters, also known as 0.75. But this idea is really terrible.
We have decimated the range of numbers we can represent: before, we could represent numbers up to around 2 billion, now only up to around 32,000. Luckily, people much smarter than us have found a better way to make use of those 32 bits. They were inspired by scientific notation: in the same way that we can systematically represent numbers like 23,000 as 2.3 times 10 to the power of 4, and 0.0034 as 3.4 times 10 to the power of minus 3, we can also do this in binary, where, for example, 1 1 0 0 0 can be written as 1.1 times 2 to the power of 4. The standard they devised goes by the name IEEE 754. The IEEE 754 standard defines the following: we are, as usual, given 32 bits. The first bit is the sign bit; if it is 0, the number is positive, and if it is 1, the number is negative. But the numbers that Quake 3 feeds into the fast inverse square root are always positive; I mean, they are obviously positive, because if I ever had to calculate 1 divided by the square root of minus 5, something definitely went wrong. So for the rest of this video we ignore the sign bit, since it is always 0.
The next 8 bits define the exponent, meaning 2 to the power of 1, 2 to the 2, 2 to the 3, 2 to the 4 and so on. With 8 bits we can represent numbers between 0 and 255, but that's not exactly what we need; we also want negative exponents. That's why everything is shifted down by 127: instead of 2 to the power of 4 we actually have 2 to the power of 4 minus 127, and if we really want the exponent to be 4, the bits must be set to 131, because 131 minus 127 is 4. The last 23 bits are the mantissa. As usual in scientific notation, we want to denote a digit, followed by the comma, followed by the decimal places, but with 23 bits we can only represent numbers from 0 up to, but not including, 2 to the 23.
Again, that's not exactly what we need: in scientific notation we need the mantissa to go from 1 to 10, or in binary scientific notation to go from one to two. So we could do something we have already done before: put a comma after the first bit. This automatically gives us numbers between one and two, but this naive approach is wasteful. You see, the people who designed the IEEE 754 standard realized that something happens in binary that doesn't happen in any other base. Look at the first digit: in scientific notation the first digit is by definition always non-zero, but in binary there is only one digit that is not zero, the one. And if we know that the first digit will always be a one, there is no need to store it, so we can save a bit by moving the comma one digit to the left and adding an implicit one to the front. The number that our mantissa now represents is between one and two: the 23 bits gave us numbers between 0 and 2 to the 23, we divide by 2 to the 23 to obtain numbers between 0 and 1, and then add the extra 1 to obtain numbers between 1 and 2. And this is already the main part of the IEEE 754 standard, at least for the so-called normalized numbers. The informed viewer knows that the standard also includes denormalized numbers, NaN, infinities and two zeros, but we won't go into that, because in Quake 3 they happen to never be inputs to our algorithm; otherwise something definitely went wrong, as at no point should our game engine normalize a vector of infinite length. For this algorithm, and for the rest of this video, it will be helpful to think of the mantissa and exponent as the binary numbers they are.
If we are given two numbers, the mantissa M and the exponent E, as 23-bit and 8-bit numbers respectively, we can get the bit representation as E times 2 to the power of 23 plus M; if you think about it, multiplying E by 2 to the 23 simply shifts E left by 23 digits. This is how the bits can be written down, but the real number behind the bits is given by the formula (1 + M / 2^23) times 2^(E - 127). This should look familiar: here we have the exponent, from which we subtract 127, and here we have the mantissa with the extra one in front.
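As a small illustration (my own example, not from the game), this snippet pulls the two fields out of a float and plugs them back into that formula:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float x = 6.5f;                    /* 1.101 times 2 to the 2 in binary */
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);    /* raw bit pattern, no conversion   */

        uint32_t E = (bits >> 23) & 0xFF;  /* 8-bit exponent field             */
        uint32_t M = bits & 0x7FFFFF;      /* 23-bit mantissa field            */

        /* bits == E * 2^23 + M (the sign bit is 0), and the value behind the
           bits is (1 + M / 2^23) * 2^(E - 127).                               */
        double value = ldexp(1.0 + M / 8388608.0, (int)E - 127);
        printf("E = %u, M = %u, value = %f\n", (unsigned)E, (unsigned)M, value);
        return 0;
    }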
But now, for no obvious reason, let's do something seemingly completely different: take the logarithm of that expression. Since we are doing computer science, we take the logarithm in base 2. We simplify as much as we can; we can pull out the exponent, but then we seem to get stuck on the log of one plus the mantissa term. But that's not quite the case: the creators of Quake, or rather the developer Gary Tarolli, knew a trick to get rid of the logarithm. You see, the trick is an approximation to log of 1 plus x for small values of x: log of 1 plus x is approximately equal to x itself. If you think about it, this approximation is actually exact for x equal to zero and x equal to one, but we'll add an extra correction term mu. This correction term can be freely chosen; again, with mu equal to zero,
the approximation is exact at zero and one, but it turns out that setting mu to a particular small number, roughly 0.043, gives the smallest error on average for numbers between zero and one. So, going back to our formula, we apply our trick, since M divided by 2 to the power of 23 is indeed a value between 0 and 1. We rearrange a little more and finally see why we did all those calculations: M plus E times 2 to the 23 appears, and that is exactly our bit representation. So let's think about what we just did: we applied the logarithm to our formula and got back the bit representation, merely scaled and shifted by some constants. In a sense, the bit representation of a number is its own logarithm.
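Written out (with I the integer holding the bits, E the exponent field, M the mantissa field, and mu the correction term), the chain of steps described above is roughly:

    \log_2(x) = \log_2\!\left( \left(1 + \tfrac{M}{2^{23}}\right) \cdot 2^{\,E-127} \right)
              = \log_2\!\left(1 + \tfrac{M}{2^{23}}\right) + E - 127
              \approx \tfrac{M}{2^{23}} + \mu + E - 127
              = \tfrac{1}{2^{23}}\left(M + E \cdot 2^{23}\right) + \mu - 127
              = \tfrac{I}{2^{23}} + \mu - 127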
Armed with this knowledge, we can finally begin the three steps of fast inverse square root. The first step isn't actually complicated, it just seems complicated because it's memory address trickery. We store our number in y, and now we want to do cool bit manipulation tricks. Unfortunately, floats don't come with the tools we need to manipulate bits; the reason you can't manipulate bits in floats is that they were never designed for it, floats are inherently tied to the IEEE 754 standard. Longs, on the other hand, were designed with bit manipulation in mind: for example, a bit shift to the left doubles the number and a bit shift to the right halves it, and yes, if your number is odd you end up rounding, but we are willing to accept such inaccuracies as long as it means our algorithm is fast. C, as almost all programming languages do, provides a way to convert from a float to a long.
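For example (a trivial snippet just to make those two shifts concrete):

    #include <stdio.h>

    int main(void)
    {
        long n = 7;
        /* prints 7 14 3: shifting left doubles, shifting right halves (rounding down) */
        printf("%ld %ld %ld\n", n, n << 1, n >> 1);
        return 0;
    }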
This conversion does what most programmers need it to do: it converts from a decimal number to an ordinary integer as best it can. If we give it a float value like 3.33, it converts it to an integer, in this case 3. But this is not the conversion we need here: firstly, we don't care about the resulting integer, we want to somehow keep our float, and secondly, the bits behind our number get completely rearranged. We don't want this conversion to mess with our bits; all we need to do is put the bits, one by one, into a long. The way to achieve this is to convert the memory address instead of the number. First we take the address of y; this is the address of a float. Then we convert that address from a float address to a long address; the address itself does not change, but C now thinks that the number living at that address is a long. So when we read what is written at that address, because C now believes this is the address of a long, it will read the number stored there as if it were a long. In other words, we trick C by lifting the conversion from the number itself to the address of that number, and this is how we place the bits of a float into the long i. I don't know what else to say; that is just how C works.
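A minimal sketch of the two conversions side by side (my own illustration; the memcpy line shows how modern C prefers to express the same reinterpretation, and the pointer cast assumes a 32-bit long, as on the platforms Quake 3 targeted):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float y = 3.33f;

        long as_value = (long) y;        /* ordinary conversion: gives 3, bits rearranged */
        long as_bits  = * (long *) &y;   /* Quake 3 style: the same bytes, read as a long */
        (void) as_bits;                  /* kept only to show the syntax                  */

        uint32_t copy;                   /* well-defined modern alternative               */
        memcpy(&copy, &y, sizeof copy);

        printf("value cast: %ld, raw bits: 0x%08x\n", as_value, (unsigned) copy);
        return 0;
    }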
So let's move on to the second step. The intuition behind it is this: remember that shifting a number to the left doubles it and shifting it to the right halves it. What if we did something like that to an exponent? Doubling an exponent squares the number, halving the exponent gives us the square root, and negating the exponent on top of that gives us 1 divided by the square root, which is exactly what we need. So let's remember what our goal is: we have our number stored in y, and we want to calculate 1 divided by the square root of y. As I already said, calculating this directly is too difficult and expensive, but we have extracted the bits of y into i, and we have seen with the IEEE 754 standard that the bits of a number are, in a sense, their own logarithm, which means that in i we have stored the log of y, up to some scaling and shifting. I claim that our problem becomes much easier if we work with logs: instead of trying so hard to calculate 1 divided by the square root of y, we calculate the log of 1 divided by the square root of y, rewrite it as the log of y raised to the power of minus one half, and pull the exponent out front to get minus one half times the log of y. Calculating this is stupidly easy. You might think: oh no, we have a division there, didn't you say at the beginning that divisions are slow?
Well, yes, but remember that we can now do bit shifts: instead of dividing by 2, we just shift one bit to the right. This already explains why we subtract i shifted one bit to the right, but why is this number 0x5f3759df in front of it? Again, because our logarithm is actually scaled and shifted, so let's do the calculation and understand where it comes from. Let gamma be our solution; then we know that the log of gamma is equal to the log of y to the power of minus one half, which is equal to minus one half times the log of y. Now we replace each logarithm with its bit representation, and then we simply solve for the bits of gamma.
I'll spare us the details, but this is the result: the magic number turns out to be what remains of the error term mu, the scale factor and the offset. Now that we have the bits of the solution, we can simply reverse the evil bit trick to recover the actual solution from those bits. Well, actually, it's not the exact solution, just an approximation, which is why we need the third step. After the previous step we have a pretty decent approximation, but we picked up a few error terms here and there; thanks to Newton's method, however, we can turn a decent approximation into a very good one.
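Sketching that calculation with the bits-as-logarithm idea (Gamma for the bits of the solution, I for the bits of y):

    \frac{\Gamma}{2^{23}} + \mu - 127 \;=\; -\frac{1}{2}\left(\frac{I}{2^{23}} + \mu - 127\right)
    \quad\Longrightarrow\quad
    \Gamma \;=\; \underbrace{\tfrac{3}{2}\, 2^{23} \,(127 - \mu)}_{\text{constant}} \;-\; \frac{I}{2}

The constant in front depends only on mu, the scale factor and the offset; with the mu the code's author settled on it comes out to the hexadecimal value 0x5f3759df, and minus I over 2 is exactly the subtracted right shift ( i >> 1 ) in the code.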
Newton's method is a technique that finds a root of a given function, meaning it finds an x for which f of x equals zero. It does this by taking an approximation and returning a better approximation, and normally you repeat this process until you are close enough to the real solution; but it turns out that here we are already close enough that a single iteration suffices to get the error below one percent. The only things Newton's method needs are the function and its derivative. What it does is take a value of x and estimate how far it is from being a root: it computes f of x and the derivative, and since the derivative is the ratio between a change in y and a change in x, dividing f of x by the derivative gives the offset in x, which we subtract to land closer to the root. The informed viewer can now verify that the last line of the code is one such Newton iteration, applied to the function f of y equals 1 divided by y squared minus x.
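Writing that iteration out for this particular f (a quick check of the claim):

    f(y) = \frac{1}{y^2} - x, \qquad f'(y) = -\frac{2}{y^3}

    y_{\text{new}} \;=\; y - \frac{f(y)}{f'(y)}
                   \;=\; y + \frac{y^3}{2}\left(\frac{1}{y^2} - x\right)
                   \;=\; y\left(\frac{3}{2} - \frac{x}{2}\, y^2\right)

which, with x2 holding x over 2, is exactly the last line of the code, y = y * ( threehalfs - ( x2 * y * y ) ): the divisions hiding in f and in Newton's step cancel out, leaving only multiplications and subtractions.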
Note that y being a root of this function is equivalent to y being the inverse square root of x. I really encourage you to work through this last line of code yourself, as it is quite remarkable that even though both the function and Newton's method contain a division, the code does not, which means that our algorithm is and remains fast. So now we finally understand the fast inverse square root; all it took was knowledge of the IEEE 754 standard, a trick to get around the C programming language, some magic bit operations, and the calculus behind Newton's method.
