Introduction
Welcome to the Numerical Algorithms in Rust book! This book is intended for students and programmers who have a solid background in calculus and linear algebra. It is also assumed that the reader has some experience with a programming language, preferably Rust. To learn more about Rust, please refer to the Rust Book.
Why are numerical algorithms important?
Numerical algorithms are used to solve mathematical problems that are difficult or impossible to solve analytically. They are used in many fields of science and engineering, such as Physics, Chemistry, Biology, Economics, and Computer Science. For example, numerical algorithms are used to:
- Simulate physical systems
- Solve difficult optimization problems
- Analyze large datasets
- Train machine learning models
- Solve differential equations
Throughout this book, we will give real world examples of how numerical algorithms are used in practice. Examples include:
- Simulating a pendulum
- Understanding heat transfer in a metal rod
- Understanding the growth of a population
- Finding the best price for a product given some constraints
- Analyzing an audio signal to remove noise
- Controlling a robot with a PID controller
- Solving the Schrödinger equation for a particle in a box
What will I learn from this book?
This book will teach you how to implement numerical algorithms in Rust. You will learn how to:
- Solve nonlinear equations
- Solve systems of linear equations
- Interpolate and approximate functions
- Differentiate and integrate functions
- Solve ordinary differential equations
- Solve partial differential equations
- Optimize functions
- Perform fast Fourier transforms
- Apply numerical algorithms to real world problems
Most importantly, this book will give you the tools to understand how numerical algorithms work, and how to implement them in a way that is fast, safe, and easy to maintain. You can then specialize and learn more advanced topics and algorithms in your field of interest.
You will also learn about fundamental limitations in how we represent numbers in a computer, and how to avoid common pitfalls when implementing numerical algorithms. Crucially, that includes understanding how to avoid floating-point and rounding errors, understanding the difference between stability and convergence, and understanding that computers are fundamentally discrete machines, so we must use discretization to approximate and solve continuous problems.
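As a first taste of those pitfalls, here is the classic rounding surprise. This is standard IEEE 754 double-precision behaviour, not anything specific to Rust:

```rust
fn main() {
    let sum = 0.1 + 0.2;
    println!("{}", sum == 0.3); // prints "false"
    println!("{:.17}", sum);    // prints "0.30000000000000004"
}
```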
Why specifically Rust?
There are many excellent books on numerical methods and algorithms, but most of them are written in C, C++, or Fortran. Rust is a modern programming language that is fast, safe, and expressive. Its rich type system lets us write code that is easy to read and maintain, and the language blends principles from the procedural, functional, and object-oriented paradigms, such as:
- Using algebraic data types (ADTs) to model data
- Making illegal states unrepresentable
- Using pattern matching to handle different cases
- Using generics to write reusable code
- Using traits to define behavior
- Using closures to write higher-order functions
- Using iterators to process collections
...and much more. Rust also has a great package manager, Cargo, that makes it easy to manage dependencies and build projects, plus a built-in testing framework for writing unit and integration tests. In addition, Rust has a welcoming and helpful community.
Finally, Rust is a great language for learning numerical algorithms because it is a low-level language that gives us full control over memory management, and allows us to write code that is as fast as C or C++ without sacrificing memory safety.
Introduction to Numerical Analysis
Numerical analysis is an area of applied mathematics that studies algorithms for solving mathematical problems that are difficult or impossible to solve analytically. It also provides answers to questions such as:
- How do we know that a numerical algorithm is correct?
- What is the error of a numerical algorithm?
- If I must have an error less than \(10^{-6}\), how many iterations do I need to perform?
Instead of solving a problem by deriving a formula, in numerical analysis we approximate the solution using more elementary mathematical operations such as addition, multiplication, and division, often in an iterative fashion.
For example, how do we find the square root of a number? In algebra, you might be satisfied with the exact, analytical answer \(\sqrt{2}\). But what good is it to a scientist or engineer to know that the square root of 2 is \(\sqrt{2}\)? We need its decimal representation, which, truncated to 11 decimal places, is \(1.41421356237\). Not only that, but we need a reproducible, step-by-step procedure to calculate the square root of 2 to a given precision, which is what numerical analysis is all about.
"But how do we actually calculate the square root of 2 with a decimal representation?", you ask. Well, we can use a numerical algorithm called the Babylonian method to approximate the square root. The Babylonian method is an iterative algorithm that uses the following formula to approximate the square root of a number \(a \in \mathbb{R}_+\):
\[ x_{n+1} = \frac{1}{2} \left( x_n + \frac{a}{x_n} \right) \]
where \(x_0\) is an initial guess, and \(x_n\) is the \(n\)th approximation of the square root of \(a\). We can use this formula to approximate the square root of \(a = 2\):
\[ \begin{align} x_0 &= 1 \\ x_1 &= \frac{1}{2} \left( x_0 + \frac{a}{x_0} \right) = \frac{1}{2} \left( 1 + \frac{2}{1} \right) = \frac{3}{2} = 1.5 \\ x_2 &= \frac{1}{2} \left( x_1 + \frac{a}{x_1} \right) = \frac{1}{2} \left( 3/2 + \frac{2}{3/2} \right) = \frac{17}{12} = 1.41\bar{6} \\ x_3 &= \frac{1}{2} \left( x_2 + \frac{a}{x_2} \right) = \frac{1}{2} \left( 17/12 + \frac{2}{17/12} \right) = \frac{577}{408} = 1.414\overline{2156862745098039} \\ \vdots \\ \sqrt{2} &= 1.4142135623730951\dots \end{align} \]
Note: we use the notation \(\overline{a_1 a_2 \dots a_n}\) to indicate that the sequence of digits \(a_1 a_2 \dots a_n\) repeats infinitely. For example, \(1.\overline{21} = 1.212121\dots\) means that the digits \(21\) repeat infinitely. This is called a repeating decimal, and can be applied to any number system, not just decimal numbers.
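Jumping ahead a little, here is how the iteration above might look as a minimal Rust sketch. The function name, initial guess, and stopping tolerance are our own choices for illustration:

```rust
/// Approximates sqrt(a) for a >= 0 using the Babylonian method.
/// `x0` is the initial guess and `tol` the desired absolute tolerance.
fn babylonian_sqrt(a: f64, x0: f64, tol: f64) -> f64 {
    let mut x = x0;
    loop {
        let next = 0.5 * (x + a / x);
        if (next - x).abs() < tol {
            return next;
        }
        x = next;
    }
}

fn main() {
    let approx = babylonian_sqrt(2.0, 1.0, 1e-12);
    println!("sqrt(2) ≈ {approx}"); // roughly 1.4142135623730951
}
```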
It is important to understand that numerical analysis has nothing to do with computers or programming. It is a branch of mathematics that studies the theoretical foundations of numerical algorithms. You could implement a numerical algorithm using pen and paper, for example, to solve a problem that is too difficult to solve analytically.
However, in practice, we use computers to implement numerical algorithms because they allow us to perform complex calculations quickly and accurately. A relatively modern computer can perform billions of operations per second, and can store billions of numbers in memory.
Let's give an example from the early 2010s: Intel's i7-3770K processor, released in 2012, has a base clock speed of 3.5 GHz and 4 cores, each capable of roughly 8 floating-point operations per cycle (see slide 26). As a very rough estimate, we can use this formula to calculate the number of floating-point operations per second (flops):
\[ \text{Operations per second} = \text{Clock speed} \times \text{Number of cores} \times \text{Operations per cycle} \]
This gives us a rough estimate of \(3.5 \times 10^9 \times 4 \times 8 = 112 \times 10^9\) operations per second, or 112 gigaflops. That means 112 billion operations every second! Again, this is a very rough estimate, and it overstates what the chip can sustain in practice. In reality, the achieved number of flops is much lower, but it is still a very large number, on the order of billions, which no human can match.
Computers are also useful for visualizing the results of a numerical algorithm, and for solving problems that are too large to solve by hand.
Ok, but is this book about numerical analysis or numerical algorithms?
As the title implies, this book is about numerical algorithms, not numerical analysis. The difference is subtle, but important. Numerical analysis is a branch of mathematics that studies the theoretical foundations of numerical algorithms. It is a very broad field that includes topics such as error propagation, stability, convergence, complexity, and discretization. It is a very theoretical field, and it is not the focus of this book. If you are interested in learning more about numerical analysis, or if you want to deeply understand why numerical algorithms work, then you should definitely read books about the topic¹.
To put it simply, this book will focus on the implementation of numerical algorithms in Rust. This includes a hands-on approach with code snippets and examples, and a focus on the practical aspects of implementing numerical algorithms. Again, this is a practical book, so we will not delve deeply into proofs and theorems, but we will discuss the theoretical aspects of these algorithms, though only to the extent that is necessary to understand how to implement them correctly.
What makes numerical algorithms different from other algorithms?
There are many different types of algorithms: sorting algorithms, graph algorithms, search algorithms, and so on. What makes numerical algorithms different from these, and what makes them worthy of study, is the fact that they fundamentally deal with approximations to continuous (and often infinite) mathematical objects, such as real numbers, functions, and derivatives.
Therefore, the difficulty in designing numerical algorithms is that we must deal with the fact that computers are discrete and finite machines, and that we must use discretization to approximate continuous objects. This is a fundamental limitation of computers, and it is important to understand how to deal with it.
¹ *Numerical Analysis* by Richard L. Burden and J. Douglas Faires, or *Numerical Analysis* by L. Ridgway Scott, are often recommended, but I have not read them myself.
Importance of Numerical Methods
Why are numerical algorithms important?
In a traditional calculus course, you learn how to derive formulas for integrals, ODEs, and PDEs. For example, you learn that the integral of \(\sin x\) is \(-\cos x + C\), and that the solution to the ODE \(\frac{dy}{dx} = y\) is \(y = Ce^x\). However, in practice, we often encounter problems that are too difficult (or impossible) to solve analytically, and therefore we must use numerical methods to approximate the solution. This is one of many scenarios where numerical methods are used in practice.
It is a common misconception to think that numerical methods are exclusively used to solve problems that are too difficult to solve analytically. In fact, numerical methods are used all the time in practice, even when we can solve the problem analytically. The reason for this is that numerical methods are often faster than analytical methods. For example, if we want to find the root of a polynomial, we can use the Newton-Raphson method, which is an iterative method that approximates the root of a function. This method is often faster than solving the polynomial analytically.
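To make this concrete, here is a minimal Newton-Raphson sketch; the polynomial \(f(x) = x^3 - 2x - 5\), its derivative, and the fixed iteration count are illustrative choices rather than anything prescribed:

```rust
/// Finds a root of f(x) = x^3 - 2x - 5 by Newton-Raphson iteration,
/// starting from the initial guess `x`.
fn newton_raphson(mut x: f64, iterations: usize) -> f64 {
    let f = |x: f64| x.powi(3) - 2.0 * x - 5.0;
    let df = |x: f64| 3.0 * x.powi(2) - 2.0; // derivative of f
    for _ in 0..iterations {
        x -= f(x) / df(x);
    }
    x
}

fn main() {
    // Starting from x = 2.0, the iteration converges to roughly 2.0945514815.
    println!("root ≈ {}", newton_raphson(2.0, 10));
}
```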
Also, sometimes we need the numerical representation of the solution, not the symbolic one. For example, when drawing the graph of a function, we need to evaluate the function at many points, up to the resolution of the screen, to draw the graph accurately. Again, it is important to remember that there is no such thing as a "continuous" function in a digital computer¹, because computers are discrete machines.
Another big mistake is thinking that numerical methods are inherently inaccurate. In fact, numerical methods can be arbitrarily accurate, in the sense that we can make the solution get as close to the true solution as we want, given enough time and memory. Of course, you wouldn't (and can't) use a 32-bit floating point number to represent the position of a particle in a simulation of the solar system, but that's not because numerical methods are inaccurate, it's because you are using the wrong data type to accurately represent the position of that particle.
In general, numerical methods are suitable for the following mathematical problems:
- Problems that are too difficult, impossible, or computationally expensive to solve analytically
- Problems that require speed, where we are willing to sacrifice some precision
- Applications that require a graphical or numerical (rather than symbolic) representation of the solution
- Situations where the input data is inherently imprecise, so an exact solution is neither needed nor attainable
Case studies
Pendulum
Let's say you want to simulate the motion of a pendulum. Using Newton's laws of motion and gravitation, we can derive the following differential equation:
\[ \frac{d^2 \theta}{dt^2} = -\frac{g}{\ell} \sin \theta \]
where \(\theta\) is the angle of the pendulum, \(g\) is the gravitational acceleration, and \(\ell\) is the length of the pendulum. This is a second-order, non-linear ordinary differential equation (ODE): it contains a second-order derivative of the unknown function \(\theta\), and the unknown appears inside a \(\sin\). It has no elementary closed-form solution (the exact solution involves elliptic integrals), so in practice we use numerical methods to simulate the motion of the pendulum.
It gets even more complicated if we add a friction term to the equation:
\[ \frac{d^2 \theta}{dt^2} = -\frac{g}{\ell} \sin \theta - \frac{\mu}{m} \frac{d \theta}{dt} \]
where \(\mu\) is the friction coefficient, and \(m\) is the mass of the pendulum.
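As a preview of the ODE chapters, here is a rough sketch that integrates the damped pendulum equation with the explicit Euler method. The step size, initial angle, and physical constants are arbitrary illustrative values, and we will meet much better integrators later in the book:

```rust
fn main() {
    // Illustrative parameters (not taken from any particular problem).
    let g = 9.81;   // gravitational acceleration (m/s^2)
    let l = 1.0;    // pendulum length (m)
    let mu = 0.2;   // friction coefficient
    let m = 1.0;    // pendulum mass (kg)
    let dt = 0.001; // time step (s)

    // State: angle theta (rad) and angular velocity omega = d(theta)/dt (rad/s).
    let mut theta: f64 = 0.5;
    let mut omega: f64 = 0.0;

    // Explicit Euler integration of the damped pendulum equation.
    for step in 0..=5_000 {
        if step % 1_000 == 0 {
            println!("t = {:.1} s, theta = {:+.4} rad", step as f64 * dt, theta);
        }
        let alpha = -(g / l) * theta.sin() - (mu / m) * omega;
        theta += dt * omega;
        omega += dt * alpha;
    }
}
```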
N-body problem
Let's say you want to precisely predict the trajectory of a probe going from the Earth to Jupiter. Evidently, Newton's laws of motion and gravitation are at play in this physical system, but unlike a two-body system, the solar system contains many bodies that interact with each other through gravity². This means that the equations of motion are very complex and have no general closed-form solution. Therefore, we must use numerical methods to simulate the system and predict the trajectory of the probe.
Computer graphics
The field of computer graphics is a great example of how numerical methods are used in practice. For example, when drawing a 3D scene, we need to project the 3D coordinates of the objects onto a 2D screen. This is done using a technique called perspective projection, which is a numerical algorithm that approximates the projection of a 3D point onto a 2D screen.
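For illustration, here is a minimal sketch of one simple form of perspective projection: divide the x and y coordinates by the depth z. The focal length and the sample point are made-up values, and real renderers do considerably more work (camera transforms, clipping, and so on):

```rust
/// Projects a 3D point onto a 2D image plane at distance `focal` from the
/// camera, assuming the camera sits at the origin looking down the +z axis.
fn project(point: [f64; 3], focal: f64) -> [f64; 2] {
    let [x, y, z] = point;
    [focal * x / z, focal * y / z]
}

fn main() {
    let p = [1.0, 2.0, 4.0];
    let [u, v] = project(p, 1.0);
    println!("projected to ({u}, {v})"); // (0.25, 0.5)
}
```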
Linear algebra
Linear algebra is another field where numerical methods are used extensively. For example, when solving a system of linear equations, we can use Gaussian elimination to find the solution. It is far faster and more numerically stable than, say, computing the matrix inverse or applying Cramer's rule.
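As a taste of the linear algebra chapters, here is a compact sketch of Gaussian elimination with partial pivoting. It is a toy version with no error handling for singular or ill-conditioned matrices:

```rust
/// Solves A x = b for a small dense system using Gaussian elimination
/// with partial pivoting. Sketch only: assumes A is square and non-singular.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Vec<f64> {
    let n = b.len();
    for k in 0..n {
        // Pivot: bring the row with the largest entry in column k to row k.
        let pivot = (k..n)
            .max_by(|&i, &j| a[i][k].abs().partial_cmp(&a[j][k].abs()).unwrap())
            .unwrap();
        a.swap(k, pivot);
        b.swap(k, pivot);
        // Eliminate column k from the rows below.
        for i in (k + 1)..n {
            let factor = a[i][k] / a[k][k];
            for j in k..n {
                a[i][j] -= factor * a[k][j];
            }
            b[i] -= factor * b[k];
        }
    }
    // Back substitution.
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let sum: f64 = ((i + 1)..n).map(|j| a[i][j] * x[j]).sum();
        x[i] = (b[i] - sum) / a[i][i];
    }
    x
}

fn main() {
    let a = vec![vec![2.0, 1.0], vec![1.0, 3.0]];
    let b = vec![3.0, 5.0];
    println!("{:?}", solve(a, b)); // approximately [0.8, 1.4]
}
```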
Conclusion
These are just a few of the many instances where numerical methods are used in practice; in fact, numerical methods appear all throughout science and engineering.
¹ You might say, "what about the `FCOS` instruction in the x86 ISA?" Well, that is not a continuous function: it implements a numerical algorithm that approximates the cosine of a number.
² A simpler version of the problem is the three-body problem, which consists of three bodies (e.g. the Sun, the Earth, and the Moon) interacting with each other through gravity. Even this problem has no general closed-form solution, and therefore we must use numerical methods to simulate the system.
How computers represent numbers
The binary number system
Computers represent numbers using the binary number system. This means that all numbers are represented using only two digits: 0 and 1. For example, the number 42 is represented as `101010` in binary.
Mathematically, we can represent a number \(x\) in base \(b\) using the following formula:
\[ x = \sum_{i=0}^n a_i b^i \]
where \(a_i\) is the \(i\)th digit of the number (counting from the right, starting at \(i = 0\)), and \(n\) is the index of the most significant digit, so the number has \(n + 1\) digits. For example, the number 42 in base 10 is represented as:
\[ 42 = 4 \times 10^1 + 2 \times 10^0 \]
In binary, the number 42 is represented as:
\[ 42 = 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 101010 \]
This is called the positional notation of a number, because the position of each digit determines its value. For example, the first digit from the right is multiplied by \(b^0\), the second digit is multiplied by \(b^1\), and so on.
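Here is a small sketch that evaluates the positional formula for the binary digits of 42; the digit order (least significant first) and the variable names are our own choices:

```rust
fn main() {
    // Binary digits of 42, least significant first: 101010 read right to left.
    let digits = [0u32, 1, 0, 1, 0, 1];
    let base: u32 = 2;

    // x = sum over i of a_i * b^i
    let x: u32 = digits
        .iter()
        .enumerate()
        .map(|(i, &a)| a * base.pow(i as u32))
        .sum();

    println!("{x}"); // prints 42
}
```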
This is probably not new to you, but once we introduce negative numbers to complete the integers \(\mathbb{Z}\), fractions to complete the rational numbers \(\mathbb{Q}\), and irrational numbers to complete the real numbers \(\mathbb{R}\), things get a little more complicated, and we need to make decisions with fundamental consequences and trade-offs.
For example, do we want arbitrary precision? If so, our numbers may need to grow indefinitely. If we want to represent fractions exactly, we can use a tuple of integers \((a, b)\), with \(b \neq 0\), to represent the fraction \(\frac{a}{b}\). We will soon see, though, that this has problems of its own: with a fixed number of bits we cannot represent every fraction exactly, and if we let the integers grow, repeated calculations can make them grow without bound.
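To illustrate the idea (not as a recommendation), a bare-bones exact fraction type might look like the sketch below. Notice how the denominator grows after only a few multiplications:

```rust
/// A toy exact fraction: num / den. Sketch only: no reduction to lowest
/// terms, no zero-denominator check, and no overflow handling.
#[derive(Debug, Clone, Copy)]
struct Fraction {
    num: i64,
    den: i64,
}

impl Fraction {
    fn mul(self, other: Fraction) -> Fraction {
        Fraction {
            num: self.num * other.num,
            den: self.den * other.den,
        }
    }
}

fn main() {
    let third = Fraction { num: 1, den: 3 };
    let mut x = third;
    for _ in 0..10 {
        x = x.mul(third); // the denominator grows as a power of 3
    }
    println!("{:?}", x); // Fraction { num: 1, den: 177147 }
}
```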
It is often the case that we want speed, and therefore we need to make trade-offs between speed and precision. For example, we can use a fixed number of bits to represent a number, but this means that we can only represent finitely many values, and that we will lose precision when performing calculations.
Limitations of computer memory
Unlike in mathematics, where we can represent numbers with arbitrary precision, computers have a limited amount of memory, and therefore can only represent a finite set of numbers. To put this mathematically, when we say that a type `T` has \(n\) bits, we mean that it can represent at most \(2^n\) different values. For example, the `u8` type in Rust has 8 bits, and therefore can represent \(2^8 = 256\) different values.
Not only that, but modern computer architectures have a memory hierarchy, which means that some types of memory are faster than others. For example, registers are the fastest type of memory, followed by cache, then RAM, and finally disk.
- Registers are the fastest type of memory because they live directly on the CPU. They are also the most expensive, so there are only a handful of them.
- Cache is the next fastest; there is more of it than registers, but it is still small and relatively expensive.
- RAM is slower again, but much larger and cheaper per byte.
- Finally, disk is the slowest, but there is a lot of it, and it is cheap.
This hierarchy can be represented by the following diagram:
registers > cache > RAM > disk
In general, processes running on a computer will use registers and cache to store data that is frequently accessed, and will use RAM to store data that is not frequently accessed. The disk is used to store data that is not currently being used, but may be used in the future.
As a concrete example, the x86-64 instruction set architecture has 16 general-purpose 64-bit registers (plus a separate set of SIMD registers that hold floating-point values). To operate on a single 64-bit number, the CPU loads it into a register and then performs operations on it.
In Rust, we can use primitive data types to represent numbers. For example, we can use the `i32` type to represent a 32-bit integer, or the `f64` type to represent a 64-bit floating point number.

```rust
#![allow(unused)]
fn main() {
    let x: i32 = 42;
    let y: f64 = 3.14;
}
```
These numbers can fit into a single register, and therefore can be processed very quickly. For example, when you perform an addition operation on two 32-bit integers, the CPU will load the two numbers into registers, perform the addition, and then store the result back into a register. Let's see this in action:
```rust
pub fn add(x: i32, y: i32) -> i32 {
    x + y
}
```
The generated x86-64 assembly for this function is:
```asm
add:
        push    rax
        add     edi, esi
        mov     dword ptr [rsp + 4], edi
        seto    al
        test    al, 1
        jne     .LBB1_2
        mov     eax, dword ptr [rsp + 4]
        pop     rcx
        ret
; *other parts are omitted*
```
There's a lot of noise there, but the important part is this line:

```asm
add edi, esi
```

This is the actual addition operation, performed on the two registers `edi` and `esi`. The result is then stored back into `edi`.
Then Rust performs some additional operations to check for overflow (these checks are present in debug builds; by default they are disabled in release builds), and then returns the result.
But what about bigger numbers? For example, what if we want to represent the number \(2^{64}\)? This number is too big to fit into a single 64-bit register, and therefore we need a data structure that can represent arbitrarily large numbers. This is called a big integer. Solutions like this are often called software implementations of numbers, because they are implemented in software rather than in hardware. For now, we will not discuss software implementations of numbers, but we will come back to them in a later chapter.
The main takeaway is this: we prefer using the primitive types because they directly map to hardware implementations of numbers, and therefore are very fast. However, they are limited in size, which very often will create problems like overflow, underflow, and loss of precision.
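As a small demonstration of the overflow problem, the following uses Rust's checked and wrapping arithmetic so the program does not panic:

```rust
fn main() {
    let x: u8 = 250;

    // checked_add returns None instead of overflowing.
    println!("{:?}", x.checked_add(5));  // Some(255)
    println!("{:?}", x.checked_add(10)); // None: 260 does not fit in a u8

    // wrapping_add wraps around modulo 2^8 instead.
    println!("{}", x.wrapping_add(10)); // 4
}
```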
Integers
Unsigned Integers
As briefly discussed in the previous section, unsigned integers are represented using the unsigned binary number system. In Rust, we can use the `u8`, `u16`, `u32`, `u64`, and `u128` types to represent unsigned integers with 8, 16, 32, 64, and 128 bits respectively.
Signed Integers
Signed integers are represented using the two's complement representation. In Rust, we can use the `i8`, `i16`, `i32`, `i64`, and `i128` types to represent signed integers with 8, 16, 32, 64, and 128 bits respectively.
To represent negative numbers, we use the most significant bit (MSB) as a sign bit. If the MSB is 0, then the number is positive. If the MSB is 1, then the number is negative. This can be understood by using the positional notation, and noting that in two's complement the MSB is the only negative digit. Thus, for \(n\) bits, we have:
\[ x = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i \]
The remaining bits are used to shift the number back up. For example, the number -42 is represented as `11010110` in binary using 8 bits. The MSB is 1, which means the number is negative (i.e. shifted down by \(2^7 = 128\)). The remaining bits are `1010110`, which is the binary representation of \(2 + 4 + 16 + 64 = 86\). Adding the two contributions gives \(-1 \times 128 + 86 = -42\). Or, in positional notation:
\[ -42 = -1 \times 2^7 + 1 \times 2^6 + 1 \times 2^4 + 1 \times 2^2 + 1 \times 2^1 = -128 + 64 + 16 + 4 + 2 \]
Let's see this in action (you can run the code by pressing the play button):

```rust
fn main() {
    let x: i8 = -42;
    println!("{:b}", x); // prints "11010110"
}
```
Rust also provides the `usize` and `isize` types, which are architecture-dependent: they have the size of a pointer. For example, on a 64-bit architecture, `usize` is equivalent to `u64`, and `isize` is equivalent to `i64`. They are often used to represent the size of a collection, or the index of an element in a collection:
```rust
#![allow(unused)]
fn main() {
    let word = "hello";
    let n = word.len(); // n is of type usize

    let values = [1, 2, 3, 4, 5];
    let first = values[0]; // first is of type i32
    let m = values.len(); // m is of type usize

    for i in 0..m {
        println!("{}", values[i]);
    }
}
```
As a simple comparison, the following table shows the range of values that can be represented by some of the integer types:
| Type | Minimum value | Maximum value |
|------|---------------|---------------|
| `u8` | 0 | 255 |
| `i8` | -128 | 127 |
| `u16` | 0 | 65535 |
| `i16` | -32768 | 32767 |
| `u32` | 0 | 4294967295 |
| `i32` | -2147483648 | 2147483647 |
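These limits are exposed as associated constants on each type, so you can verify the table directly (a quick check rather than an exhaustive list):

```rust
fn main() {
    println!("u8:  {} to {}", u8::MIN, u8::MAX);
    println!("i8:  {} to {}", i8::MIN, i8::MAX);
    println!("u16: {} to {}", u16::MIN, u16::MAX);
    println!("i16: {} to {}", i16::MIN, i16::MAX);
    println!("u32: {} to {}", u32::MIN, u32::MAX);
    println!("i32: {} to {}", i32::MIN, i32::MAX);
}
```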