The Kalman Filter, named after
Rudolf E. Kálmán  its principle
mastermind, was developed around 1960. In short, it is an optimal
algorithm used to estimate a state of a
linear system by
recursively
processing data. To illustrate its importance, our professor constantly reminded us that two things
got man to the moon: Fortran and the Kalman
Filter.
To help us understand the algorithm better, let's take a practical example for the rest of this
introduction. Let's imagine our body to be our system, our eyes to be our sensors and our legs to be
our actuators. We are going to walk towards some goal position  a cafe where we will meet some
friends. To make life easier  and this is applicable for most engineering purposes  we will assume
time to be discrete, just like your old 56K dialup used to make it look like. Let's call each
moment of time a timestep. The Kalman Filter is an algorithm that will approximate our body's location
(state) using three things:
 the memory of our state (where we were) in the last timestep
 the model of how our state changes when we give a control input (take a step)
 the model of how our state is represented by a measurement input (take a look)
Of course, we could just depend on our eyes, but consider a scenario where our eyesight is really,
really bad  it has
measurement noise. Similarily, our control input is not perfect, because
we ran a marathon yesterday, and our legs are wobbly. We say our leg motion has
process noise.
What makes the KF filter so amazing is its optimality  i.e., it is mathematically guaranteed to
give the best possible estimate, given these two conditions:
 noise in the data is white, with a
mean of
0, and is normally distributed,
and
 the dynamics of the system whose state we are trying to estimate is linear.
I won't delve into the details of these conditions, I will suffice to say that they are almost never
met, but there are plenty of real circumstances in which we can safely
assume these conditions
met.
Derivatives of the KF filter where these conditions are not met have been developed, and they have been
shown to be very
successful, although not optimal. Owing to their simplicity and speed, the
family of Kalman Filter algorithms is widely used in a great variety of engineering applications such
as missile tracking, navigation, feature tracking in computer vision, etc. Here, I will only talk about
the basic Kalman filter which was introduced in the seminal
paper
by Rudolf E. Kálmán, added on to by a second seminal
paper
by Rudolf E. Kálmán and Richard S. Bucy.
What makes the algorithm especially fast is its
Markov property, which is the scientific
way of saying everything we need to estimate the current state is stored in the current action and
previous state only. The property means that we do not need any information about the past further
than one step. It also means that we need to guess some initial state to start. Now, onto the algorithm
itself.
The algorithm can be split into two distinct parts:
 Predictor, where the algorithm uses the last state estimate and the current action to
create a prediction of the current state. Imagine closing your eyes, then taking one
single step with your eyes still shut. You can kind of infer where you are, but can't be very
confident about it. We say that this predicition has an a priori (preliminary)
covariance matrix.
 Corrector, where the algorithm will merge the current measurement input with the
prediction output to improve it further. Here, without moving a muscle, you opened your eyes
and used your sight to improve your prediction. We call this an estimate with an a
posteriori (final) covariance matrix.
OK, enough with the oversimplification. Now let's pull
the bandaid in one quick painless
motion:
We can define a linear system as:
\begin{equation}
{x_k = Ax_{k1} + Bu_{k} + w_{k}}
\end{equation}
\begin{equation}
{z_k = Hx_k + v_k}
\end{equation}
 \(x_k\): true state of the system at timestep \(k\)
 \(u_k\): control input intended to change \(x_{k1}\) to \(x_k\)
 \(w_k\): process noise (white, zeromean, normally distributed) with covariance matrix \(Q\)
 \(z_k\): measurement input at timestep \(k\)
 \(v_k\): measurement noise (white, zeromean, normally distributed) with covariance matrix \(R\)
 \(A\), \(B\), and \(H\): linear relationships of the system
This gives us the basic jargon we need to define our KF filter as:
\( \textbf{Input: } \) \( \hat{x}_{k1}, P_{k1}, u_k, z_k, A, B, H, R, Q \)
\( \textbf{Output: } \) \( \hat{x}_{k}, P_{k} \)
 \( \hat{x}^{}_{k} \longleftarrow A\hat{x}_{k1} + Bu_{k}\)
 \( {P}^{}_{k} \longleftarrow AP_{k1}A^{T} + Q \)
 \( K_k \longleftarrow {P}^{}_{k}H^{T}(H{P}^{}_{k}H^{T} + R)^{1} \)
 \( \hat{x}_k \longleftarrow \hat{x}^{}_{k} + K_k(z_k  H\hat{x}^{}_{k}) \)
 \( {P}_{k} \longleftarrow (IK_{k}H){P}^{}_{k} \)

\( \textit{Return }\)\( \hat{x}_k, {P}_{k} \)
 \(\hat{x}^{}_{k}\): prediction of \(x_k\) before incorporating measurements
 \({P}^{}_{k}\): a priori covariance matrix
 \(\hat{x}_k\): estimate of \(x_k\) after incorporating measurements
 \({P}_{k}\): a posteriori covariance matrix
 \(I\): identity matrix
Let me start by saying I will greatly oversimplify some mathematical concepts when explaning them now,
but to really be thorough, you must be familiar with the basics of linear algebra and probability. If
you are familiar with them, the concept will kind of snap right in.
In the beginning, I said that the KF filter will be optimal only for linear systems, and a linear
system is one where its dynamics change, only with multiplications (or divisions) or additions (or
subtractions). This is what equations 1 and 2 describe.
Equation 1 simply defines the state at the current timestep \(x_k\) as the result of:
 the transformation of the previous state \(x_{k1}\) with \(A\). Every step I take with my leg
pushes my entire body in a specific way.
 the transformation of the current input \(u_k\) with \(B\). Pulling or pushing the muscles in
my leg results in my leg taking a step.
 the randomness \(w_k\) with which \(u_k\) will be applied. I aim to take a step of 1 m, but
because of my muscles being inaccurate, I end up stepping 1.01 m.
Similarily, equation 2 defines the measurement at the current timestep \(z_k\) as the result of:
 the transformation of the current state \(x_{k}\) with \(H\). The view of my world from
my eyes always depends on my height.
 the randomness \(v_k\) with which \(z_k\) will be observed. I see the couch to be 3.2 m wide,
but because my brain is not a ruler, it's actually 2.8 m.
Now that we know how our system is defined, and how our measurement is defined, let's take a look at
how the filter actually works, line by line.
 We will make a prediction \(\hat{x}^{}_{k}\), which is an estimate of the
current state, given just the previous state and the current action. This is the Predictor
part.
 We will calculate the a priori probability \({P}^{}_{k}\) of this estimate being true.
The smaller \(Q\) is (which represents \(w_k\)) and the smaller \(P_{k1}\) is (which is the
probability of my previous final estimation), the smaller \({P}^{}_{k}\) will be.
 Using \({P}^{}_{k}\) we will determine the Kalman Gain \(K_k\), which is the most
important part of the algorithm. Intuitavely, the Kalman Gain tells us how much we should trust
the measurements we take over the predictions we make when we merge them together.
 We will now multiply the Kalman Gain we got with what is known as the innovation,
\((z_k  H\hat{x}^{}_{k})\), which is the difference between our prediction and our
measurement. Then we will add this to the prediction we got from line 1 \(\hat{x}^{}_{k}\)
to get our estimate. This is the Corrector part.
 Now using the Kalman gain \(K_k\) and the a priori probability \({P}^{}_{k}\), we
will calculate the a posteriori probability \({P}_{k}\), which is the final probability of our
final estimate \(\hat{x}_{k}\) being true.
 Now that are done with estimating the state at this current timestep, we will return the
final estimate \(\hat{x}_{k}\) and its final probability \({P}_{k}\), which we will need to
estimate the next state \(\hat{x}_{k+1}\).
That's it! If you've understood everything so far you are ready to get your hands dirty with one of the
two principal technologies that got man to the moon: the Kalman Filter!