The Kalman Filter, named after Rudolf E. Kálmán  its
principle mastermind, was developed around 1960. In short, it is an optimal algorithm used to estimate a state of a
linear system by recursively processing
data. To illustrate its importance, our professor constantly reminded us that two things got man
to the moon: Fortran and the Kalman Filter.
To help us understand the algorithm better, let's take a practical example for the rest of this
introduction. Let's imagine our body to be our system, our eyes to be our sensors and our legs
to be our actuators. We are going to walk towards some goal position  a cafe where we will meet
some friends. To make life easier  and this is applicable for most engineering purposes  we
will assume time to be discrete, just like your old 56K dialup used to make it look like. Let's
call each moment of time a timestep. The Kalman Filter is an algorithm that will approximate
our body's location (state) using three things:
 the memory of our state (where we were) in the last timestep
 the model of how our state changes when we give a control input (take a step)
 the model of how our state is represented by a measurement input (take a look)
Of course, we could just depend on our eyes, but consider a scenario where our eyesight is
really, really bad  it has
measurement noise. Similarily, our control input is not
perfect, because we ran a marathon yesterday, and our legs are wobbly. We say our leg motion has
process noise.
What makes the KF filter so amazing is its optimality  i.e., it is mathematically guaranteed to
give the best possible estimate, given these two conditions:
 noise in the data is white,
with a mean of 0, and is normally distributed,
and
 the dynamics of the system whose state we are trying to estimate is linear.
I won't delve into the details of these conditions, I will suffice to say that they are
almost never met, but there are plenty of real circumstances in which we can safely
assume these conditions met.
Derivatives of the KF filter where these conditions are not met have been developed, and they
have been shown to be very successful, although not optimal. Owing to their simplicity and
speed, the family of Kalman Filter algorithms is widely used in a great variety of engineering
applications such as missile tracking, navigation, feature tracking in computer vision, etc.
Here, I will only talk about the basic Kalman filter which was introduced in the seminal paper
by Rudolf E. Kálmán, added on to by a second seminal paper
by Rudolf E. Kálmán and Richard S. Bucy.
What makes the algorithm especially fast is its Markov property, which is the
scientific way of saying everything we need to estimate the current state is stored in the
current action and previous state only. The property means that we do not need any information
about the past further than one step. It also means that we need to guess some initial state to
start. Now, onto the algorithm itself.
The algorithm can be split into two distinct parts:
 Predictor, where the algorithm uses the last state estimate and the current
action to create a prediction of the current state. Imagine closing your eyes,
then taking one single step with your eyes still shut. You can kind of infer where you
are, but can't be very confident about it. We say that this predicition has an a
priori (preliminary) covariance matrix.
 Corrector, where the algorithm will merge the current measurement input with
the prediction output to improve it further. Here, without moving a muscle, you opened
your eyes and used your sight to improve your prediction. We call this an
estimate with an a posteriori (final) covariance matrix.
OK, enough with the oversimplification. Now let's pull the bandaid in one quick painless
motion:
We can define a linear system as: \begin{equation} {x_k = Ax_{k1} + Bu_{k} + w_{k}}
\end{equation} \begin{equation} {z_k = Hx_k + v_k} \end{equation}
 \(x_k\): true state of the system at timestep \(k\)
 \(u_k\): control input intended to change \(x_{k1}\) to \(x_k\)
 \(w_k\): process noise (white, zeromean, normally distributed) with covariance matrix
\(Q\)
 \(z_k\): measurement input at timestep \(k\)
 \(v_k\): measurement noise (white, zeromean, normally distributed) with covariance matrix
\(R\)
 \(A\), \(B\), and \(H\): linear relationships of the system
This gives us the basic jargon we need to define our KF filter as:
\( \textbf{Input: } \) \( \hat{x}_{k1}, P_{k1}, u_k, z_k,
A, B, H, R, Q \)
\( \textbf{Output: } \) \( \hat{x}_{k}, P_{k} \)
 \( \hat{x}^{}_{k} \longleftarrow A\hat{x}_{k1} + Bu_{k}\)
 \( {P}^{}_{k} \longleftarrow AP_{k1}A^{T} + Q \)
 \( K_k \longleftarrow {P}^{}_{k}H^{T}(H{P}^{}_{k}H^{T} + R)^{1} \)
 \( \hat{x}_k \longleftarrow \hat{x}^{}_{k} + K_k(z_k  H\hat{x}^{}_{k}) \)
 \( {P}_{k} \longleftarrow (IK_{k}H){P}^{}_{k} \)

\( \textit{Return }\)\( \hat{x}_k, {P}_{k} \)
where:
 \(\hat{x}^{}_{k}\): prediction of \(x_k\) before incorporating measurements
 \({P}^{}_{k}\): a priori covariance matrix
 \(\hat{x}_k\): estimate of \(x_k\) after incorporating measurements
 \({P}_{k}\): a posteriori covariance matrix
 \(I\): identity matrix
Let me start by saying I will greatly oversimplify some mathematical concepts when explaning
them now, but to really be thorough, you must be familiar with the basics of linear algebra and
probability. If you are familiar with them, the concept will kind of snap right in.
In the beginning, I said that the KF filter will be optimal only for linear systems, and a
linear system is one where its dynamics change, only with multiplications (or divisions) or
additions (or subtractions). This is what equations 1 and 2 describe.
Equation 1 simply defines the state at the current timestep \(x_k\) as the result of:
 the transformation of the previous state \(x_{k1}\) with \(A\). Every step I take with
my leg pushes my entire body in a specific way.
 the transformation of the current input \(u_k\) with \(B\). Pulling or pushing the
muscles in my leg results in my leg taking a step.
 the randomness \(w_k\) with which \(u_k\) will be applied. I aim to take a step of 1 m,
but because of my muscles being inaccurate, I end up stepping 1.01 m.
Similarily, equation 2 defines the measurement at the current timestep \(z_k\) as the result
of:
 the transformation of the current state \(x_{k}\) with \(H\). The view of my world from
my eyes always depends on my height.
 the randomness \(v_k\) with which \(z_k\) will be observed. I see the couch to be 3.2 m
wide, but because my brain is not a ruler, it's actually 2.8 m.
Now that we know how our system is defined, and how our measurement is defined, let's take a
look at how the filter actually works, line by line.
 We will make a prediction \(\hat{x}^{}_{k}\), which is an estimate of the
current state, given just the previous state and the current action. This is the
Predictor part.
 We will calculate the a priori probability \({P}^{}_{k}\) of this estimate being true.
The smaller \(Q\) is (which represents \(w_k\)) and the smaller \(P_{k1}\) is (which is
the probability of my previous final estimation), the smaller \({P}^{}_{k}\) will be.
 Using \({P}^{}_{k}\) we will determine the Kalman Gain \(K_k\), which is the most
important part of the algorithm. Intuitavely, the Kalman Gain tells us how much we
should trust the measurements we take over the predictions we make when we merge them
together.
 We will now multiply the Kalman Gain we got with what is known as the
innovation, \((z_k  H\hat{x}^{}_{k})\), which is the difference between our
prediction and our measurement. Then we will add this to the prediction we got from line
1 \(\hat{x}^{}_{k}\) to get our estimate. This is the Corrector part.
 Now using the Kalman gain \(K_k\) and the a priori probability \({P}^{}_{k}\), we will
calculate the a posteriori probability \({P}_{k}\), which is the final probability of
our final estimate \(\hat{x}_{k}\) being true.
 Now that are done with estimating the state at this current timestep, we will return
the final estimate \(\hat{x}_{k}\) and its final probability \({P}_{k}\), which we will
need to estimate the next state \(\hat{x}_{k+1}\).
That's it! If you've understood everything so far you are ready to get your hands dirty with one
of the two principal technologies that got man to the moon: the Kalman Filter!