A Semi-Gentle Introduction to the Kalman Filter

December 9, 2018

The Kalman Filter, named after Rudolf E. Kálmán - its principle mastermind, was developed around 1960. In short, it is an optimal algorithm used to estimate a state of a linear system by recursively processing data. To illustrate its importance, our professor constantly reminded us that two things got man to the moon: Fortran and the Kalman Filter.

To help us understand the algorithm better, let's take a practical example for the rest of this introduction. Let's imagine our body to be our system, our eyes to be our sensors and our legs to be our actuators. We are going to walk towards some goal position - a cafe where we will meet some friends. To make life easier - and this is applicable for most engineering purposes - we will assume time to be discrete, just like your old 56K dialup used to make it look like. Let's call each moment of time a time-step. The Kalman Filter is an algorithm that will approximate our body's location (state) using three things:

Of course, we could just depend on our eyes, but consider a scenario where our eyesight is really, really bad - it has measurement noise. Similarily, our control input is not perfect, because we ran a marathon yesterday, and our legs are wobbly. We say our leg motion has process noise.

What makes the KF filter so amazing is its optimality - i.e., it is mathematically guaranteed to give the best possible estimate, given these two conditions:

I won't delve into the details of these conditions, I will suffice to say that they are almost never met, but there are plenty of real circumstances in which we can safely assume these conditions met.

Derivatives of the KF filter where these conditions are not met have been developed, and they have been shown to be very successful, although not optimal. Owing to their simplicity and speed, the family of Kalman Filter algorithms is widely used in a great variety of engineering applications such as missile tracking, navigation, feature tracking in computer vision, etc. Here, I will only talk about the basic Kalman filter which was introduced in the seminal paper by Rudolf E. Kálmán, added on to by a second seminal paper by Rudolf E. Kálmán and Richard S. Bucy.

What makes the algorithm especially fast is its Markov property, which is the scientific way of saying everything we need to estimate the current state is stored in the current action and previous state only. The property means that we do not need any information about the past further than one step. It also means that we need to guess some initial state to start. Now, onto the algorithm itself.

The algorithm can be split into two distinct parts:

  1. Predictor, where the algorithm uses the last state estimate and the current action to create a prediction of the current state. Imagine closing your eyes, then taking one single step with your eyes still shut. You can kind of infer where you are, but can't be very confident about it. We say that this predicition has an a priori (preliminary) covariance matrix.
  2. Corrector, where the algorithm will merge the current measurement input with the prediction output to improve it further. Here, without moving a muscle, you opened your eyes and used your sight to improve your prediction. We call this an estimate with an a posteriori (final) covariance matrix.

OK, enough with the over-simplification. Now let's pull the band-aid in one quick painless motion:


We can define a linear system as: \begin{equation} {x_k = Ax_{k-1} + Bu_{k} + w_{k}} \end{equation} \begin{equation} {z_k = Hx_k + v_k} \end{equation}


This gives us the basic jargon we need to define our KF filter as:

    \( \textbf{Input: } \) \( \hat{x}_{k-1}, P_{k-1}, u_k, z_k, A, B, H, R, Q \)
    \( \textbf{Output: } \) \( \hat{x}_{k}, P_{k} \)
  1. \( \hat{x}^{-}_{k} \longleftarrow A\hat{x}_{k-1} + Bu_{k}\)
  2. \( {P}^{-}_{k} \longleftarrow AP_{k-1}A^{T} + Q \)
  3. \( K_k \longleftarrow {P}^{-}_{k}H^{T}(H{P}^{-}_{k}H^{T} + R)^{-1} \)
  4. \( \hat{x}_k \longleftarrow \hat{x}^{-}_{k} + K_k(z_k - H\hat{x}^{-}_{k}) \)
  5. \( {P}_{k} \longleftarrow (I-K_{k}H){P}^{-}_{k} \)
  6. \( \textit{Return }\)\( \hat{x}_k, {P}_{k} \)


Let me start by saying I will greatly oversimplify some mathematical concepts when explaning them now, but to really be thorough, you must be familiar with the basics of linear algebra and probability. If you are familiar with them, the concept will kind of snap right in.

In the beginning, I said that the KF filter will be optimal only for linear systems, and a linear system is one where its dynamics change, only with multiplications (or divisions) or additions (or subtractions). This is what equations 1 and 2 describe.

Equation 1 simply defines the state at the current time-step \(x_k\) as the result of:

  1. the transformation of the previous state \(x_{k-1}\) with \(A\). Every step I take with my leg pushes my entire body in a specific way.
  2. the transformation of the current input \(u_k\) with \(B\). Pulling or pushing the muscles in my leg results in my leg taking a step.
  3. the randomness \(w_k\) with which \(u_k\) will be applied. I aim to take a step of 1 m, but because of my muscles being inaccurate, I end up stepping 1.01 m.

Similarily, equation 2 defines the measurement at the current time-step \(z_k\) as the result of:

  1. the transformation of the current state \(x_{k}\) with \(H\). The view of my world from my eyes always depends on my height.
  2. the randomness \(v_k\) with which \(z_k\) will be observed. I see the couch to be 3.2 m wide, but because my brain is not a ruler, it's actually 2.8 m.


Now that we know how our system is defined, and how our measurement is defined, let's take a look at how the filter actually works, line by line.

  1. We will make a prediction \(\hat{x}^{-}_{k}\), which is an estimate of the current state, given just the previous state and the current action. This is the Predictor part.
  2. We will calculate the a priori probability \({P}^{-}_{k}\) of this estimate being true. The smaller \(Q\) is (which represents \(w_k\)) and the smaller \(P_{k-1}\) is (which is the probability of my previous final estimation), the smaller \({P}^{-}_{k}\) will be.
  3. Using \({P}^{-}_{k}\) we will determine the Kalman Gain \(K_k\), which is the most important part of the algorithm. Intuitavely, the Kalman Gain tells us how much we should trust the measurements we take over the predictions we make when we merge them together.
  4. We will now multiply the Kalman Gain we got with what is known as the innovation, \((z_k - H\hat{x}^{-}_{k})\), which is the difference between our prediction and our measurement. Then we will add this to the prediction we got from line 1 \(\hat{x}^{-}_{k}\) to get our estimate. This is the Corrector part.
  5. Now using the Kalman gain \(K_k\) and the a priori probability \({P}^{-}_{k}\), we will calculate the a posteriori probability \({P}_{k}\), which is the final probability of our final estimate \(\hat{x}_{k}\) being true.
  6. Now that are done with estimating the state at this current time-step, we will return the final estimate \(\hat{x}_{k}\) and its final probability \({P}_{k}\), which we will need to estimate the next state \(\hat{x}_{k+1}\).


That's it! If you've understood everything so far you are ready to get your hands dirty with one of the two principal technologies that got man to the moon: the Kalman Filter!