# A crisis of confidence (intervals)

Lately I’ve taken to exploring some of the aggregated event statistics that we have on file at Flight Data Services, and as a side project I was using my knowledge of statistics to develop a fairly straightforward anomaly detection algorithm. My first approach was to compute the daily average of a particular key point value (in this case, Acceleration Normal at Touchdown), and then compute the mean of those daily averages along with a confidence interval at an arbitrary level (say, 99.5%). Data that fell outside of this interval would then be marked as an anomaly and would warrant further investigation. This is a pretty rudimentary statistical approach to anomaly detection, but one that’s commonly applied in this context. The only problem is that it’s wrong.
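
For reference, the naive approach described above might look something like the sketch below. It is only an illustration: the function names, the `(day, value)` record shape and the choice of a normal-theory band (mean ± z × standard deviation of the daily averages) are my assumptions, not code we actually run, and it reproduces the approach that this post argues against.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def daily_means(records):
    """Average a key point value per day.

    `records` is an iterable of (day, value) pairs; this shape is
    hypothetical and chosen only for the example.
    """
    by_day = defaultdict(list)
    for day, value in records:
        by_day[day].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def flag_outside_interval(daily, level=0.995):
    """Return the days whose average falls outside a two-sided band.

    Assumption: the band is mean +/- z * sample standard deviation of the
    daily averages, with z taken from the standard normal distribution.
    """
    values = list(daily.values())
    centre = mean(values)
    spread = stdev(values)  # needs at least two days of data
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = centre - z * spread, centre + z * spread
    return [day for day, value in daily.items() if not lower <= value <= upper]
```

Calling `flag_outside_interval(daily_means(records))` returns the list of days that would be marked for further investigation under this scheme.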

# Safety Events and Shopping Baskets

Ever wondered what the link was between flight safety events and shopping baskets? No, me neither. But nonetheless, there is one. Let’s talk about it.

One of our main jobs at FDS (as a flight safety company) is to generate events and alerts when the aircraft that we monitor exceed certain tolerances. For example, a customer might want us to monitor if their pilots regularly fly outside of their allocated altitude range (i.e. a level bust) — which is largely a safety issue, as this is a risk factor for mid-air collisions. Similarly, an airline might want to know if an engine runs too hot for too long (as this will increase maintenance costs and potentially cause safety issues).
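
To make the "too hot for too long" idea concrete, here is a minimal sketch of a sustained-exceedance check. Everything in it is hypothetical: the function name, the `(time_s, value)` sample shape, and the limit and duration used in the usage note are made up for illustration and are not how our events are actually defined.

```python
def sustained_exceedances(samples, limit, min_duration_s):
    """Find periods where a value stays above `limit` for long enough.

    `samples` is an iterable of (time_s, value) pairs in ascending time
    order (a hypothetical shape for this example). Returns a list of
    (start_s, end_s) pairs, one per run of consecutive samples above the
    limit that spans at least `min_duration_s` seconds.
    """
    events = []
    start = None  # time of the first sample in the current run
    last = None   # time of the most recent sample in the current run
    for time_s, value in samples:
        if value > limit:
            if start is None:
                start = time_s
            last = time_s
        else:
            if start is not None and last - start >= min_duration_s:
                events.append((start, last))
            start = None
    # Close a run that is still open when the data ends.
    if start is not None and last - start >= min_duration_s:
        events.append((start, last))
    return events
```

For example, `sustained_exceedances(samples, limit=950, min_duration_s=2)` would only report runs that stay above the (invented) limit of 950 for at least two seconds, ignoring brief spikes.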

# On Data Quality in Flight Data Analytics

One of the core difficulties with analysing flight data is that analytics is very far removed from the initial production and ETL of data. Before any flight data reaches a data scientist, it has to pass through various teams of specialists to be converted into sensible formats for archival. This often leads to a case of “too many cooks” — where too many processing stages make any analytics very fragile, because most of the “interesting” data simply can’t be relied upon to be accurate.

# Why do our aircraft take off slowly on the first of the month?

This was one of the more interesting discoveries of the last few weeks, and I thought that it would make a fun post for the flight data community. As a Data Scientist, one of my responsibilities is to ensure that the data we record and monitor is of high enough quality to base business decisions on; that is, I have to make sure that our numbers make sense. During an investigation into data quality problems, I was looking into a particular key point value¹ called Groundspeed with Gear on Ground Max, which is intended to record the maximum speed that an aircraft reaches on the ground when taking off or landing. Using this data, we can theoretically detect situations which may damage the landing gear and reduce fuel efficiency (and then alert the airlines accordingly).

1. We call these “KPVs” for short — essentially sensor values recorded at a

# Rendering Decision Trees in Jupyter Notebook

This is a short memo for myself, really - but it may be useful to someone else. I’ve found rendering decision trees in Jupyter Notebook using the functions that they provide in their documentation annoying. Basically, they use the graphviz library to render the file as an SVG, and then display it in the notebook. Unfortunately, SVG is a vector format and for some reason, Jupyter Notebook doesn’t scale these images nicely - meaning that I kept finding myself scrolling through huge decision tree figures just because they were scaled badly.

# A Guide to Aviation Acronyms for the Bamboozled

This post will act as a reference for some of the most commonly-encountered acronyms used in aviation and flight safety (at least, the ones that I happen across regularly as a data scientist). This is necessary because aviation-y types love their acronyms and it’s a bit perplexing for someone with no domain knowledge.