The Joy of Testing

Unit testing is great because it helps us make better software. A good test suite will catch bugs, maintain stability, measure performance, and more.

Yet to be honest, those aren’t the reasons I like testing so much. I like writing tests because it just feels great to run them. Sometimes I catch myself watching the Amatino API test suite running, not because I need to test changes, but because I simply want to watch it run.

At upper left, we have the HTTP returns for the public-facing API. At lower left, the internal API. At right, the list of successful tests marching downwards.

It is the closest I can be to watching the machine think. All the thousands of lines of code, the weeks of agonising over tough problems, the frustrations of insidious bugs, the joys of successful output. All distilled into a real-time ballet of virtual machines, database queries, http requests, and code execution that I can see with my own eyes.

Write tests. Not because the bloggerverse told you to, but because it just feels good.

How to install OpenSSL & stunnel on MacOS

When travelling behind the Great Firewall of China,  I wanted a copy of OpenSSL and stunnel on my machine. Googling “install stunnel macos” gives a bunch of answers that involve the word “brew”.

OpenSSL and stunnel are open-source C programs, which means we can compile them from source. Doing so is not difficult, but it is a bit fiddly, and I think that fiddliness can dissuade people and cause them to unnecessarily reach for a bloated package manager.

All the information in this blog post is “as of February 2019”, and applies to stunnel 5.50, OpenSSL 1.1.1, and macOS 10.14.

stunnel depends on OpenSSL, so we will compile and install OpenSSL first.

Compiling & Installing OpenSSL on MacOS

stunnel is going to look for ssl.h, and to make it available we need to compile with the shared flag.  We probably don’t want to spray OpenSSL all over the system, so we will use --prefix to specify an install location other than the default /usr/bin.

$ ./Configure darwin64-x86_64-cc shared --prefix=/Users/hugh/somedir --openssldir=/Users/hugh/somedir no-ssl3

Now you can make and make install. Many lines of output later, you will

Compiling & Installing stunnel on MacOS

OpenSSL at the ready, we can now move on to stunnel.   We will tell stunnel about our newly minted copy of OpenSSL using the ./configure command:

$ ./configure --with-ssl=/Users/hugh/somedir --prefix=/Users/hugh/somedir

If the shared flag was not specified when compiling OpenSSL, then this is where you will hit the much-googled:

checking for TLS directory... not found
configure: error: 
Could not find your TLS library installation dir
Use --with-ssl option to fix this problem

We did used shared, so we are good to go! Hit that make button, followed by make install, and you are the proud owner of a copy of stunnel compiled with OpenSSL!

Immutable objects in Python

Python is a beautifully flexible beast. Class properties are variable, which means we can change them from anything to anything at any time.

class Flexible:
    piece = "hello world"

instance = Flexible()
print(instance.piece)  # prints “hello world”
instance.piece = 42
print(instance.piece)  # prints “42”

Sometimes, we might want to trade some flexibility for safety. Humans are fallible, forgetful, and fickle beasts. Programmers are also humans. We make mistakes more often than we would like to admit.

Fortunately, Python gives us the tools protect ourselves against ourselves. Where we want to, we can trade flexibility for safety. You might wish to protect yourself by creating immutable objects : Instances of a class that can’t be modified once they are created.

In this article, we will seek immutability of properties. That is, we will stop ourselves from being able to change the.piece property of a Flexibleclass.

By making our class properties immutable, we eliminate the need to reason about object state. This reduces our cognitive load, and thus the potential for error.

Note that the immutability in this context is different to immutability  from the perspective of memory, an equally valuable but different angle on the broad topic of immutability in general.

Our objective is to achieve immutability from the perspective of the programmer – To explicitly catch cases were we accidentally attempt to mutate a property that we should not. From the perspective of the machine, the property is still perfectly mutable. We aren’t trying to change the way the property behaves in memory, we are trying to protect ourselves from our own stupidity.

To create an immutable property, we will utilise the inbuilt Python property class. property allows us to define get and set behaviour for a property.

class Flexible:
   piece = property(lambda s: "hello world"w)

instance = Flexible()
print(instance.piece)  # prints “hello world”
Instance.piece = mutated  # throws AttributeError

The property class takes four parameters. The two we will focus on here are fget and fset. In the above example, lambda s: “hello world” was our fget, allowing us to print(instance.piece). The absence of fsetcaused the AttributeError when we attempted to set the value of instance.piece to ’mutated’.

An AttributeError might be a solid enough reminder to yourself that you’ve accidentally done something dumb. However, you might be working on a project with multiple programmers. Perhaps an AttributeError is not a clear enough warning to others that a property should not change.

For example, a colleague might interpret that AttributeError as a sign that you simply forgot to implement fset. They might merrily edit your class, adding fset, unknowingly opening a Pandora’s Box of state-related bugs.

To give our colleagues as much information as possible, let’s make the immutability explicit. We can do so by subclassing property.

class Immutable(property):
   _MESSAGE = "Object state must not be mutated"

   def __init__(self, get) -> None:

   def self._set_error(self, _1, _2) -> None:
       raise RuntimeError(self._MESSAGE)

Now, when we attempt to change a property, we get a clear and unambiguous error.

class Flexible:
   piece = Immutable(lambda s: "Can't touch this")

instance = Flexible()
instance.piece = "try me"  # Raises RuntimeError with clear description

Of course, a lambda serving a constant is not going to satisfy many requirements. You can supply the fget parameter something more useful. For example, suppose a class maintains some internal state, readable by the whole program. It is crucial to the safe operation of the program that nothing outside the class modifies that state.

class Flexible:
   _internal_state = 42
   some_state = Immutable(lambda s: s._internal_state)

In this case, the rest of the program can safely access the value of _internal_state via the some_state property. We provide a strong hint to our colleagues that _internal_state is off limits by using the leading underscore: A convention for hinting that a variable be treated as “private”. The value returned by some_state can be changed internally by the class, but it is very hard for a programmer to accidentally modify the state externally.

Other languages might achieve this behaviour in other ways, especially through the use of the private keyword. For example, in Swift:

class Flexible {
   public private(set) var some_state = 42

Unlike Swift and others, Python will not explicitly stop someone from modifying the Flexible state. For example, a colleague could easily execute

instance._internal_state = "where is your god now?"

That flexibility is a great strength of Python. The point is not to stop anyone doing anything. The point is to provide helpful hints, checks, and clues to stop ourselves from making silly mistakes.

Result types in Swift. Are they useless?

Fresh-faced, amateur, and impressionable: Swift is not my main jam. When setting out to write Amatino Swift, I was hungry for best-practice and good patterns. Amatino Swift involves lots of asynchronous HTTP requests to the Amatino API.

Asynchronous programming requires bifurcation at the point of response receipt. That is, an asynchronous operation may yield a success state or a failure state. Result types are a pattern widely espoused as a pattern for handling such bifurcation.

I’d like to open a debate over whether result types should be used in Swift. After some thought, it appears to me that they are useless. I propose that we would be better off encouraging newbies to utilise existing Swift features, rather than learning about and building result type patterns.

For the purposes of this discussion, let’s assume that our code includes two functions, each of which handle a bifurcated state:

func handleSuccess(data: Data) {
  // Do stuff with data

func handleFailure(error: Error) {
  // Do stuff with error

Inside these functions, we might implement code which is independent of our bifurcation pattern. For example, we could `case-switch` on Error type in order to present a meaningful message to a user.

Now to the bifurcation itself. A naive and simple pattern might be:

// This is bad code. Do not copy it!
func asynchronousCallback(error: Error?, data: Data?) -> Void {
  if (error != nil) {

There are myriad problems with this approach. We have no type safety over `data`. We do not control for cases where programmer error yields a state where both `data` and `error` are `nil`. It’s ugly. More.

Result types propose to improve upon this pattern by defining a type such as:

enum Result<Value> {
  case success(Value)
  case failure(Error)

Which may be used like so:

func asynchronousCallback(result: Result<Data>) {
  switch result {
  case .success(let data):
  case .failure(let error):

This pattern introduces type safety to both `error` and `data`. I suggest that it does so at too great a cost when compared to using inbuilt Swift features. Every asynchronous bifurcation now requires a `switch-case` statement, and the use of a result type.

Compare the result type pattern with one that uses the Swift `guard` statement:

func asynchronousCallback(error: Error?, data: Data?) {
  guard let data = data else { handleError(error ?? TrainWreck()) }; return

In this case, we have type safety over `error` and `data`. We have handled a case in which a programmer failed to provide an `Error` using the nil-coalescing operator `??`. We have done it all in two lines of less than 80 char. A suitable error type might be defined elsewhere as:

struct TrainWreck: Error { let description = "No error provided" }

Bifurcation via a `guard` statement appears to me to have several advantages over result types:

  • Brevity. Functions handling asynchronous callbacks need only implement a single line pattern before proceeding with a type safe result.
  • Lower cognitive load. A developer utilising a library written with the `guard` pattern does not need to learn how the library’s result type behaves.
  • Clarity. A `guard` statement appears to me to be more readable than a `case-switch`. This is subjective, of course.

What do you think? I am not a Swift expert. Am I missing something obvious? Why would you choose to use a result type over a `guard` statement?

Cover image – A bee harvests pollen from lavender on Thornleigh Farm

Originally posted at The Practical Dev

Asynchronous Object Initialisation in Swift

Baby birds, rockets, freshly roasted coffee beans, and … immutable objects. What do all these things have in common? I love them.

An immutable object is one that cannot change after it is initialised. It has no variable properties. This means that when using it in a program, my pea brain does not have to reason about the state of the object. It either exists, fully ready to complete its assigned duties, or it does not.

Asynchronous programming presents a challenge to immutable objects. If the creation of an object requires network I/O, then we will have to unblock execution after we have decided to create the object.

As an example, let’s consider the Transaction class inside Amatino Swift. Amatino is a double entry accounting API, and Amatino Swift allows macOS & iOS developers to build finance capabilities into their applications.

To allow developers to build rich user-interfaces, it is critical that Transaction operations be smoothly asynchronous. We can’t block rendering the interface while the Amatino API responds! To lower the cognitive load yielded by Amatino Swift, Transaction should be immutable.

We’ll use a simplified version of Transaction that only contains two properties: transactionTime and description. Let’s build it out from a simple synchronous case, to a full fledged asynchronous case.

class Transaction {
  let description: String
  let transactionTime: Date 
  init(description: String, transactionTime: Date) {
    self.description = description
    self.transactionTime = transactionTime

So far, so obvious. We can instantly initialise Transaction. In real life, Transaction is not initialised with piecemeal values, it is initialised from decoded JSON data received from an HTTP request. That JSON might look like this:

  "transaction_time": "2008-08",
  "description": "short Lehman Bros. stock"

And we can decode that JSON into our Transaction class like so:

/* Part of Transaction definition */
enum JSONObjectKeys: String, CodingKey {
  case txTime = "transaction_time"
  case description = "description"

init(from decoder: Decoder) throws {
  let container = try decoder.container(
    keyedBy: JSONObjectKeys.self
  description = try container.decode(
    forKey: .description
  let dateFormatter = DateFormatter()
  dateFormatter.dateFormat = "yyyy-MM" //...
  let rawTime = try container.decode(
    forKey: .txTime
  guard let txTime: Date =
    from: rawTime
  ) else {
    throw Error
  transactionTime = txTime

Whoah! What just happened! We decoded a JSON object into an immutable Swift object. Nice! That was intense, so lets take a breather and look at a cute baby bird:

Break time is over! Back to it: Suppose at some point in our application, we want to create an instance of Transaction. Perhaps a user has tapped ‘save’ in an interface. Because the Amatino API is going to (depending on geography) take ~50ms to respond, we need to perform an asynchronous initialisation.

We can do this by giving our Transaction class a static method, like this one:

static func create(
  description: String,
  transactionTime: Date,
  callback: @escaping (Error?, Transaction?) -> Void
) throws {
  /* dummyHTTP() stands in for whatever HTTP request
     machinery you use to make an HTTP request. */
  dummyHTTP() { (data: Data?, error: Error?) in
    guard error == nil else { 
      callback(error, nil)
    guard dataToDecode: Data = data else {
      callback(Error(), nil)
    let transaction: Transaction
    guard transaction = JSONDecoder().decode(
      from: dateToDecode
    ) else {
      callback(Error(), nil)
    callback(nil, transaction)

This new Transaction.create() method follows these steps:

  1. Accepts the parameters of the new transaction, and a function to be called once that transaction is available, the callback(Error?:Transaction?). Because something might go wrong, this function might receive an error, (Error?) or it might receive a Transaction (Transaction?)
  2. Makes an HTTP request, receiving optional Data and Error in return, which are used in a closure. In this example, dummyHTTP() stands in for whatever machinery you use to make your HTTP requests. For example, check out Apple’s guide to making HTTP requests in Swift
  3. Looks for the presence of an error, or the absence of data and, if they are found, calls back with those errors: callback(error, nil)
  4. Attempts to decode a new instance of Transaction and, if successful, calls back with that transaction:callback(nil, transaction)

The end result? An immutable object. We don’t have to reason about whether or not it is fully initialised, it either exists or it does not. Consider an alternative, wherein the Transaction class tracks internal state:

class Transaction {
  var HTTPRequestInProgress: bool
  var hadError: Bool? = nil
  var description: String? = nil
  var transactionTime: Date? = nil

    description: String,
    transactionTime: Date,
    callback: (Error?, Transaction?) -> Void
  ) {
    HTTPRequestInProgress = true
    dummyHTTP() { data: Data?, error: Error? in 
       /* Look for errors, try decoding, set
          `hadError` as appropriate */
       HTTPRequestInProgress = false
       callback(nil, self)

Now we must reason about all sorts of new possibilities. Are we trying to utilise a Transaction that is not yet ready? Have we guarded against nil when utilising a Transaction that is ostensibly ready?  Down this path lies a jumble of guard statements, if-else clauses, and sad baby birdies.

Don’t make the baby birdies sad, asynchronously initialise immutable objects! 💕

Further Reading

– Hugh

Lessons from releasing a personal project as a commercial product

Aliens. It all begins with aliens. Rewind to San Francisco, and a game developer named Unknown Worlds.  Unknown Worlds is awesome.  We’re chilled out, but we create wonderful products. The games we make bring joy to millions of people around the world. The founders, Charlie and Max, are just the coolest and most inspirational blokes.

Before Unknown Worlds, I was at KPMG. A bean-counter, not a programmer. I couldn’t tell computers what to do. But now, making games, I was surrounded by people who could.

I was so inspired by Brian Cronin, Dushan Leska, Jonas Bötel,  Steve An, and others. They were gods. They would sit in a trance for days, occasionally typing incantations on their keyboards, and eventually show us some amazing new game feature. I was in awe.

Dushan would say to me: ‘Just automate something you do every day. It will be hard, you will have to learn a lot, but it will teach you how to write code‘. So I did.

I hold Dushan (mostly) responsible for this mess

At KPMG I spent a lot of time doing battle with Microsoft Excel.  There is nothing fundamentally wrong with Excel. The problem is that it is an extremely generalised tool, and the work we were doing was not generalised. Too much time was spent copying and pasting data, sanitising data, shuffling data by hand.

When I arrived at Unknown Worlds, I started monitoring our sales. I channeled my inner KPMG and created glorious spreadsheets with pretty graphs. It was an awfully manual process. So, on Dushan’s advice, I started automating it.

The process was agonisingly slow. I would devote time after work, on weekends, at lunches: I had no teacher. Once I got going though, I was hooked. Tasks that used to take us hours at KPMG evaporated in moments in the hands of the machine. I felt like a magician.

With great power comes great responsibility. Soon I was writing code in our games. I thought I was pretty damn clever. Some of the stuff I wrote was super cool, one feature even got me invited to speak at Game Developer’s Conference. But damn, most of it was hot garbage.

Working on Subnautica taught me that mediocre programmers are dangerous to the health of large projects. Also dangerous: Reaper Leviathans.

There is nothing more dangerous on a big software project than a mediocre programmer. We’re like a radioactive prairie dog on heat: Running around contaminating codebases with bugs, indecipherable intent, zero documentation, no testing, and poor security.

Eventually I learned enough to realise I needed to ban myself from our game’s codebases. I was desperate to be better: I wanted to be able to contribute to Unknown Worlds games in a sustainable, positive way. One day I read a recommendation: Create a personal project. A project you can sculpt over a long period of time, learning new skills and best practices as you go.

Channeling Dushan again, I decided to start an accounting software project. Accounting software gives me the shits. As I learned more about code, I realised that most accounting software is shit. And it’s near impossible to integrate the big accounting software packages into other software.

How many software startups can you fit in one frame?

Piece by piece, after hours, over weekends, and at any time a healthier person would take a holiday, I put together a beast I called Amatino. It was always supposed to be something small. A side project that I would use myself. Haha… ha. Oh dear.

Today Amatino is available to anyone. It’s a globally-distributed, high-performance, feature-rich accounting wet dream. You can actually subscribe to Amatino and real money will arrive in my bank account. That’s just fucking outrageous!

Still can’t believe this is a real screenshot

Even better, I’ve achieved my original goal. I feel comfortable digging around in code on Unknown Worlds games, and am no longer a dangerous liability to our code quality. I can finally do some of what I saw Max, Charlie, Dushan, Steve, Jonas and Brian doing all those years ago.

Along the way I picked up a few lessons.

Lesson 1: Do it

Creating your own product is utterly exhilarating and mind expanding. I’m about as artistic as an Ikea bar stool, but I imagine this is how artists feel when they make art. It just feels great.

Lesson 2: Keep your day job

Alright, maybe quit your day job if it doesn’t make you happy. But if you are happy, keep at it. Over the past years I’ve given Unknown Worlds 100% and more. Unknown Worlds makes me super happy. To build Amatino simultaneously, I had to develop discipline: Every night, every weekend, every holiday, code like hell.

Spend enough time around Max (L) and Charlie (R), the founders of Unknown Worlds, and you will be inspired to do cool stuff

There are many benefits. First, you don’t lose contact with your work mates. Charlie, Max, Scott, Brandt, and many others are constant inspirations to me. Second, you don’t have to worry about funding, because you have a job. Third, you are kept grounded.

I think if I didn’t spend all day making games, Amatino would have sent me insane. I would have lacked direction, and woke up not knowing what to do. Instead, I worked on making games, structured my day around Unknown Worlds, and devoted focused, intense energy to Amatino when possible.

Lesson 3: Your partner comes first

No matter how important a milestone is, or how deep in thought you are, or how good you think your ideas are, you drop everything for your partner. You lift up your partner, you encourage your partner, you support your partner. Every day, without fail, without exception.

This was a hard lesson to learn. It is the most important lesson.

Without Jessica, Amatino would not have happened. And it is precisely because she took me away from Amatino that she helped. The ritual of cooking for her, sharing meals with her, going on dates with her, doing household chores with her, listening attentively to her thoughts, concerns, and dreams. All these things take immense time, time you might wish to devote to your project instead.

You must not make that trade. It is a false economy. Your productivity will suffer, your health and emotional wellbeing will suffer. The energy you devote to your partner instead of your project will come back to you tenfold and more.

Don’t bore your partner to death by constantly talking about your project. Most importantly, don’t put off big life decisions because you think the time will be right after your project is released.

Don’t put off the big decisions!

Lesson 4: Eat well, exercise, and don’t get drunk

You all hear this enough elsewhere. You have a day job, a personal project, and perhaps a partner too: You cannot waste time recovering from the ingestion of cognitive impediments.  Any social value you get from being drunk is utterly dwarfed by the opportunity cost of brain-cells not functioning at peak efficiency.

Your mates might give you hell for this. Don’t worry, they will still love you in the long run.

Lesson 5: Ignore the framework brigade

I’m building a Dockerized cloud Node app with React-native frontend on GCP powered by the blockchain.” Don’t be those people. Learn from first principles. Start with abstract design thought, not a list of software for your ‘stack’. Don’t be afraid to build your own systems.

Reach for third-party dependencies judiciously and only where absolutely necessary. Learn by dabbling in languages where you need to allocate your own memory, while leveraging the speed boost that comes with those in which you don’t. Build computers. Tinker with them.

You will learn a lot from building, breaking, and upgrading your own computers. This one was maybe me taking it a bit too far

Hot tip: If your elevator pitch contains the brand name of a third party dependency, you are violating Lesson 5.

Lesson 6: Be humble

Maybe some people get ahead in life by being arrogant, self-assured dickheads. In fact, I am sure that is true. If you want to build and release a product, you need to check your ego at the door.

Suck in information from everyone and everything around you. Approach the world with unabridged, unhinged curiosity. Even when you don’t agree with someone, give them your undivided attention and listen, don’t talk. Consider their advice most especially if it conflicts with your own assumptions.

Good luck!

Principles for safe and clean JavaScript

Perhaps you like writing JavaScript. Perhaps you also like poking your eyes out with sticks. The rest of us like type-safety, a clean object model, and being able to assert against our own stupidity.

Yet no matter how much we loathe it, no one can avoid writing JavaScript. Any half serious product will eventually need a website, and that website is going to need JavaScript. Here are some principles that, when applied, may reduce the probability of insanity.

No vars

Don’t spray your scope everywhere. If you write var, you’re doing something wrong. There is no circumstance under which you should need var over const or let. If you need a variable in a higher scope, then put it there explicitly, don’t imply scope by hammering it with var.

// Bad
function bloop(pigeons) {

   if (pigeons > 42) {
      var output = pigeons + '!';

   // `output` has no business being
   // in scope here. Use let instead!


By restricting scope, you restrict the shit your brain needs to reason about. Your brain is a pile of mush that has trouble thinking about more than six thing at once. Throw it a bone.

And before you say ‘but not all users have ECMA XXXX’, let me stop you. You’re not Google. Settle down. Whatever tiny proportion of your users don’t have access to modern syntax are not material to your business.

In fact, allowing dinosaur devices to use your service is to do them a disservice. They’re a security risk to themselves and to you. Let the ancient devices go. Just let them go.

Prefer const

You’re already going to be suffering enough pain trying to untangle your types. At least you can protect yourself against accidental mutation. const everything unless you are absolutely sure you need to mutate. Then, ask yourself if you can redesign your code to avoid mutation.

const AWESOME_SUFFIX = ' is awesome!';

function preach(truth) {
   const output = truth + AWESOME_SUFFIX;
   // We can't accidentally mutate
   // output.


Enforcing immutability allows you to spend less time reasoning about state, and more time reasoning about the problem you are trying to solve.

Build your own types

The days of something.prototype.what.was.the.syntax are gone. Let them be gone, and lean on an object-oriented approach.

const DESCRIPTION_RAGE = 'induces rage!';
const DESCRIPTION_CALM = 'is quite pleasant.';

class Language {

   constructor(name, induces_rage) {
      this._name = name;
      this._induces_rate = induces_rage;

   description() {
      if (this._induces_rage) {
         return this._name + DESCRIPTION_RAGE;
      return this._name + DESCRIPTION_CALM;


const SWIFT = new Language(
const JAVASCRIPT = new Language(
const PYTHON = new Language(

If you ever find yourself wrangling low-level types outside a method, you’ve probably got yourself a good case for defining a custom type. Modern JavaScript syntax makes it super easy and there is no excuse to avoid it.

Avoid 3rd party abstraction layers

Hey there, welcome to 2018. The days of needing jQuery are long, long gone. Browsers are generally highly standards compliant. Yes, document.getElementById('bleep') will just work.

There is no point in abstracting the core DOM API anymore. You’re not supporting IE6 and if you are, just get out, don’t @ me. The only thing that DOM API abstractions are good for are:

  • Bloating page weight
  • Excessive allocations
  • Making ~2009 school JavaScript devs who never moved with the times feel good about themselves by validating their desire to start every line of code with a dollar sign despite the fact that there are no material benefits and they just end up splintering community knowledge into needless silos while wasting precious cycles

Check yourself while you wreck yourself

It’s not easy to practice safe JavaScript. My go-to condom in Python is assert, and in Swift it’s guard. Meanwhile, running JavaScript is like rolling around naked and blindfolded on a Thai beach during a full-moon party.  You can’t be sure what’s going to happen, but you can bet it’s not all going to be enjoyable.

You can take some action to protect yourself. throw liberally where you make assumptions. That way, your code will at least hard crash in the off-nominal case, rather than merrily trucking on with an undefined just waiting to ruin your day.

const INVALID_INPUT = 'oink oink';

class InputField {
   constructor(element_id) {
      this._element = document.getElementById(
      if (!this._element) { throw 'Abort!' }


   is_valid() {
      const value = this._element.value;
      if (value === INVALID_INPUT) {
         return false;
      return true;



In the above case, the throw catches a case in which we supplied a non-existent DOM element id. Without the throw, execution would proceed and we would have no indication that shit was sideways until we called is_valid().

Lean into the wind

JavaScript is here to stay. Websites are important parts of product development. No matter how much you dislike it, it is important to develop JavaScript skills. Do it with a bit of modern, clean syntax and it can be less painful that you might expect.

Playing doctor with a Linux system

Monitoring Linux system health is a route to peace of mind. When a fleet of machines is serving an application, it is comforting to know that they are each and collectively operating within hardware performance limits.

There are countless libraries, tools, and services available to monitor Linux system health. It is also very easy to acquire system health information directly, allowing construction of a bespoke health monitoring subsystem.

There are five critical metrics of system health:

  1. Available memory
  2. CPU utilisation
  3. Network I/O (data transmission and receipt)
  4. Available disk space
  5. Disk I/O (reads and writes to disk)

Let’s take a look at how we can examine each one. This article is written from the perspective of Ubuntu 16.04, but many of the commands are available across Linux distributions. Some of them only require the Linux kernel itself.

Available Memory

We can get available memory using the free command. On its own, free will give us a bunch of columns describing the state of physical and swap memory.

$ free
       total  used  free  shared  buff/cache  available
Mem:   498208 47676 43408 5568    407124      410968
Swap:       0     0     0

There’s a lot going on here. What we are looking for, in plain English, is ‘how much memory is available to do stuff’. If such a number was low, we would know that the system was in danger of running out of memory.

The number we want is counterintuitively not the one in the column labelled ‘free’. That column tells us how much memory the system is not using for anything at all. Linux uses memory to cache regularly accessed files, and for other purposes that don’t preclude its allocation to a running program.

What we want is column 7, ‘available’. We can get just that number by using grep and awk. We can also use the -m flag to return results in megabytes, rather than bytes, thus making the output more readable.

$ free -m | grep 'Mem:' | awk '{print $7}'

That’s much better! A single integer representing how many megabytes of memory are available for the system to do things.

On its own, this is not very useful. You are not going to go around SSH’ing to every box in your fleet, running commands and noting numbers down on a piece of paper. The magic happens when the output is combined with some program that can collate all the data. For example, in Python, we could use the Subprocess module to run the command then store the number:

import subprocess

command = "free -m | grep 'Mem:' | awk '{print $7}'"
memory_available = int(subprocess.getoutput(command))

CPU Utilisation

To monitor Linux system cpu utilisation, we can use the top command. top produces while bunch of output measuring the cpu utilisation of every process on the system. To get an overall sense of system health, we can zero in on the third line:

$ top
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Four numbers are of use to us; Those succeeded by us, sy, id, and wa, which indicate the proportion of CPU time allocated to user processes, system processes, idling, and I/O wait respectively.

To acquire these numbers programmatically, we need to adjust top‘s output slightly.  We’ll use a few flags:

  • -b: Run in a non-interactive mode. Top will return to the shell rather than running indefinitely
  • -n: Sample for a specified number of iterations. We will use two iterations, and take numbers from the second.
  • -d: Delay time between iterations. We will supply a non-zero number so that top acquires data over some time

The whole command will be:

$ top -d 3 -b -n 2 | grep "%Cpu"

In Python, we can execute the command and split the output into individual floating point numbers. To do so, we take advantage of the fixed-width position of top’s output.

import subprocess

command = 'top -d 3 -b -n 2 | grep "%Cpu"'
output = subprocess.getoutput(command)
data = output.split('\n')[1]
cpu_user = float(data[8:13])
cpu_system = float(data[17:22])
cpu_idle = float(data[35:40])
cpu_io_wait = float(data[44:49])

Network I/O

All the hardware in the world won’t save you if your network connection can’t keep up. Monitoring transmit and receive volumes is, fortunately, pretty easy. The kernel provides us with a convenient window onto network activity, in /sys/class/net.

$ ls /sys/class/net
eth0 lo tun0

On this example system, /sys/class/net contains three network interfaces. An ethernet adapter eth0, the local loopback lo, and a vpn tunnel adapter tun0.

How you proceed to gather the information available about these interfaces is going to depend heavily on your situation. The following technique satisfies a couple of assumptions:

  1. We don’t know the number or disposition of network interfaces in advance
  2. We want to gather transmit / receive statistics for all interfaces except the local loopback
  3. We know that the local loopback interface name alone will always start with the character l.

These assumptions might not apply to you. Even if they don’t, you might be able to apply some of the techniques used herein to your situation.

Inside each interface, there is a statistics directory containing a wealth of information.

$ ls /sys/class/net/tun0/statistics
collisions        rx_packets
multicast         tx_aborted_errors
rx_bytes          tx_bytes
rx_compressed     tx_carrier_errors
rx_crc_errors     tx_compressed
rx_dropped        tx_dropped
rx_errors         tx_errors
rx_fifo_errors    tx_fifo_errors
rx_frame_errors   tx_heartbeat_errors
rx_length_errors  tx_packets
rx_missed_errors  tx_window_errors

To get a general overview of network activity, we will zero in on rx_bytes and tx_bytes.


$ cat /sys/class/net/tun0/statistics/rx_bytes
$ cat /sys/class/net/tun0/statistics/tx_bytes

These integer counters tick upwards since, effective, system boot. To sample network traffic, you can take readings of the counters at two points in time. The counters will wrap, so if you have a very busy or long-lived system you should account for potential wrapping.

Here is a Python program that samples current network activity in kilobytes per second.

""" - sample snippet"""
root = 'cat /sys/class/net/'
root += interface + '/statistics/'
rx_command = root + 'rx_bytes'
tx_command = root + 'tx_bytes'
start_rx = int(subprocess.getoutput(rx_command))
start_tx = int(subprocess.getoutput(tx_command))
end_rx = int(subprocess.getoutput(rx_command)
end_tx = int(subprocess.getoutput(tx_command))
rx_delta = end_rx - start_rx
tx_delta = end_tx - start_tx
if rx_delta <0:
   rx_delta = 0
if tx_delta <0:
   tx_delta = 0
rx_kbs = int(rx_delta / seconds / 1000)
tx_kbs = int(tx_delta / seconds / 1000)

Note that this program includes a hard coded interface, tun0. To gather all interfaces, you might loop through the output of ls and exclude the loopback interface.  For purposes that will become clearer later on, we will store each interface name as a dictionary key.

""" - interface loop snippet"""
output = subprocess.getoutput('ls /sys/class/net')
all_interfaces = output.split('\n')
data = dict()
for interface in interfaces:
   if interface[0] == 'l':
   data[interface] = None

On a system with multiple interfaces, it would be misleading to measure the traffic across each interface in sequence. Ideally we would sample each interface at the same time. We can do this by sampling each interface in a separate thread. Here is a Python program that ties everything together and does just that. The above two snippets, “sample” and “interface loop”, should be included where annotated.

import subprocess
import time
from multiprocessing.dummy import Pool as ThreadPool


def network(seconds: int) -> {str: (int, int)}:
   Return a dictionary, in which each string
   key is the name of a network interface,
   and in which each value is a tuple of two
   integers, the first being sampled transmitted
   kb/s and the second received kb/s, averaged
   over the supplied number of seconds.

   The local loopback interface is excluded.
   # Include 'interface loop' snippet here
   def sample(interface) -> None:
      # Include 'sample' snippet here
      data[interface] = (tx_kbs, tx_kbs)

   pool = ThreadPool(len(data))
   arguments = [key for key in data]
   _ =, arguments)
   return data

if __name__ == '__main__':
   result = network(DEFAULT_SAMPLE_SECONDS)
   output = 'Interface {iface}: {rx} rx kb/s
   output += ', {tx} tx kb/s'
   for interface in result:

Running the whole thing gives us neat network output for all intefaces:

$ python3
Interface tun0: 10 rx kb/s, 64 tx kb/s
Interface eth0: 54 rx kb/s, 25 tx kb/s

Of course, printing is fairly useless. We can import the module and function elsewhere:

from import network as network_io

sample = network_io(2)
for interface in sample:
   tx = sample[interface][0]
      if tx > TX_DANGER_THRESHOLD:
         # Raise alarm 
# Do other stuff with sample

Disk Space

After all that hullaballoo with network I/O, disk space monitoring is trivial. The df command gives us information about free disk usage:

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev              239812       0    239812   0% /dev
tmpfs              49824    5540     44284  12% /run
/dev/xvda1       8117828 3438396   4244156  45% /
tmpfs             249104       0    249104   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs             249104       0    249104   0% /sys/fs/cgroup
tmpfs              49824       0     49824   0% /run/user/1000

This is a bit of a mess. We want column four, ‘available’, for the partition you wish to monitor, which in this case is /dev/xvda1. The picture will get much messier if you have more than one partition on the system. In the case of a system with one partition, you will likely find it mounted at /dev/somediskname1. Common disk names include:

  • sd: SATA and virtualised SATA disks
  • xvd: Xen virtual disks. You will see this if you are on EC2 or other Xen based hypervisors
  • hd: IDE and virtualised IDE disks

The final letter will increment upwards with each successive disk. For example, a machine’e second SATA disk would be sdb. An integer partition number is appended to the disk name. For example, the third partition on a machine’s third Xen virtual disk would be xvdc3.

You will have to think about how best to deal with getting the data out of df. In my case, I know that all machines on my network are Xen guests with a single partition,  so I can safely assume that /dev/xvda1 will be the partition to examine on all of them. A command to get the available megabytes of disk space on those machines is:

$ df -m | grep "^/" | awk '{print $4}'

The grep phrase "^/" will grab every line beginning with "/". On a machine with a single partition, this will give you that partition, whether the disk is sd, xvd, hd, and so on.

Programmatically acquiring the available space is then trivial. For example, in Python:

import subprocess

command = 'df -m | grep "^/" | awk \'{print $4}\''
free = int(subprocess.getoutput(command))

Disk I/O

A system thrashing its disks is a system yielding unhappy users. /proc/diskstats contains data that allow us to monitor disk I/O. Like df, /proc/diskstats output is a messy pile of numbers.

$ cat /proc/diskstats
202       1 xvda1 2040617 57 50189642 1701120 3799712 2328944 85759400 1637952 0 1064928 3338520

Column 6 is the number of sectors read, and column 10  is the number of sectors written since, effectively, boot. On a long lived or shockingly busy system these numbers could wrap. To measure I/O per second, we can sample these numbers over a period of time.

Like with disk space monitoring, you will need to consider disk names and partition numbers. Because I know this system will only ever have a single xvd disk with a single partition, I can safely hardcode xvda1 as a grep target:

$ cat /proc/diskstats | grep "xvda1" | awk '{print $6, $10}'
50192074 85761968
 Once we have the number of sectors read and written, we can multiply by the sector size to get I/O in bytes per second. To get sector size, we can use the fdisk command, which will require root privileges.
$ sudo disk -l | grep "Sector size" | awk '{print $4}'

On a machine with more than one disk, you will need to think about getting sector sizes for each disk.

Here’s a Python program that ties all that together:

import subprocess
import time

seconds = 2

command = 'sudo fdisk -l | grep'
command += '"Sector size" | awk \'{print $4}\''
sector_size = int(subprocess.getoutput(command))
command = 'cat /proc/diskstats | grep "xvda1"'
command += ' | awk \'{{print $6, $10}}\''

sample = subprocess.getoutput(command)
start_read = int(sample.split(' ')[0])
start_write int(sample.split(' ')[1])


sample = subprocess.getoutput(command)
end_read = int(sample.split(' ')[0])
end_write = int(sample.split(' ')[1])

delta_read = end_read - start_read * sector_size
delta_write = end_write - start_write * sector_size
read_kb_s = int(delta_read / seconds / 1000)
write_kb_s = int(delta_write / seconds / 1000)

A Bespoke Suit

Now that we’ve collected all these data, we can decide what to do with them. I like to gather up all the data into a json package and shoot them off to a telemetry aggregating machine elsewhere on the network. From there it is a hop, skip and a jump to pretty graphs and fun SQL queries.

By gathering the data yourself, you have the freedom to store, organise, and present the data as you see fit. Sometimes, it is most appropriate to reach for a third party tool. In others, a bespoke solution gives unique and powerful insight.

Automating Application Installation on Linux with Python

Perhaps you have a shiny new web application. It responds to HTTPS requests, delivering pure awesome in response. You would like to install the application on a Linux server, perhaps in Amazon EC2.

Performing the installation by hand is a Bad Idea™. Manual installation means you cannot easily scale across multiple machines, you cannot recover from failure, and you cannot iterate on the machine configuration.

You can automate the installation process with Python. The following are examples of procedures that introduce principles for automation. This is not a step-by-step guide for an entire deployment, but it will give you the tools you need to build your own.

Connecting via SSH with Paramiko

A manual installation process might involve executing lots of commands inside an SSH session. For example:

$ sudo apt update
$ sudo apt install nginx

All of your hard-won SSH skills can be transferred to a Python automation. The Paramiko library offers SSH interaction inside Python programs. I like to shorthand my use of Paramiko by wrapping it in a little container:


from paramiko import SSHClient
from paramiko import SFTPClient
from paramiko import AutoAddPolicy

class SSHSession:
    """Abstraction of a Paramiko SSH session"""
    def __init__(
        hostname: str,
        keyfile: str,
        username: str

        self._ssh = SSHClient()


    def execute(self, command: str) -> str:
        """Return the stdout of an SSH command"""
        _, stdout, _ = self._ssh.exec_command(

    def open_sftp(self) -> SFTPClient:
        """Return an SFTP client"""
        return self.ssh.open_sftp()

We use paramiko.AutoAddPolicy to automatically add the server to our known_hosts file. This effectively answers ‘yes’ to the prompt you would see if initiating a first time connection in an interactive terminal:

The authenticity of host '
(' can't be established. ECDSA key
fingerprint is 
Are you sure you want to continue connecting

You should only do this if you have otherwise secured the network path to your server. If you have not, connect manually first via a terminal and check the key fingerprint.

We initialise an SSHSession instance with a set of parameters that conveniently match what you might already have in your SSH config file. For example:

$ cat ~/.ssh/config
Host some_friendly_server_name
    User hugh
    IdentityFile ~/.ssh/some_private_key

The matching Paramiko session would be:

from ssh_session import SSHSession

SSH = SSHSession(

We now have a convenient little object that can run SSH commands for us.  Note that the object ignores errors in stderror returned by paramiko.SSHClient.exec_command(). While this is convenient when we are confident of our commands, it makes debugging difficult. I recommend debugging in an interactive SSH session rather than in Python.

Installing Dependencies

Let’s start by installing Nginx and Git. You could substitute these with any dependency of your application.

_ = SSH.execute('sudo apt update')
_ = SSH.execute('sudo apt install nginx -y')
_ = SSH.execute('sudo apt install git -y')

Note the ‘-y’ at the end of the apt install command. Without it, the session will hang at the apt continuance prompt:

After this operation, 4,816 kB of additional disk
space will be used.

Do you want to continue? [Y/n]

The requirement to bypass interactive prompts will be a common thread throughout this article. When automating your process, step through it manually and take careful note of where interactive prompts are required.

Creating a Linux User

Our application should, of course, run under its own user. Let’s automate that process:

APP_USER = 'farquad'
command = 'sudo adduser --system --group '
command += APP_USER
_ = SSH.execute(command)

Note that we establish the username as a constant, and don’t hardcode it into our command. This external definition, whether through a constant or a function parameter or however else it is done, is important for several reasons.

  1. It allows you to re-use the command with multiple parameters. For example, perhaps your application requires multiple users.
  2. It implements Don’t-Repeat-Yourself ‘DRY’ principle. We will likely need the username elsewhere, and by externally defining it we have created a single source of authority.

Automating File Transfer using Paramiko SFTP

Suppose your application is stored in a Git repository, like Bitbucket or Github, and that the repository is private. It is no use having an automated installation process if you need to answer an HTTPS password prompt when pulling a repository.

Instead, let’s automate the process by using SSH and installing a repository key on the machine. First, the transfer process:

KEY_FILEPATH = '~/some/key'
with open(KEY_FILEPATH, 'r') as keyfile:
    key =

sftp = SSH.open_sftp()
remote_file = sftp.file('~/repository_key', 'w')

Note that we first SFTP’d the key into our privileged user’s home directory, rather than directly into the application user’s directory. This is because our privileged user does not have permission to write into the application users’ home directory without sudo elevation, which we can’t do in the SFTP session.

Let’s move it into the appropriate place, and modify permissions appropriately:

command = 'sudo mkdir /home/' + APP_USER + '/.ssh'
_ = SSH.execute(command)

command = 'sudo mv ~/repository_key'
command += ' /home/' + APP_USER + '/.ssh/
_ = SSH.execute(command)

command = 'sudo chmod 600 /home/' + APP_USER
command += '/.ssh/repository_key'
_ = SSH.execute(command)

The file is now in the appropriate location, with the appropriate permissions. We repeat the process to install an ssh configuration file. I won’t lay out the entire process, but the principle is the same: Open an SFTP session, plop the file on the server, and move it  and re-permission around as necessary.

There is one important consideration. Because we have been creating directories as our privileged user, we need to turn over those directories to the application user:

command = 'sudo chown -R '
command += APP_USER + ':' + APP_USER
command += ' /home/' + APP_USER + '/.ssh'
_ = SSH.execute(command)

In the end, there should be an SSH configuration file on the server owned by the application user. Here is an example, using all the same names we have been using so far:

$ cat /home/farquad/.ssh/config
    User git
    IdentityFile ~/.ssh/repository_key
$ ls -la /home/farquad
drwxrwxr-x 2 farquad farquad 4096 Mar 23 18:47 .ssh

Pulling the Repository

The next step is easy mode. You’ve set things up such that your wonderful application can be pulled down in a single command:

command = 'cd /home/' + APP_USER
command += '; sudo -u ' + APP_USER
command += ' git clone ' + REPOSITORY
_ = SSH.execute(command)

Well, maybe almost easy mode. There’s a bit going on here. Note the separation of commands via a semicolon. Consider your Python SSH connection to be a very ‘loose’ one. It won’t retain environment information, including current directory, between executions. Therefore, to use conveniences like cd, we chain commands with semicolons.

Also note the sudo -u farquad. We do this so that the git repository is pulled down as the property of our application user,  not our privileged user. This saves us all the dicking about with permissions that plagued the SFTP steps above.

Paramiko and Virtual Environments, like Virtualenv

The ‘loose’ nature of the Paramiko session referenced above becomes particularly important when working with virtual environments. Consider the following:

$ virtualenv -p python3 /home/farquad/app
$ cd /home/farquad/app
$ source bin/activate
(app) $ pip install gunicorn

If executed as distinct commands via a Paramiko SSH session, the gunicorn library will end up installed via systemwide pip. If you then attempt to run the application inside the virtual environment, say inside an Systemd configuration file…


… Then your application will fail because gunicorn was missing from the virtual environment. Instead, be sure to execute commands that require a particular environment in an atomic manner.

Your Move!

Once your application deployment is automated,  you have freed yourself from having to trudge through SSH command sequences every time you want to adjust your deployment. The fear of breaking a server disappears, because you can fire up a replacement at will. Enjoy the freedom!

Bigint vs Numeric Performance in PostgreSQL 9.6

PostgreSQL 9.6

Suppose we have an application that stores two big lists of numbers in a relational database. These numbers represent money – Both the amount of money units in transactions, and the price of that money.

In response to a client query, we must multiply the big list of transaction values by the big list of prices, and sum a running total of those multiples. We assume we can’t pre-calculate and store the running values, and that we must supply the client with strings, perhaps because they are communicating with us via a web API. Suppose clients expect responses in real time.

Storing money requires absolute precision. We can’t use floats, for obvious reasons. So we would probably use an arbitrary precision type. If we were using PostgreSQL 9.6, as this article assumes, then we might choose to use a numerictype to store both values and prices.

Numerics give us arbitrary precision, but come with some baggage. Numerics are binary coded decimals, and CPUs can’t interact with them directly using hardware instructions. Whereas two 64-bit integers will fit in a CPU register and be happily summed, two numerics need to go through a bunch of software shuffling to perform a calculation.

Therefore, if we need to perform lots of calculations quickly, numerics might slow us down when compared to using a 64-bit integer. In PostgreSQL, a 64-bit integer is called a bigint.

Hypothesis: Performing all calculations on integers of type bigint, rather than binary coded decimals of type numeric, will vastly reduce computation times when responding to our client’s queries. Let’s assume the following:

  • A price is stored as a bigintwith an implicit precision of four decimal places. For example, a price of £4.4352 is stored as 44352 (NB the absence of the decimal point).
  • An amount is stored as abigint with an implicit precision of two decimal places. This suits currencies with two decimal places of minor unit precision, such as the US Dollar (USD). For example, an amount of $12,045.42 is stored as 1204542.

The value of any transaction is therefore the price of the money (44352) times the amount of money units (1204542) divided by the extra orders of magnitude introduced by the implicit decimal places.

In our example, the price has introduced four decimal places, and the amount has introduced two decimal places, for a total of six. Therefore, the calculation is as follows:

44352 * 1204542 / 10^6

This yields 53423.846784, which may be nicely formatted for client consumption as $53,423.85.

So, to get to our true value with explicit decimal places, we need to perform division. In PostgreSQL we can’t divide two values of type bigint or we will lose precision. For example,  select (10::bigint / 3::bigint) yields 3. Because we are dealing with money, that is not acceptable.

Numerics, as arbitrary precision types, can retain precision on division. We will cast to anumeric, and divide by anothernumeric of appropriate size for our implied decimal places.

Here’s the example above written in Postgres 9.6 SQL:

    to_char(value::numeric(15,4) / 1000000::numeric(15,4), 'FM999,999,999,999.99') AS str_representation
        amount * price AS interim_value
    FROM (
            1204542::bigint AS amount,
            44352::bigint AS price
    ) AS random_data
) AS calculation

Is this really faster than using binary coded decimals? To answer that question, we’ll create a big set of random data that we can operate on.

We will assume a largish organisation that processes 250 transactions per day every day for 10 years, or 912,500 transactions, and round it up to 1,000,000 for luck. We will assume these transaction amounts are in USD, with two decimal places of implied precision, and are sized somewhere between $0 and $200. Each amount is processed at a price to convert the USD to some Pounds Sterling (GBP). Let’s assume (somewhat absurdly) that each GBP price is somewhere between £0 and £10 with four decimal places of implied precision.

    generate_series (1, 1000000) as row_id,
    (random() * 20000)::bigint AS amount,
    (random() * 100000)::bigint AS price

My machine features an i7-5557U CPU, and it takes ~670ms to generate these data, which look like this:

row_id   | amount | price  
       1 |  14179 |  48998
       2 |  16948 |  56369
       3 |  12760 |  16965
       4 |  17177 |    977
       5 |  11632 |  38872
       6 |  18370 |  44416
       7 |  14370 |  89625

Now, let’s perform the necessary calculations. For each of the million rows, we need to multiply the amount by the price, and then sum those multiples into a running balance column.

    SUM(value) OVER (ORDER BY row_id) AS balance
        row_id, amount * price AS value
    FROM (
            generate_series (1,1000000) as row_id,
            (random()*20000)::bigint AS amount, 
            (random()*10000)::bigint AS price
    ) AS random_data
) AS raw_values

This takes ~1,550ms on my i7-5557U. Less the ~670ms to generate the data yields a calculation time of ~880ms. Computers are so cool. That’s just 880 nanoseconds per row! Here’s some sample output:

   value   |    balance     
  15987240 |       15987240
   5282332 |       21269572
  32625180 |       53894752
 125860125 |      179754877
   3753301 |      183508178
  79953750 |      263461928
  74762297 |      338224225

These huge integers, with their implicit decimal places, are not very useful. They are certainly not human readable. We need to convert them into decimal numbers with explicit decimal places. To do so, we will cast them to type numeric. Finally, recall that our clients require string output, so we will convert the numericinto a nicely formatted stringusing the to_char()function.

    to_char(value::numeric / 1000000::numeric, 'FM999,999,999,990.90') as str_value,
    to_char(balance::numeric / 1000000::numeric, 'FM999,999,999,990.90') as str_balance
        SUM(value) OVER (ORDER BY row_id) AS balance
    FROM (
            row_id, amount * price AS value
        FROM (
                generate_series (1, 1000000) as row_id,
                (random() * 20000)::bigint AS amount, 
                (random() * 100000)::bigint AS price
        ) AS random_data
    ) AS raw_values
) AS balances

Bam! Approximately 2,700ms, Less generation (~670ms) and calculation (~880ms), that’s ~1,150ms. All that bigint::numerictype casting andnumeric / numericdivision adds a good chunk of change. Here is some sample output:

 string_value | string_balance 
 1,029.03     | 1,029.03
 85.97        | 1,115.00
 291.78       | 1,406.78
 670.64       | 2,077.42
 62.33        | 2,139.75
 142.92       | 2,282.68
 2.41         | 2,285.09
 31.01        | 2,316.10

Alright, let’s try the same thing using numeric the whole way through. Then we will compare the difference. First, we’ll generate the random data. We’ll use a numeric with 19 digits of precision, to match the 19 positive number digits we get from a PostgreSQL bigint.

    generate_series (1, 1000000) as row_id,
    (random() * 200)::numeric(17,2) AS amount, 
    (random() * 1)::numeric(15,4) AS price

Generating the random data takes ~2,200ms, and looks as follows:

 row_id  | amount | price  
       1 |  84.33 | 0.8137
       2 | 178.67 | 0.0793
       3 | 115.89 | 0.1935
       4 |   5.61 | 0.9763
       5 |  32.45 | 0.8218
       6 | 154.49 | 0.4451
       7 | 127.43 | 0.3493

Now the basic balance calculation, wherein we multiply amount by price, and then sum a running total of those multiples:

    SUM(value) OVER (ORDER BY row_id) AS balance
        row_id, amount * price AS value
    FROM (
            generate_series (1, 1000000) as row_id,
            (random() * 200)::numeric(17,2) AS amount, 
            (random() * 1)::numeric(15,4) AS price
    ) AS random_data
) AS raw_values
That takes ~3,550ms. Less generation time (2,200ms) yields a calculation time of ~1,350ms. The output looks like this:
   value    |     balance     
  32.893335 |       32.893335
   9.424890 |       42.318225
  46.160686 |       88.478911
 134.282533 |      222.761444
  19.844650 |      242.606094
   8.255778 |      250.861872
   2.859764 |      253.721636

Finally, the complete query. This time, we already have a list of values of type numeric, so we don’t need to peform the initial typecasting step we did in the bigintversion. Instead, we just use to_char()to create pretty strings.

    to_char(value, 'FM999,999,999,990.90') as str_value,
    to_char(balance, 'FM999,999,999,990.90') as str_balance
        SUM(value) OVER (ORDER BY row_id) AS balance
    FROM (
            row_id, amount * price AS value
        FROM (
                generate_series (1, 1000000) as row_id,
                (random() * 200)::numeric(17,2) AS amount, 
                (random() * 1)::numeric(15,4) AS price
        ) AS random_data
    ) AS raw_values
) AS balances

The full query, in numeric form, takes ~4,200ms. Less generation (2,200ms) and calculation (~1,350ms) time, the string processing takes ~650ms.

As expected, the nativelynumeric query is much faster at generating strings: ~650ms vs ~1,150ms, or 56% of the nativebigint query time.

However, the bigint query wins in the calculation stakes: ~880ms vs ~1,350, or 65% of thenumeric time. That means that excluding the random data generation time, the two queries run pretty much equal: 2,030ms for thebigint, and 2,000ms for thenumeric.

Of course, we must consider the elephant in the room: Maximum precision. Usingthebigint technique, we lose a huge chunk of precision every time we perform a calculation. With an amount featuring two implied decimals, and a price featuring four, we lose six digits.

A PostgreSQLbigint can store a maximum positive value of 9,223,372,036,854,775,807,or several quintillion units. Less the digits we used for implied decimals, that falls to two trillion units.

That puts thebigint calculation in a bind. Say we have a query that requires a further calculation step: Perhaps price is calculated as the multiple of two numbers with four decimal digits, e.g. 2.4561 * 1.3258. That is a common case for currencies, and would caused our maximum balance value to fall to around eight hundred million.

Based on the above analysis, we might reasonably assume that such a calculation heavy query denominated inbigint might pull away from anumeric version. However, the loss of maximum balance value might preclude its use for clients with big balance numbers.

As is the way of these things, I’ve now got more questions than I did when I started. What I want to know now is:

  • Does changing thenumeric precision definition, e.g. numeric(9,4)instead of numeric(15,4)affect PostgreSQL calculation performance?
  • We ignored retrieval speed in this analysis. When the data is coming off the disk, donumericandbigint cause materially different read times?
  • Does adding an extra calculation step cause thebigint method to pull away fromnumeric?
  • Couldbigint andnumeric be substituted depending on client needs, allowing small clients to benefit from speed while big ones benefit from arbitrary precision?

Questions for another day.

Chat to me about this article anytime on Twitter.