Pacioli’s Equality, a better name for double-entry accounting

Euclid, Pythagoras, Newton, Einstein, Heisenberg, Planck. These names conjure a sense of elegance, of fundamental knowledge, of natural order. They have each contributed foundation stones to our understanding of the natural world. Together, theirs and many other ideas act as first principles upon which we lean as we build spacecraft, cure disease, and create art.

The association of a name with a theory or theorem is not just administrative. We do not use the name ‘Newton’s Third Law’ as a database reference, an abstracted name with which to retrieve associated knowledge inside our brains. Instead, the name ‘Newton’ carries emotion. An emotion that is collectively understood across society.

Newton’s name lends an energy above and beyond the phrase ‘the third law of motion in classical mechanics’. That energy matters: It excites the mind, invites inquiry, provokes the imagination. Present a child with the phrase: ‘A theory on the fundamental limit of precision in measurement of pairs of properties of a physical particle’, and they may well give up on studying physics.

Present them with: ‘Heisenberg’s Uncertainty Principle’, and you might well spark their curiosity. This phenomenon is visible beyond ideas describing the natural world: We don’t name brands ‘Very Fast And Good Looking Cars’, we name them ‘Tesla’.

Absolutely zero curiosity, energy, or wonder is sparked by the phrase ‘Double Entry Accounting.’ It is more likely to spark the gag reflex.

This is a crying shame. Such a shame, and such a waste, that I won’t refer to the above mentioned idea in those terms again. Instead, I will refer to it as ‘Pacioli’s Equality’.

This name is not quite fair. Luca Pacioli, an Italian of 15th century vintage, did not invent the accounting principles to which I am lending his name. Instead, his is the first known work to codify them: Summa de arithmetica, geometria, proportioni et proportionalita, published in Venice in 1494.

Many artefacts predate Summa. Records kept in France in the 13th century, authored by Amatino Manucci, are the earliest surviving example of Pacioli’s Equality. But we must pick a name, and the author of the first known textbook on the topic is as good as any. Pacioli it is.

In the most general terms, Pacioli’s Equality could be described as: ‘Every change has an equal and observable origin.’ Which sounds idiotically simplistic. No less simplistic, I say, than ‘an object that is at rest will stay at rest unless a force acts upon it.’ These phrases are deceptively simple, for they act as foundations upon which immensely valuable paradigms can be constructed.

In the world of money, where Pacioli’s Equality is most often applied (though certainly not the only domain in which it can be applied), we could re-word the description as: ‘Everything you have is equal to everything you have received.’ Or, as a very formal equation:

Everything I Have = Stuff People Gave Me

Of course, we might also have some stuff that does not belong to us. Say: ‘everything you have is equal to everything you have received, plus what you have borrowed.’ Formally:

Everything I Have = Stuff I Borrowed + Stuff People Gave Me

Any increase on the left side (what I have), must be balanced by an increase on the right side (I must have either borrowed it, or someone must have given it to me). That blindingly, but deceptively, obvious equality is presented to first-year accounting students as the ‘double-entry accounting equation’:

Assets = Liabilities + Equity

At which point their eyes glaze over, Instagram is opened, and somewhere, an adorable kitten dies.

This is where the shame lies. Pacioli’s Equality is the basis for arbitrarily complex information storage systems. By recording both the origin and destination of a resource, we can construct, for any point in time, both the current state of an entity and every quantum of change that led to that state.

In other words, Pacioli’s Equality allows us to observe both the position and performance of an entity, measured in some arbitrary unit. That unit is most often money: The common name for a measurement of position is a ‘Balance Sheet’. The common name for a measurement of performance is an ‘Income Statement’.
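
The idea is compact enough to sketch in a few lines of Python. The account names and transactions below are my own invention for illustration, not anything prescribed by Pacioli: the point is only that recording an equal origin and destination for every change lets both position and performance fall out of the same records.

```python
# A minimal sketch of Pacioli's Equality: every transaction records an
# equal origin and destination, so the books always sum to zero.
# Account names and amounts are invented for illustration.
from collections import defaultdict

# Each entry: (origin account, destination account, amount)
transactions = [
    ('equity', 'cash', 1000),   # owner contributes capital
    ('loan', 'cash', 500),      # borrowed funds arrive
    ('cash', 'supplies', 200),  # cash converted into supplies
]

balances = defaultdict(int)
for origin, destination, amount in transactions:
    balances[origin] -= amount       # every change...
    balances[destination] += amount  # ...has an equal origin

# Position: current balances. Performance: the transaction log itself.
assert sum(balances.values()) == 0  # the equality always holds
```

The balances give the position (a crude balance sheet); replaying the transaction list over a date range gives the performance (a crude income statement).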

The fundamental elegance of Pacioli’s Equality is utterly absent from modern accounting practice. Load any piece of accounting software, and you will be presented with ‘invoices, customers, credit cards, bank accounts, trial balances’: These are domain-specific objects, each of which is an implementation of the equality, rather than a window onto it.

Sometimes, software might open an interface to ‘journals’, or allow direct manipulation of ‘ledgers’. These are edging closer to a fundamental expression of Pacioli’s Equality, but they are treated as second class citizens. Interacting with them, especially programmatically, is generally painful.

We have combined computers and fundamental knowledge to create wonderful outcomes. Program Einstein’s theories into a computer system and you can model the position of a space probe orbiting Saturn to pinpoint accuracy. Build on Euclid’s theorem and a computer can create nigh-unbreakable cryptographic constructs that allow distributed virtual currencies.

Where is the fundamental computerised expression of Pacioli’s Equality? It is surely not manifest in the current accounting software landscape. That is a shame. We are poorer for it. We can make a baby step towards encouraging innovation by replacing the awful name. Exit double-entry accounting. Enter Pacioli’s Equality.

Automating Application Installation on Linux with Python

Perhaps you have a shiny new web application. It responds to HTTPS requests, delivering pure awesome in response. You would like to install the application on a Linux server, perhaps in Amazon EC2.

Performing the installation by hand is a Bad Idea™. Manual installation means you cannot easily scale across multiple machines, you cannot recover from failure, and you cannot iterate on the machine configuration.

You can automate the installation process with Python. The following are examples of procedures that introduce principles for automation. This is not a step-by-step guide for an entire deployment, but it will give you the tools you need to build your own.

Connecting via SSH with Paramiko

A manual installation process might involve executing lots of commands inside an SSH session. For example:

$ sudo apt update
$ sudo apt install nginx

All of your hard-won SSH skills can be transferred to a Python automation. The Paramiko library offers SSH interaction inside Python programs. I like to shorthand my use of Paramiko by wrapping it in a little container:


from paramiko import SSHClient
from paramiko import SFTPClient
from paramiko import AutoAddPolicy

class SSHSession:
    """Abstraction of a Paramiko SSH session"""
    def __init__(
        self,
        hostname: str,
        keyfile: str,
        username: str
    ) -> None:
        self._ssh = SSHClient()
        self._ssh.set_missing_host_key_policy(AutoAddPolicy())
        self._ssh.connect(
            hostname,
            username=username,
            key_filename=keyfile
        )

    def execute(self, command: str) -> str:
        """Return the stdout of an SSH command"""
        _, stdout, _ = self._ssh.exec_command(command)
        return stdout.read().decode('utf-8')

    def open_sftp(self) -> SFTPClient:
        """Return an SFTP client"""
        return self._ssh.open_sftp()

We use paramiko.AutoAddPolicy to automatically add the server to our known_hosts file. This effectively answers ‘yes’ to the prompt you would see when initiating a first-time connection in an interactive terminal:

The authenticity of host '<hostname>
(<ip address>)' can't be established. ECDSA key
fingerprint is <fingerprint>.
Are you sure you want to continue connecting (yes/no)?

You should only do this if you have otherwise secured the network path to your server. If you have not, connect manually first via a terminal and check the key fingerprint.

We initialise an SSHSession instance with a set of parameters that conveniently match what you might already have in your SSH config file. For example:

$ cat ~/.ssh/config
Host some_friendly_server_name
    User hugh
    IdentityFile ~/.ssh/some_private_key

The matching Paramiko session would be:

from ssh_session import SSHSession

SSH = SSHSession(
    hostname='some_friendly_server_name',
    keyfile='~/.ssh/some_private_key',
    username='hugh'
)

We now have a convenient little object that can run SSH commands for us. Note that the object ignores errors in stderr returned by paramiko.SSHClient.exec_command(). While this is convenient when we are confident of our commands, it makes debugging difficult. I recommend debugging in an interactive SSH session rather than in Python.

Installing Dependencies

Let’s start by installing Nginx and Git. You could substitute these with any dependency of your application.

_ = SSH.execute('sudo apt update')
_ = SSH.execute('sudo apt install nginx -y')
_ = SSH.execute('sudo apt install git -y')

Note the ‘-y’ at the end of the apt install command. Without it, the session will hang at the apt continuance prompt:

After this operation, 4,816 kB of additional disk
space will be used.

Do you want to continue? [Y/n]

The requirement to bypass interactive prompts will be a common thread throughout this article. When automating your process, step through it manually and take careful note of where interactive prompts are required.
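
For apt specifically, a couple of well-known switches suppress prompts: the `-y` flag seen above answers the continuance prompt, and the `DEBIAN_FRONTEND=noninteractive` environment variable suppresses debconf configuration dialogs. A small sketch of building such a command (the package name is just an example):

```python
# Build a fully non-interactive apt command. -y answers the continuance
# prompt; DEBIAN_FRONTEND=noninteractive suppresses debconf dialogs.
def install_command(package: str) -> str:
    return (
        'sudo DEBIAN_FRONTEND=noninteractive '
        'apt install -y ' + package
    )

command = install_command('nginx')  # example package
# _ = SSH.execute(command)  # using the SSHSession wrapper above
```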

Creating a Linux User

Our application should, of course, run under its own user. Let’s automate that process:

APP_USER = 'farquad'
command = 'sudo adduser --system --group '
command += APP_USER
_ = SSH.execute(command)

Note that we establish the username as a constant, and don’t hardcode it into our command. This external definition, whether through a constant or a function parameter or however else it is done, is important for several reasons.

  1. It allows you to re-use the command with multiple parameters. For example, perhaps your application requires multiple users.
  2. It implements the Don’t-Repeat-Yourself (DRY) principle. We will likely need the username elsewhere, and by externally defining it we have created a single source of authority.
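
One way to honour both points is to wrap the command construction in a function parameterised by username. This is a sketch using the same `adduser` flags as above:

```python
# Parameterised command construction: one source of authority for the
# username, re-usable for as many application users as needed.
def user_creation_command(username: str) -> str:
    """Return the adduser command for a given application user."""
    return 'sudo adduser --system --group ' + username

# _ = SSH.execute(user_creation_command('farquad'))
```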

Automating File Transfer using Paramiko SFTP

Suppose your application is stored in a Git repository, like Bitbucket or Github, and that the repository is private. It is no use having an automated installation process if you need to answer an HTTPS password prompt when pulling a repository.

Instead, let’s automate the process by using SSH and installing a repository key on the machine. First, the transfer process:

KEY_FILEPATH = '~/some/key'
with open(KEY_FILEPATH, 'r') as keyfile:
    key = keyfile.read()

sftp = SSH.open_sftp()
remote_file = sftp.file('~/repository_key', 'w')
remote_file.write(key)
remote_file.close()

Note that we first SFTP’d the key into our privileged user’s home directory, rather than directly into the application user’s directory. This is because our privileged user does not have permission to write into the application user’s home directory without sudo elevation, which we can’t do in the SFTP session.

Let’s move it into the appropriate place, and modify permissions appropriately:

command = 'sudo mkdir /home/' + APP_USER + '/.ssh'
_ = SSH.execute(command)

command = 'sudo mv ~/repository_key'
command += ' /home/' + APP_USER + '/.ssh/repository_key'
_ = SSH.execute(command)

command = 'sudo chmod 600 /home/' + APP_USER
command += '/.ssh/repository_key'
_ = SSH.execute(command)

The file is now in the appropriate location, with the appropriate permissions. We repeat the process to install an SSH configuration file. I won’t lay out the entire process, but the principle is the same: Open an SFTP session, plop the file on the server, then move and re-permission it as necessary.

There is one important consideration. Because we have been creating directories as our privileged user, we need to turn over those directories to the application user:

command = 'sudo chown -R '
command += APP_USER + ':' + APP_USER
command += ' /home/' + APP_USER + '/.ssh'
_ = SSH.execute(command)

In the end, there should be an SSH configuration file on the server owned by the application user. Here is an example, using all the same names we have been using so far:

$ cat /home/farquad/.ssh/config
Host bitbucket.org
    User git
    IdentityFile ~/.ssh/repository_key
$ ls -la /home/farquad
drwxrwxr-x 2 farquad farquad 4096 Mar 23 18:47 .ssh

Pulling the Repository

The next step is easy mode. You’ve set things up such that your wonderful application can be pulled down in a single command:

REPOSITORY = 'git@bitbucket.org:example/app.git'  # your repository's SSH clone URL

command = 'cd /home/' + APP_USER
command += '; sudo -u ' + APP_USER
command += ' git clone ' + REPOSITORY
_ = SSH.execute(command)

Well, maybe almost easy mode. There’s a bit going on here. Note the separation of commands via a semicolon. Consider your Python SSH connection to be a very ‘loose’ one. It won’t retain environment information, including current directory, between executions. Therefore, to use conveniences like cd, we chain commands with semicolons.

Also note the sudo -u farquad. We do this so that the git repository is pulled down as the property of our application user,  not our privileged user. This saves us all the dicking about with permissions that plagued the SFTP steps above.

Paramiko and Virtual Environments, like Virtualenv

The ‘loose’ nature of the Paramiko session referenced above becomes particularly important when working with virtual environments. Consider the following:

$ virtualenv -p python3 /home/farquad/app
$ cd /home/farquad/app
$ source bin/activate
(app) $ pip install gunicorn

If executed as distinct commands via a Paramiko SSH session, the gunicorn library will end up installed via the system-wide pip. If you then attempt to run the application inside the virtual environment, say from a systemd configuration file…


… then your application will fail, because gunicorn is missing from the virtual environment. Instead, be sure to execute commands that require a particular environment in an atomic manner.
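
Concretely, that means chaining the environment-dependent steps into a single execution, so the shell that activates the virtualenv is the same shell that runs pip. The paths below follow the examples used throughout this article:

```python
APP_USER = 'farquad'  # as established earlier

# One atomic execution: cd, activate, and pip install all run in the
# same shell, so gunicorn lands inside the virtual environment.
command = 'cd /home/' + APP_USER + '/app'
command += '; source bin/activate'
command += '; pip install gunicorn'
# _ = SSH.execute(command)
```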

Your Move!

Once your application deployment is automated, you have freed yourself from trudging through SSH command sequences every time you want to adjust your deployment. The fear of breaking a server disappears, because you can fire up a replacement at will. Enjoy the freedom!

Game Developer Unions are a Daft Idea

Some game developers would like to unionise. This is not an inherently bad idea. Unionisation is an effective way for people to improve their working conditions when there is a chronic imbalance in bargaining power between workers and management across an industry.

Such an imbalance might occur because regulation makes it hard to start or destroy companies. Or because workers cannot easily move between industries, perhaps because re-training is hard, or because a social security system ties benefits to an individual career. Or for many other real-world reasons that affect many people.

Game development does not suffer from such an imbalance. Quite the opposite:

  • Companies making games generally struggle to find and retain skilled workers
  • Strong competition between companies makes capable development teams their only competitive advantage

For workers to enjoy the best working conditions, poorly performing companies must be destroyed as quickly as possible. Yes, that includes studios that we might fondly remember for being very good in the past, but are now falling behind more innovative competitors.

Fortunately, it is very easy to start and destroy game development studios. Capital costs are low, regulation is light, markets are near fully globalised, and geography is largely irrelevant. Under such circumstances, it is relatively easy for a hungry entrepreneur to pull together a motivated team and beat established players.

The best thing that game developers can do is to maintain an atmosphere of ruthless innovation: Bad companies get destroyed, good ones keep popping up. That way, talented game developers can choose from a wide array of companies, allowing demand for their talent to force competition for the acquisition of their labour.

Of course, there is an elephant in the room. If competition in games is so intense, why is pay generally low? Game development attracts lots of people who perceive it as more enjoyable work than say, finance or accounting. At the macroeconomic level, the game development labour market is heavily supplied.

If you are working in game development, someone with equal or lesser talents than you is working in a fin-tech startup earning twice as much as you while working half the hours. If you don’t like that, you need to go work in a fin-tech.

If you try to force higher pay by controlling the supply of labour through a union, then the company you work for is going to go bankrupt. Someone hungrier than you is going to supply their labour elsewhere, the company they work for is going to produce an equal product at lower cost, and customers are going to end your fantasy with their wallets.

Architecting a WiFi Hotspot on a Remote Island

Internet access on Lord Howe Island is very limited. The island is extremely remote. I am intensely interested in providing affordable, accessible, and reliable internet connections to residents and guests.

The ‘Thornleigh Farm’ Internet (internally code-named ‘Nike’) is a newly launched service that offers public internet access on Lord Howe Island. Here are some of the architectural choices I made in developing the service.

Island Cloud

WiFi hotspot solutions require a server that acts to authenticate, authorise, and account for network access. The current industry trend appears to be toward doing this in the ‘cloud’ – i.e. in remote data-centres.

Such a solution is not suitable for Lord Howe Island, because of satellite latency. Signals travel well over 70,000 kilometres through space between transmitting and receiving stations, yielding a practical minimum latency of around 600ms, often higher. This high latency creates a crappy customer experience during sign-on.

Instead, Nike utilises local servers for network control. Power is very expensive on Lord Howe Island, which led to a choice of low-voltage Intel CPUs for processing. Two dual-core Intel ‘NUC’ machines serve as hypervisors for an array of network-control virtual machines.

Intel NUC machines and Ubiquiti switching equipment

Going local means replicating infrastructure we take for granted in the cloud. Nike utilises local DNS (Bind9), database (Postgres), cache (Redis), and web (NGINX) servers. It’s like stepping back in time, and really makes you appreciate Amazon Web Services (AWS)!

DNS Spaghetti

Bringing the Nike application “island-side” meant dealing extensively with the Domain Name System (DNS). Local requests to the application domain need to be routed locally or via satellite depending on their purpose.

For example, new clients are served the Nike purchase page from a local server. Clients of the Thornleigh Farm Store, which offers food orders, are served from an AWS server via satellite.

A local Bind9 DNS captures all domain traffic on our network, and punts it to the local Nginx server. Nginx then chooses to proxy the request to local applications, or to the external AWS Route 53 DNS, depending on the request path.

An island-side client receiving content served from island servers

This request spaghetti has some cool effects: Clients requesting the application domain receive an information page when off the island, and a purchase page when they are on it.

Client Identification

From the outset, I wanted to avoid requiring user accounts. Older customers in particular react very poorly to needing to create a new user account, set a password, remember it, and so on.

Also, I am a privacy psychopath and I want to collect the absolute bare minimum customer data necessary to provide the service.

Instead, Nike identifies clients by device Media Access Control (MAC) address. This is uniquely possible on the Thornleigh network because all public clients are on the same subnet. The Nike application can get the MAC associated with a particular IP address in real-time by making a request to the network router.

Part of the Nike codebase that identifies clients by MAC

A small custom HTTP API runs on our Ubiquiti Edgemax router, which looks up a given IP in its routing table and returns the associated MAC if available.
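
The router API itself is private, but the lookup logic on the application side amounts to something like the sketch below. The routing-table shape and the field names (`ip`, `mac`) are my assumptions for illustration, not the real Edgemax API:

```python
# Hypothetical sketch: resolve a client IP to a MAC address using a
# routing-table snapshot returned by a router HTTP API. Field names
# ('ip', 'mac') are assumed, not taken from the actual Nike code.
from typing import Optional

def mac_for_ip(routing_table: list, ip: str) -> Optional[str]:
    """Return the MAC associated with an IP, if the router knows it."""
    for entry in routing_table:
        if entry.get('ip') == ip:
            return entry.get('mac')
    return None
```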


Stripe

Stripe is an amazing payments provider, full stop. Their API is fantastically well documented, their customer service brilliant, and their tools of exceptional quality. They pay out every day, and offer low fees. I cannot recommend them highly enough.

Nike ran into a minor problem with the Stripe Checkout system: It does not work in Android WebViews. Android uses WebViews in a manner analogous to the Apple Captive Network Assistant: They sandbox public WiFi DNS capture. In Android’s case, the sandboxing is strict enough to kill Checkout.

Stripe Elements inside the MacOS Captive Network Assistant

This problem was easily solved by moving to Stripe Elements, and building a simple custom payments form utilising existing Nike styling.

Layer 1

Deploying physical network infrastructure on Lord Howe Island presents a few challenges. First, power is scarce. Second, regulatory approvals for any sort of island-modifying work are very difficult to obtain.

The property that serves as the nexus for Nike, Thornleigh Farm, is hidden inside a shield of palm forest. It is not possible to broadcast any meaningful signal out of the property, though we do offer the Public Internet network across the farm for the use of farm customers.

Fortunately, the property includes a glorious old boat-shed sitting on Lagoon Beach. Even more fortunately, an old copper conduit runs under the forest between farm and boat-shed. This enabled the installation of an optical fibre. The shed then acts as the southernmost network node.

Ubiquiti NanoBeam AC Gen2 radios provide multiple radio links in the Nike Layer 1 network

A 5Ghz link then penetrates a treeline to the north, linking to another island business, with whom we have joined forces, and who serve as the northernmost node.

All in all, a mixture of Cat6 copper, 5Ghz point-to-point radios, and optical fibre connect the satellite dishes with our server room and then on to the boat sheds on the beach.

Access Control

The Thornleigh Farm network is mostly built from Ubiquiti Unifi equipment. The WiFi networks, including the Nike ‘Public Internet’ network, are controlled by the proprietary Unifi Controller (UC), running on a local virtual machine.

The UC has a publicly documented API that ostensibly allows fairly fine-grained manipulation of client network access. In practice, the documentation is not developer-friendly, and interacting with the UC was the most difficult part of the project outside construction of the physical network.

For a while, I flirted with deploying a fully custom system utilising open-source RADIUS and ChilliSpot software. This path did not bear fruit, and I settled back on bashing through the UC API.

An example of some of the calculations that occur while authorising a client

Nike functions as a Python application that interfaces with the UC whenever it needs to authorise, de-authorise, or check usage by a client. Data usage tracking is handled by custom code and stored in our local Postgres database.

The custom implementation allows us to do some fun stuff, like offer refunds of partial usage, and allow customers to stack multiple data packs on top of each other. Nike continuously updates the UC whenever a client’s remaining quota changes, and the UC internally handles disconnecting the client when they exceed their quota.
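
The stacking arithmetic reduces to a simple sum. The function below is a sketch of the idea in my own naming, not the actual Nike code:

```python
# Hypothetical sketch: a client's remaining quota is the sum of their
# stacked data packs, less their recorded usage, floored at zero.
def remaining_quota(pack_sizes_mb: list, used_mb: int) -> int:
    """Return remaining quota in MB, never below zero."""
    return max(sum(pack_sizes_mb) - used_mb, 0)
```

A refund of partial usage then amounts to removing a pack from the stack and recomputing.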

Final Thoughts

Isolation, latency, and high operating costs make Lord Howe Island a difficult environment in which to deploy public internet. The internet is, however, an increasingly crucial element of modern life. Participation in modern economic activity requires a reliable connection, and I hope that in the long term Nike can provide a valuable service to residents and guests of Lord Howe Island.

If you’d like to discuss the project, hit me up on Twitter.

It’s Time To Change Section 44

Ludlam, Waters, Canavan, Roberts, Joyce – The Section 44 pain train is rolling through an ever growing list of representatives and senators. It is time for this absurdity to stop. Section 44, specifically subsection (i), reflects an outmoded, irrelevant view of what it means to be an Australian citizen. It is actively harmful to our ability to grow and prosper as a nation.

Any person who –

Is under any acknowledgement of allegiance, obedience, or adherence to a foreign power, or is a subject or a citizen or entitled to the rights or privileges of a subject or citizen of a foreign power


shall be incapable of being chosen or of sitting as a senator or a member of the House of Representatives.

The members embroiled in the dual-citizenship fiasco hail from across the political spectrum. Regardless of our persuasions, we can surely agree that all these members are patriots, acting for the best interests of the Australian people, even if they disagree about how those interests are best served.

Australia is a diverse nation of immigrants. This is our great strength, the secret sauce that has propelled us to great wealth, peace, and prosperity. We should want our parliament to reflect our diversity, to contain bridges to the world’s peoples. Such bridges, manifesting in dual-citizenships, are tools to allow our parliament to better act in our collective interest.

If senator Canavan’s mother signed him up to be an Italian citizen without his knowledge, if senator Waters immigrated here from Canada as an infant, we should savour and welcome and support their links to these foreign lands. We should welcome senators Canavan, Waters, and other dual-citizens as a strength, a representation in the legislature of our collective diverse, immigrant selves.

The alternative, the status quo, is to admit a great insecurity about our way of life. We suggest that a dual-citizenship, however tenuous, is sufficient corruption to be likely to sway a member to act against Australia. Forget foreign spies, bribery, infiltration, inducements. No: It is enough that an infant was born in a Canadian hospital, that infant is likely to be a traitor!

We grow stronger by embracing the ties that bind us to our fellow creatures around this Earth. We have grown wealthy, safe, strong, and prosperous through such embraces, while more insular, inward looking nations struggle with the limitations such insulation imposes.

Trade, treaties, the movement of people, the flow of finance, the exchange of ideas: An Australia more tightly bound to Canada, to Italy, or to any other nation is a stronger Australia. Members of parliament with ties to these and other lands are a source of strength, not weakness. It’s time to amend the constitution, re-write section 44(i), and embrace our own strength.

Bigint vs Numeric Performance in PostgreSQL 9.6

Suppose we have an application that stores two big lists of numbers in a relational database. These numbers represent money – both the amount of money units in transactions, and the price of that money.

In response to a client query, we must multiply the big list of transaction values by the big list of prices, and sum a running total of those multiples. We assume we can’t pre-calculate and store the running values, and that we must supply the client with strings, perhaps because they are communicating with us via a web API. Suppose clients expect responses in real time.

Storing money requires absolute precision. We can’t use floats, for obvious reasons. So we would probably use an arbitrary precision type. If we were using PostgreSQL 9.6, as this article assumes, then we might choose to use a numeric type to store both values and prices.

Numerics give us arbitrary precision, but come with some baggage. Numerics are binary coded decimals, and CPUs can’t interact with them directly using hardware instructions. Whereas two 64-bit integers will fit in a CPU register and be happily summed, two numerics need to go through a bunch of software shuffling to perform a calculation.

Therefore, if we need to perform lots of calculations quickly, numerics might slow us down when compared to using a 64-bit integer. In PostgreSQL, a 64-bit integer is called a bigint.

Hypothesis: Performing all calculations on integers of type bigint, rather than binary coded decimals of type numeric, will vastly reduce computation times when responding to our client’s queries. Let’s assume the following:

  • A price is stored as a bigint with an implicit precision of four decimal places. For example, a price of £4.4352 is stored as 44352 (NB the absence of the decimal point).
  • An amount is stored as a bigint with an implicit precision of two decimal places. This suits currencies with two decimal places of minor-unit precision, such as the US Dollar (USD). For example, an amount of $12,045.42 is stored as 1204542.

The value of any transaction is therefore the price of the money (44352) times the amount of money units (1204542) divided by the extra orders of magnitude introduced by the implicit decimal places.

In our example, the price has introduced four decimal places, and the amount has introduced two decimal places, for a total of six. Therefore, the calculation is as follows:

44352 * 1204542 / 10^6

This yields 53423.846784, which may be nicely formatted for client consumption as $53,423.85.
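
The same arithmetic can be checked with Python’s standard-library Decimal type, multiplying the raw integers and then dividing out the six implied decimal places exactly:

```python
# Implied-decimal arithmetic from the example above.
from decimal import Decimal

PRICE = 44352     # £4.4352, four implied decimal places
AMOUNT = 1204542  # $12,045.42, two implied decimal places

value = Decimal(PRICE * AMOUNT) / Decimal(10 ** 6)
# value is Decimal('53423.846784'), formatted for clients as $53,423.85
```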

So, to get to our true value with explicit decimal places, we need to perform division. In PostgreSQL we can’t divide two values of type bigint without losing precision. For example, select (10::bigint / 3::bigint) yields 3. Because we are dealing with money, that is not acceptable.

Numerics, as arbitrary precision types, can retain precision on division. We will cast to a numeric, and divide by another numeric of appropriate size for our implied decimal places.

Here’s the example above written in Postgres 9.6 SQL:

SELECT
    to_char(
        interim_value::numeric(15,4) / 1000000::numeric(15,4),
        'FM999,999,999,999.99'
    ) AS str_representation
FROM (
    SELECT
        amount * price AS interim_value
    FROM (
        SELECT
            1204542::bigint AS amount,
            44352::bigint AS price
    ) AS random_data
) AS calculation;

Is this really faster than using binary coded decimals? To answer that question, we’ll create a big set of random data that we can operate on.

We will assume a largish organisation that processes 250 transactions per day every day for 10 years, or 912,500 transactions, and round it up to 1,000,000 for luck. We will assume these transaction amounts are in USD, with two decimal places of implied precision, and are sized somewhere between $0 and $200. Each amount is processed at a price to convert the USD to some Pounds Sterling (GBP). Let’s assume (somewhat absurdly) that each GBP price is somewhere between £0 and £10 with four decimal places of implied precision.

SELECT
    generate_series(1, 1000000) AS row_id,
    (random() * 20000)::bigint AS amount,
    (random() * 100000)::bigint AS price;

My machine features an i7-5557U CPU, and it takes ~670ms to generate these data, which look like this:

row_id   | amount | price  
---------+--------+--------
       1 |  14179 |  48998
       2 |  16948 |  56369
       3 |  12760 |  16965
       4 |  17177 |    977
       5 |  11632 |  38872
       6 |  18370 |  44416
       7 |  14370 |  89625

Now, let’s perform the necessary calculations. For each of the million rows, we need to multiply the amount by the price, and then sum those multiples into a running balance column.

SELECT
    value,
    SUM(value) OVER (ORDER BY row_id) AS balance
FROM (
    SELECT
        row_id, amount * price AS value
    FROM (
        SELECT
            generate_series(1, 1000000) AS row_id,
            (random() * 20000)::bigint AS amount,
            (random() * 100000)::bigint AS price
    ) AS random_data
) AS raw_values;

This takes ~1,550ms on my i7-5557U. Less the ~670ms to generate the data yields a calculation time of ~880ms. Computers are so cool. That’s just 880 nanoseconds per row! Here’s some sample output:

   value   |    balance     
  15987240 |       15987240
   5282332 |       21269572
  32625180 |       53894752
 125860125 |      179754877
   3753301 |      183508178
  79953750 |      263461928
  74762297 |      338224225
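
The fixed-point arithmetic at work here is easy to sketch outside the database. Here is a minimal, hypothetical Python version of the same approach, using the first three sample rows from the generated data above:

```python
# Fixed-point sketch: amounts carry two implied decimal digits (USD cents),
# prices carry four (GBP ten-thousandths), so each product carries six.
amounts = [14179, 16948, 12760]   # i.e. $141.79, $169.48, $127.60
prices = [48998, 56369, 16965]    # i.e. £4.8998, £5.6369, £1.6965

balance = 0
for amount, price in zip(amounts, prices):
    value = amount * price   # pure integer multiply: fast and exact
    balance += value         # running balance, still an integer
    print(value, balance)
```

Integer multiplication and addition are cheap, and nothing is lost until we choose where to put the decimal point.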

These huge integers, with their implicit decimal places, are not very useful. They are certainly not human readable. We need to convert them into decimal numbers with explicit decimal places. To do so, we will cast them to type numeric. Finally, recall that our clients require string output, so we will convert the numeric into a nicely formatted string using the to_char() function.

SELECT
    to_char(value::numeric / 1000000::numeric, 'FM999,999,999,990.90') as str_value,
    to_char(balance::numeric / 1000000::numeric, 'FM999,999,999,990.90') as str_balance
FROM (
    SELECT
        row_id, value,
        SUM(value) OVER (ORDER BY row_id) AS balance
    FROM (
        SELECT
            row_id, amount * price AS value
        FROM (
            SELECT
                generate_series (1, 1000000) as row_id,
                (random() * 20000)::bigint AS amount,
                (random() * 100000)::bigint AS price
        ) AS random_data
    ) AS raw_values
) AS balances

Bam! Approximately 2,700ms. Less generation (~670ms) and calculation (~880ms) time, the string processing takes ~1,150ms. All that bigint::numeric type casting and numeric / numeric division adds a good chunk of change. Here is some sample output:

 str_value    | str_balance 
 1,029.03     | 1,029.03
 85.97        | 1,115.00
 291.78       | 1,406.78
 670.64       | 2,077.42
 62.33        | 2,139.75
 142.92       | 2,282.68
 2.41         | 2,285.09
 31.01        | 2,316.10
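
For intuition, the implied-decimals-to-string step can be mimicked in plain Python. This helper is hypothetical, not part of the query above, and it truncates where to_char() would round:

```python
def format_fixed(value: int, implied_decimals: int = 6) -> str:
    """Render an integer carrying implied decimal digits as a grouped
    string with two visible decimal places, roughly mimicking the
    'FM999,999,999,990.90' pattern (truncating, not rounding)."""
    sign = '-' if value < 0 else ''
    units, frac = divmod(abs(value), 10 ** implied_decimals)
    hundredths = frac // 10 ** (implied_decimals - 2)
    return f"{sign}{units:,}.{hundredths:02d}"

print(format_fixed(15987240))   # first sample value above: '15.98'
```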

Alright, let’s try the same thing using numeric the whole way through. Then we will compare the difference. First, we’ll generate the random data. We’ll use a numeric with 19 digits of precision, to match the 19 positive number digits we get from a PostgreSQL bigint.

SELECT
    generate_series (1, 1000000) as row_id,
    (random() * 200)::numeric(17,2) AS amount,
    (random() * 1)::numeric(15,4) AS price

Generating the random data takes ~2,200ms, and looks as follows:

 row_id  | amount | price  
       1 |  84.33 | 0.8137
       2 | 178.67 | 0.0793
       3 | 115.89 | 0.1935
       4 |   5.61 | 0.9763
       5 |  32.45 | 0.8218
       6 | 154.49 | 0.4451
       7 | 127.43 | 0.3493

Now the basic balance calculation, wherein we multiply amount by price, and then sum a running total of those multiples:

SELECT
    value,
    SUM(value) OVER (ORDER BY row_id) AS balance
FROM (
    SELECT
        row_id, amount * price AS value
    FROM (
        SELECT
            generate_series (1, 1000000) as row_id,
            (random() * 200)::numeric(17,2) AS amount,
            (random() * 1)::numeric(15,4) AS price
    ) AS random_data
) AS raw_values

That takes ~3,550ms. Less generation time (~2,200ms) yields a calculation time of ~1,350ms. The output looks like this:

   value    |     balance     
  32.893335 |       32.893335
   9.424890 |       42.318225
  46.160686 |       88.478911
 134.282533 |      222.761444
  19.844650 |      242.606094
   8.255778 |      250.861872
   2.859764 |      253.721636
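
Python’s decimal module is a rough analogue of PostgreSQL’s numeric, and makes the contrast with the fixed-point version plain. A hypothetical sketch using the first three sample rows above:

```python
from decimal import Decimal

# Arbitrary-precision decimals: the scale is tracked explicitly, so
# products come out already human-readable, at some computational cost.
amounts = [Decimal('84.33'), Decimal('178.67'), Decimal('115.89')]
prices = [Decimal('0.8137'), Decimal('0.0793'), Decimal('0.1935')]

balance = Decimal('0')
for amount, price in zip(amounts, prices):
    value = amount * price   # scale 2 * scale 4 gives a scale 6 result
    balance += value
    print(value, balance)
```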

Finally, the complete query. This time, we already have a list of values of type numeric, so we don’t need to perform the initial typecasting step we did in the bigint version. Instead, we just use to_char() to create pretty strings.

SELECT
    to_char(value, 'FM999,999,999,990.90') as str_value,
    to_char(balance, 'FM999,999,999,990.90') as str_balance
FROM (
    SELECT
        row_id, value,
        SUM(value) OVER (ORDER BY row_id) AS balance
    FROM (
        SELECT
            row_id, amount * price AS value
        FROM (
            SELECT
                generate_series (1, 1000000) as row_id,
                (random() * 200)::numeric(17,2) AS amount,
                (random() * 1)::numeric(15,4) AS price
        ) AS random_data
    ) AS raw_values
) AS balances

The full query, in numeric form, takes ~4,200ms. Less generation (2,200ms) and calculation (~1,350ms) time, the string processing takes ~650ms.

As expected, the natively numeric query is much faster at generating strings: ~650ms vs ~1,150ms, or 56% of the native bigint query time.

However, the bigint query wins in the calculation stakes: ~880ms vs ~1,350ms, or 65% of the numeric time. That means that, excluding the random data generation time, the two queries run pretty much equal: ~2,030ms for the bigint, and ~2,000ms for the numeric.

Of course, we must consider the elephant in the room: Maximum precision. Using the bigint technique, we lose a huge chunk of precision every time we perform a calculation. With an amount featuring two implied decimals, and a price featuring four, we lose six digits.

A PostgreSQL bigint can store a maximum positive value of 9,223,372,036,854,775,807, or several quintillion units. Less the six digits we used for implied decimals, that falls to around nine trillion units.
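
The ceiling is simple to compute. A quick Python sketch, using PostgreSQL’s documented bigint maximum:

```python
BIGINT_MAX = 9_223_372_036_854_775_807   # PostgreSQL bigint upper bound

def max_whole_units(implied_decimals: int) -> int:
    """Largest whole-unit balance representable once the given number
    of implied decimal digits has been reserved."""
    return BIGINT_MAX // 10 ** implied_decimals

print(max_whole_units(6))    # 2dp amount * 4dp price: ~9.2 trillion units
print(max_whole_units(10))   # a further 4dp factor: ~922 million units
```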

That puts the bigint calculation in a bind. Say we have a query that requires a further calculation step: Perhaps price is calculated as the multiple of two numbers with four decimal digits, e.g. 2.4561 * 1.3258. That is a common case for currencies, and would cause our maximum balance value to fall to under a billion units.

Based on the above analysis, we might reasonably assume that such a calculation-heavy query denominated in bigint might pull away from a numeric version. However, the loss of maximum balance value might preclude its use for clients with big balance numbers.

As is the way of these things, I’ve now got more questions than I did when I started. What I want to know now is:

  • Does changing the numeric precision definition, e.g. numeric(9,4) instead of numeric(15,4), affect PostgreSQL calculation performance?
  • We ignored retrieval speed in this analysis. When the data is coming off the disk, do numeric and bigint cause materially different read times?
  • Does adding an extra calculation step cause the bigint method to pull away from numeric?
  • Could bigint and numeric be substituted depending on client needs, allowing small clients to benefit from speed while big ones benefit from arbitrary precision?

Questions for another day.

Chat to me about this article anytime on Twitter.

Intel Processors in the 2017 Macbook Pro

WWDC just rolled around, and techy news sites are running their usual breathless summary articles. They list the high-level features of new Macbook Pros, but never seem to give any technical information about the processors beyond superficial and meaningless ‘GHz’ numbers.

This is probably appropriate for most of their readers. Apple knows that its customers don’t usually care about the difference between an i7-5557U and an i7-7660U, so they don’t provide model numbers on their site, either.

It’s possible to figure out which processors are lurking behind the marketing vomit by visiting Intel’s ARK knowledge base. Cross-reference release timing with base and maximum clock frequencies, and voilà, you’ve got yourself technical data upon which to base purchasing decisions.

Here are all the processors Apple has put in their Macbook Pro machines this WWDC:

Macbook Pro 13


What should boys make of Wonder Woman?

Boys, let’s be selfish for a moment. You’ve probably read a lot of great commentary about Wonder Woman from a feminist perspective. Let’s talk about it from the male perspective, about what she means for us.

Action movies serve up a plethora of fantastic male heroes: They might be tongue-in-cheek, like Keanu Reeves as John Wick. Or serious but self-aware, like Daniel Craig as James Bond. Or just plain serious, unrelenting, and indefatigable à la Denzel Washington in Man on Fire. These men are superheroes, in one way or another.

They are physically strong, attractive, and capable. They are emotionally tortured by what they’ve seen and done. They are smart. They are fighters. They are who we imagine ourselves to be, should we ever find ourselves in a position where we must save the world from great evil.

What of Wonder Woman? Consider her aggression, her anger, her fire, ferocity, and fury in the face of threats to those she loves. Revel in her will to win.

These are the qualities we associate with our Bonds, Wicks, and Creasys, wherein they are warriors for good and, in their best moments, inspiring. Yet for heterosexual males, bar the odd surprise-semi when Daniel Craig gets out of the ocean, they’re not blood-pumping.

The worst elements of male chauvinism present weak females as the ideal: Men fight, women stay home, keep themselves clean and pretty, and submit to the dominant male upon his return from the battlefield. In their less inspiring moments, our male superheroes act out this ideal with some token female co-star. These females are often physically stunning, and framed by the camera as the most tantalising of treats.

Yet they cannot hold a candle to Wonder Woman. Wonder Women oozes sexuality. Not because of her physical appearance, as great as those legs may be, but because she fights. She roars. She charges into battle and defeats her foes with unrelenting, unashamed application of physical force. And she quotes Greek literature.

Some men might find the weak, protected female form attractive. Their desires dominate our media landscape. You and I know they’re missing out. Give me a woman that stands with me on the battlefield. One that fights alongside me, bleeds with me, stands tall beside me in the face of danger, screams with me into the jaws of impending doom.

She’s the one you want to kiss passionately after a great victory. She’s the one you trust to be your companion through life, because she has literally saved your life, as you have hers. She’s strong, attractive, capable, emotionally tortured, smart, a fighter – Just like you, because in this story, you are the hero too. You are a team. An ass-kicking, sexy, battle-winning team.

Boys, in our own selfish interests, we should be cheering on Wonder Woman. We should be cheering women-only screenings, we should be whooping in the cinema, and we should be encouraging artists to create more films like it. We should be thrilled to indulge in fantasies where both ourselves and our partners are superheroes.

Of course, we must also stop thinking with our dicks, being selfish, and acknowledge all the other reasons why Wonder Woman is important. Other writers have argued those reasons much better than I ever could. This was just between us men. Wonder Woman is not just a victory for women in film, it’s a victory for all of us as people.

Drone Mapping

Having a bird’s-eye perspective on a problem facilitates better decision making. Quite literally, being able to look down upon a geographic area allows one to craft better plans. Modern mapping services like Google Maps grant views with exceptional clarity.

Google’s photogrammetric mapping is astoundingly good – Here’s the Lower Haight district in San Francisco

Such services are not available on Thornleigh Farm – located, as it is, on remote Lord Howe Island. This is understandable: Internet giants like Google are hardly going to expend their resources producing hyper-detailed maps of a remote island in the Tasman Sea.

A freely available satellite survey of Lord Howe Island provided on Google Maps

The best resolution I’ve been able to get from freely available satellite surveys is generally about 30cm / pixel. This isn’t enough to usefully inform decision making. Enter consumer-grade drone technology. Using a DJI Mavic Pro drone, I’ve been able to produce aerial surveys with a resolution as fine as 1cm / pixel.

An individual image from a DJI Mavic Pro flying over Thornleigh Farm

Images can be stitched together using Adobe Photoshop’s inbuilt Photomerge functionality. I took approximately 200 photos and then used Photomerge to stitch them all together at once. On my laptop, this took only a few hours. The result is so intimately detailed that I don’t want to post it here, out of respect for our privacy on the farm.

Drones like the DJI Mavic Pro are not cheap. But they are significantly cheaper than orbiting satellites, and presumably cheaper than the vehicles Google uses to produce photogrammetric maps. Yet they allow a small business on a remote island to produce fantastic, detailed aerial surveys to inform better decision making. If anyone ever tells you drones are useless toys – Point them to Thornleigh Farm.

Exchange-traded index funds are our friends

Horsies, doggies and casinos are a great analogy for stock markets. There are a lot of testosterone-amped men running around spinning yarns about how they know which horse will win, and why. The reality is, if anyone actually has scientific evidence suggesting strongly which horse will win, they are betting big and keeping quiet.

So it goes with shares. No one knows. Anyone that says they know is benevolently ignorant, or malevolently misleading.

As small scale investors, we are ripe targets. Just like intoxicated punters studying the form, we can be lured with promises of a quick buck, a certain return. There is no such thing – But there is big money in telling us there is, because if we trade lots, we pay lots of fees, and feed lots of payouts. So we are up against it.

Yet we still want a place to park our money for the long term, where it can work for us. This is where exchange-traded index funds are our friends. Instead of betting a single horse will win, they are a bet that some horses will win, some horses will lose, but that everyone will have a pretty good time and come back next year.

Pick a broad enough fund, and sit on it long enough, and you have an extremely high probability of achieving two objectives: Keep your capital safe, and earn a useful return. You won’t get rich, but you won’t get fucked, and there are a great many people out there with a serious interest in you being fucked.