Suppose we have an application that stores two big lists of numbers in a relational database. These numbers represent money – Both the amount of money units in transactions, and the price of that money.

In response to a client query, we must multiply the big list of transaction values by the big list of prices, and sum a running total of those multiples. We assume we can’t pre-calculate and store the running values, and that we must supply the client with strings, perhaps because they are communicating with us via a web API. Suppose clients expect responses in real time.

Storing money requires absolute precision. We can’t use floats, for obvious reasons. So we would probably use an arbitrary precision type. If we were using PostgreSQL 9.6, as this article assumes, then we might choose to use a `numeric`

type to store both values and prices.

Numerics give us arbitrary precision, but come with some baggage. Numerics are binary coded decimals, and CPUs can’t interact with them directly using hardware instructions. Whereas two 64-bit integers will fit in a CPU register and be happily summed, two numerics need to go through a bunch of software shuffling to perform a calculation.

Therefore, if we need to perform lots of calculations quickly, numerics might slow us down when compared to using a 64-bit integer. In PostgreSQL, a 64-bit integer is called a `bigint`

.

Hypothesis: Performing all calculations on integers of type `bigint`

, rather than binary coded decimals of type `numeric`

, will vastly reduce computation times when responding to our client’s queries. Let’s assume the following:

- A price is stored as a
`bigint`

with an implicit precision of four decimal places. For example, a price of £4.4352 is stored as`44352`

(NB the absence of the decimal point). - An amount is stored as a
`bigint`

with an implicit precision of two decimal places. This suits currencies with two decimal places of minor unit precision, such as the US Dollar (USD). For example, an amount of $12,045.42 is stored as`1204542.`

The value of any transaction is therefore the price of the money (`44352`

) times the amount of money units (`1204542`

) divided by the extra orders of magnitude introduced by the implicit decimal places.

In our example, the price has introduced four decimal places, and the amount has introduced two decimal places, for a total of six. Therefore, the calculation is as follows:

44352 * 1204542 / 10^6

This yields `53423.846784`

, which may be nicely formatted for client consumption as $53,423.85.

So, to get to our true value with explicit decimal places, we need to perform division. In PostgreSQL we can’t divide two values of type `bigint`

or we will lose precision. For example, `select (10::bigint / 3::bigint)`

yields `3`

. Because we are dealing with money, that is not acceptable.

Numerics, as arbitrary precision types, can retain precision on division. We will cast to a`numeric`

, and divide by another`numeric`

of appropriate size for our implied decimal places.

Here’s the example above written in Postgres 9.6 SQL:

SELECT to_char(value::numeric(15,4) / 1000000::numeric(15,4), 'FM999,999,999,999.99') AS str_representation FROM ( SELECT amount * price AS interim_value FROM ( SELECT 1204542::bigint AS amount, 44352::bigint AS price ) AS random_data ) AS calculation

Is this really faster than using binary coded decimals? To answer that question, we’ll create a big set of random data that we can operate on.

We will assume a largish organisation that processes 250 transactions per day every day for 10 years, or 912,500 transactions, and round it up to 1,000,000 for luck. We will assume these transaction amounts are in USD, with two decimal places of implied precision, and are sized somewhere between $0 and $200. Each amount is processed at a price to convert the USD to some Pounds Sterling (GBP). Let’s assume (somewhat absurdly) that each GBP price is somewhere between £0 and £10 with four decimal places of implied precision.

SELECT generate_series (1, 1000000) as row_id, (random() * 20000)::bigint AS amount, (random() * 100000)::bigint AS price

My machine features an i7-5557U CPU, and it takes ~670ms to generate these data, which look like this:

row_id | amount | price ---------+--------+-------- 1 | 14179 | 48998 2 | 16948 | 56369 3 | 12760 | 16965 4 | 17177 | 977 5 | 11632 | 38872 6 | 18370 | 44416 7 | 14370 | 89625

Now, let’s perform the necessary calculations. For each of the million rows, we need to multiply the amount by the price, and then sum those multiples into a running balance column.

SELECT value, SUM(value) OVER (ORDER BY row_id) AS balance FROM ( SELECT row_id, amount * price AS value FROM ( SELECT generate_series (1,1000000) as row_id, (random()*20000)::bigint AS amount, (random()*10000)::bigint AS price ) AS random_data ) AS raw_values

This takes ~1,550ms on my i7-5557U. Less the ~670ms to generate the data yields a calculation time of ~880ms. Computers are so cool. That’s just 880 nanoseconds per row! Here’s some sample output:

value | balance -----------+---------------- 15987240 | 15987240 5282332 | 21269572 32625180 | 53894752 125860125 | 179754877 3753301 | 183508178 79953750 | 263461928 74762297 | 338224225

These huge integers, with their implicit decimal places, are not very useful. They are certainly not human readable. We need to convert them into decimal numbers with explicit decimal places. To do so, we will cast them to type `numeric`

. Finally, recall that our clients require string output, so we will convert the `numeric`

into a nicely formatted `string`

using the `to_char()`

function.

SELECT to_char(value::numeric / 1000000::numeric, 'FM999,999,999,990.90') as str_value, to_char(balance::numeric / 1000000::numeric, 'FM999,999,999,990.90') as str_balance FROM ( SELECT value, SUM(value) OVER (ORDER BY row_id) AS balance FROM ( SELECT row_id, amount * price AS value FROM ( SELECT generate_series (1, 1000000) as row_id, (random() * 20000)::bigint AS amount, (random() * 100000)::bigint AS price ) AS random_data ) AS raw_values ) AS balances

Bam! Approximately 2,700ms, Less generation (~670ms) and calculation (~880ms), that’s ~1,150ms. All that `bigint::numeric`

type casting and`numeric / numeric`

division adds a good chunk of change. Here is some sample output:

string_value | string_balance --------------+---------------- 1,029.03 | 1,029.03 85.97 | 1,115.00 291.78 | 1,406.78 670.64 | 2,077.42 62.33 | 2,139.75 142.92 | 2,282.68 2.41 | 2,285.09 31.01 | 2,316.10

Alright, let’s try the same thing using `numeric`

the whole way through. Then we will compare the difference. First, we’ll generate the random data. We’ll use a `numeric`

with 19 digits of precision, to match the 19 positive number digits we get from a PostgreSQL `bigint`

.

SELECT generate_series (1, 1000000) as row_id, (random() * 200)::numeric(17,2) AS amount, (random() * 1)::numeric(15,4) AS price

Generating the random data takes ~2,200ms, and looks as follows:

row_id | amount | price ---------+--------+-------- 1 | 84.33 | 0.8137 2 | 178.67 | 0.0793 3 | 115.89 | 0.1935 4 | 5.61 | 0.9763 5 | 32.45 | 0.8218 6 | 154.49 | 0.4451 7 | 127.43 | 0.3493

Now the basic balance calculation, wherein we multiply amount by price, and then sum a running total of those multiples:

SELECT value, SUM(value) OVER (ORDER BY row_id) AS balance FROM ( SELECT row_id, amount * price AS value FROM ( SELECT generate_series (1, 1000000) as row_id, (random() * 200)::numeric(17,2) AS amount, (random() * 1)::numeric(15,4) AS price ) AS random_data ) AS raw_values

value | balance ------------+----------------- 32.893335 | 32.893335 9.424890 | 42.318225 46.160686 | 88.478911 134.282533 | 222.761444 19.844650 | 242.606094 8.255778 | 250.861872 2.859764 | 253.721636

Finally, the complete query. This time, we already have a list of values of type `numeric`

, so we don’t need to peform the initial typecasting step we did in the `bigint`

version. Instead, we just use `to_char()`

to create pretty strings.

SELECT to_char(value, 'FM999,999,999,990.90') as str_value, to_char(balance, 'FM999,999,999,990.90') as str_balance FROM ( SELECT value, SUM(value) OVER (ORDER BY row_id) AS balance FROM ( SELECT row_id, amount * price AS value FROM ( SELECT generate_series (1, 1000000) as row_id, (random() * 200)::numeric(17,2) AS amount, (random() * 1)::numeric(15,4) AS price ) AS random_data ) AS raw_values ) AS balances

The full query, in numeric form, takes ~4,200ms. Less generation (2,200ms) and calculation (~1,350ms) time, the string processing takes ~650ms.

As expected, the natively`numeric`

query is much faster at generating strings: ~650ms vs ~1,150ms, or 56% of the native`bigint`

query time.

However, the bigint query wins in the calculation stakes: ~880ms vs ~1,350, or 65% of the`numeric`

time. That means that excluding the random data generation time, the two queries run pretty much equal: 2,030ms for the`bigint`

, and 2,000ms for the`numeric`

.

Of course, we must consider the elephant in the room: Maximum precision. Usingthe`bigint`

technique, we lose a huge chunk of precision every time we perform a calculation. With an amount featuring two implied decimals, and a price featuring four, we lose six digits.

A PostgreSQL`bigint`

can store a maximum positive value of 9,223,372,036,854,775,807,or several quintillion units. Less the digits we used for implied decimals, that falls to two trillion units.

That puts the`bigint`

calculation in a bind. Say we have a query that requires a further calculation step: Perhaps price is calculated as the multiple of two numbers with four decimal digits, e.g. 2.4561 * 1.3258. That is a common case for currencies, and would caused our maximum balance value to fall to around eight hundred million.

Based on the above analysis, we might reasonably assume that such a calculation heavy query denominated in`bigint`

might pull away from a`numeric`

version. However, the loss of maximum balance value might preclude its use for clients with big balance numbers.

As is the way of these things, I’ve now got more questions than I did when I started. What I want to know now is:

- Does changing the
`numeric`

precision definition, e.g.`numeric(9,4)`

instead of`numeric(15,4)`

affect PostgreSQL calculation performance? - We ignored retrieval speed in this analysis. When the data is coming off the disk, do
`numeric`

and`bigint`

cause materially different read times? - Does adding an extra calculation step cause the
`bigint`

method to pull away from`numeric`

? - Could
`bigint`

and`numeric`

be substituted depending on client needs, allowing small clients to benefit from speed while big ones benefit from arbitrary precision?

Questions for another day.