Grouping database table columns the right way

3 min read

Does it matter what order the columns are aligned in? This may come as a bit of a surprise, but it does, and different order in table schema may result in a big difference on your server storage space. For example, do you know how many artists have recorded albums with several to tens of tracks in each? About a couple of millions artists with one to several tens of albums and even more songs.

Even if a table contains the same data, the order of the columns can affect how much space it takes up on the disk. Here is an abstract example:

CREATE TABLE test_table ( i1 int,
i2 int,
i3 int,
v1 varchar(100),
v2 varchar(100),
v3 varchar(100)
);

INSERT INTO test_table SELECT 1, 2, 3, 'abcd', 'abcd', 'abcd' FROM generate_series(1, 10000000);

There is now a table with six columns. The table contains three columns of integer data and three columns of VARCHAR data. The second statement included 10 million additional rows.

It's easy to create a list of numbers using the generate_series command. The generate_series is not utilized in this example. The function is only being used by me to repeat the static data from the SELECT clause. Now, let’s check the table size:

test=# SELECT pg_size_pretty(pg_relation_size('test_table'));
pg_size_pretty
----------------
574 MB
(1 row)

The size of a table in bytes is returned by the pg relation size function. This is somewhat unhelpful in case of a large relation because users could easily end up with extremely large, difficult-to-read numbers. As a result, the pg_size function call should be used to wrap the return value of the pg_relation_size function. This makes the size much more useful because the data is returned in a more readable format.

Let's try the same example once more. This time, the columns are arranged differently, but the rest of the example is exactly the same.

test=# CREATE TABLE test_table ( v1 varchar(100),
i1 int,
v2 varchar(100),
i2 int,
v3 varchar(100),
i3 int
);

test=# INSERT INTO test_table SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30 FROM generate_series(1, 10000000);

test=# SELECT pg_size_pretty(pg_relation_size('test_table'));
pg_size_pretty
----------------
651 MB
(1 row)

Despite the fact that the data in the table has not changed, it has significantly expanded. Alignment, which can be found deep within the source code, is the cause of this issue.

According to the theory, a CPU needs more operations to deal with the field that does not begin at a multiplier of word-size unique to a specific CPU model. As a result, databases like PostgreSQL will physically align data within this rule.

The main idea is that it can make sense to arrange columns with related data types close to one another. Of course, the content has a significant impact on both the outcome and the potential size difference. In this example, using "abc" instead of "abcd" would have produced identical results because both table versions with the same schema would have had the same size on server storage.