I'd say pg_column_size is reporting the compressed size of TOASTed values, while octet_length is reporting the uncompressed sizes. I haven't verified this by checking the function source or definitions, but it would make sense, especially as strings of numbers will compress quite well. You're using EXTENDED storage, so the values are eligible for TOAST compression. See the TOAST documentation.
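If you want to check which storage strategy a column actually uses, the pg_attribute catalog records it; a minimal sketch, assuming a table named mytable (attstorage is 'x' for EXTENDED, 'e' for EXTERNAL, 'm' for MAIN, 'p' for PLAIN):

-- List each user column of "mytable" (hypothetical name) with its storage strategy
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attnum > 0
  AND NOT attisdropped;

psql's \d+ mytable shows the same information in its Storage column.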
As for calculating the expected DB size, that's a whole new question. As you can see from the following demo, it depends on things like how compressible your strings are.
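If you just want an empirical number rather than a per-column calculation, the standard size functions report what a table actually occupies on disk, TOAST included; a quick sketch against the blah demo table created below (pg_total_relation_size also counts indexes, pg_table_size does not):

-- On-disk footprint of the demo table, TOAST segments included
SELECT pg_size_pretty(pg_table_size('blah')) AS table_and_toast,
       pg_size_pretty(pg_total_relation_size('blah')) AS with_indexes;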
Here's a demonstration showing how octet_length can be bigger than pg_column_size, and where TOAST kicks in. First, let's get the results on query output, where no TOAST comes into play:
regress=> SELECT octet_length(repeat('1234567890',(2^n)::integer)), pg_column_size(repeat('1234567890',(2^n)::integer)) FROM generate_series(0,12) n;
 octet_length | pg_column_size
--------------+----------------
           10 |             14
           20 |             24
           40 |             44
           80 |             84
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |           2564
         5120 |           5124
        10240 |          10244
        20480 |          20484
        40960 |          40964
(13 rows)
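The constant 4-byte gap is the varlena length header, which pg_column_size includes for an in-memory, uncompressed value. You can see it directly:

-- pg_column_size = octet_length + 4-byte varlena header for in-memory text
SELECT pg_column_size(repeat('1234567890',4)) - octet_length(repeat('1234567890',4));

which returns 4, matching every row above.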
Now let's store that same query output into a table and get the size of the stored rows:
regress=> CREATE TABLE blah AS SELECT repeat('1234567890',(2^n)::integer) AS data FROM generate_series(0,12) n;
SELECT 13
regress=> SELECT octet_length(data), pg_column_size(data) FROM blah;
 octet_length | pg_column_size
--------------+----------------
           10 |             11
           20 |             21
           40 |             41
           80 |             81
          160 |            164
          320 |            324
          640 |            644
         1280 |           1284
         2560 |             51
         5120 |             79
        10240 |            138
        20480 |            254
        40960 |            488
(13 rows)
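Two effects are visible in the stored versions. Small values get the 1-byte short varlena header used on disk (hence 10 → 11 rather than 10 → 14), and from 2560 bytes up the values have been compressed. Since even the largest value compresses down to a few hundred bytes, everything should still be stored inline rather than pushed out to the TOAST table; a sketch to confirm that, using the reltoastrelid column of pg_class:

-- Size of blah's associated TOAST relation; expect zero bytes if compression
-- shrank every value enough to keep it inline in the main table.
SELECT pg_size_pretty(pg_relation_size(reltoastrelid))
FROM pg_class
WHERE relname = 'blah';

If you instead want the uncompressed sizes on disk, ALTER TABLE blah ALTER COLUMN data SET STORAGE EXTERNAL disables compression for values stored after the change.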