indexing - PostgreSQL index not used for query on IP ranges

Question

Welcome To Ask or Share your Answers For Others

indexing - PostgreSQL index not used for query on IP ranges

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

indexing - PostgreSQL index not used for query on IP ranges

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id )
)

I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num );

The query being used is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:38:50+0000

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc.
This is still not very efficient and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)

Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.

What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2.

Categories

indexing - PostgreSQL index not used for query on IP ranges

indexing - PostgreSQL index not used for query on IP ranges

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags