Glider hints in PostgreSQL

we Know that SQL is a declarative language, which specifies "what" we want to select from the database, and "how" to do it — DBMS decides itself. The task of selection for the SQL query of a particular method of its execution(plan) is decided by the query planner, which is in almost any DBMS. But sometimes he chooses not the best plan. Many commercial DBMSs provide in this case the "hints" that allow you to manually tell the database how to execute the query. Open Source PostgreSQL DBMS such mechanism was not.

And now, finally, there was what many dreamed of and then got tired of waiting, and others were afraid. Japanese developers of the NTT implemented hints glider PostgreSQL. Moreover, they managed to do it without changing the kernel in a separate module pg_hint_plan supporting PostgreSQL 9.1 and 9.2. The module implements hints, allowing to establish the methods of scanning and joining tables, setting the values of the GUC. For details of installation and use, welcome to under the cat.


From our site you can download the source archive separately under 9.1 and 9.2, which, however, do not differ absolutely nothing, and equally, going under both versions. Well, okay. Assembly and installation of module does not cause problems: make && make install. For Assembly you will need dev-PostgreSQL package from your favourite distribution. To PostgreSQL picked up the module, no SQL should not be executed, it is sufficient to add pg_hint_plan in variable shared_preload_libraries in postgresql.conf (you can instead load the module in each session, where necessary, using the LOAD command). After restarting the server, will be available in three new GUC variables: pg_hint_plan.enable_hint, pg_hint_plan.debug_print, pg_hint_plan.parse_messages. The first of them is responsible for the availability of hint (enabled by default), the remaining two for logging.

Hints are specified in comments to the request, decorated with /* and */. A comment was interpreted as a hint that he at the beginning should be a + sign, e.g. /*+ SeqScan(t1) */. Hints are of the following types.

Hints file responsible for the method table scan

the
    the
  • SeqScan (table name)
  • the
  • TidScan (table name)
  • the
  • IndexScan (table name [an index name])
  • the
  • IndexOnlyScan (a table name [an index name])
  • the
  • BitmapScan (a table name [an index name])
  • the
  • NoSeqScan (table name)
  • the
  • NoTidScan (table name)
  • the
  • NoIndexScan (a table name)
  • the
  • NoIndexOnlyScan (table name)
  • the
  • NoBitmapScan (table name)

As the name table can be specified as the name of the table and its alias in the query.

Hints file responsible for the method of connection of tables

the
    the
  • NestLoop (list of table names)
  • the
  • HashJoin (list of table names)
  • the
  • MergeJoin (list of table names)
  • the
  • NoNestLoop (list of table names)
  • the
  • NoHashJoin (list of table names)
  • the
  • NoMergeJoin (list of table names)

A list of table names specified using the spacebar. He is sensitive to the order, that is, the connection is made in the order in which the tables are listed.

Also a separate two rooms:
the
    the
  • Leading (the list of table names) — sets the join order of the tables without specifying a particular method of the connection
  • the
  • Set(GUC value) — sets the value of the GUC variable at query execution time. In General any new functionality not bear GUC and so it was possible to install just the hint allows you to do this more succinctly (and quickly?).

The time has come to try all this in. Let's create a test table, indexes, collect statistics.

the
CREATE TABLE test1 AS (SELECT id, (random()*1000)::int AS id_2, random() AS value1, random() AS value2 FROM generate_series(1,1000000) id);
CREATE TABLE test2 AS (SELECT id, random() AS value FROM generate_series(1.1000 level) id);
CREATE INDEX test1_id_idx ON test1 (id);
CREATE INDEX test1 ON test1_id_2_idx (id_2);
CREATE INDEX test1_value1_idx ON test1 (value1);
CREATE INDEX test1 ON test1_value2_idx (value2);
CREATE INDEX test2_id_idx ON test2 (id);
CREATE INDEX test2_value_idx ON test2 (value);
VACUUM ANALYZE;


Suppose we have a query that filter data by the values of two fields.
the
SELECT * FROM test1 WHERE value1 BETWEEN 0.5 and 0.505 AND value2 BETWEEN 0.6 and 0.61;


The glider decides to combine the results of the scan indices for each of the fields using Bitmap Scan.
the
 the QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap  Heap  Scan on test1 (cost=319.82..514.76 rows=52 width=24) (actual time=9.575..9.736 rows=59 loops=1)
Recheck Cond: ((value1 > = 0.5::double precision) AND (value1 < = 0.505::double precision) AND (value2 > = 0.6::double precision) AND (value2 < = 0.61::double precision))
-> BitmapAnd (cost=319.82 319.82..rows=52 width=0) (actual time=..9.529 9.529 rows=0 loops=1)
-> Bitmap Index Scan on test1_value1_idx (cost=0.00..113.54 rows=5318 width=0) (actual time=..2.839 2.839 rows=5072 loops=1)
Index Cond: ((value1 > = 0.5::double precision) AND (value1 < = 0.505::double precision))
-> Bitmap Index Scan on test1_value2_idx (cost=0.00..206.00 rows=9764 width=0) (actual time=5.385 5.385..rows=10070 loops=1)
Index Cond: ((value2 > = 0.6::double precision) AND (value2 < = 0.61::double precision))
Total runtime: 9.805 ms


However, we can force it to use a normal Index Scan.
the
/*+ IndexScan(test1) */ SELECT * FROM test1 WHERE value1 BETWEEN 0.5 and 0.505 AND value2 BETWEEN 0.6 and 0.61;


the
 the QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using test1_value1_idx on test1 (cost=0.00..15198.71 rows=52 width=24) (actual time=0.124..10.704 rows=59 loops=1)
Index Cond: ((value1 > = 0.5::double precision) AND (value1 < = 0.505::double precision))
Filter: ((value2 > = 0.6::double precision) AND (value2 < = 0.61::double precision))
Total runtime: 10.776 ms


And even to force it to use another index.
the
/*+ IndexScan(test1 test1_value2_idx) */ SELECT * FROM test1 WHERE value1 BETWEEN 0.5 and 0.505 AND value2 BETWEEN 0.6 and 0.61;


the
 the QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using test1_value2_idx on test1 (cost=0.00..22463.60 rows=52 width=24) (actual time=0.787..15.757 rows=59 loops=1)
Index Cond: ((value2 > = 0.6::double precision) AND (value2 < = 0.61::double precision))
Filter: ((value1 > = 0.5::double precision) AND (value1 < = 0.505::double precision))
Total runtime: 15.816 ms
(4 rows)


A more complex example. Connecting two tables with a filtration across the field from one table, sorting on the field and LIMIT the other.
the
SELECT * FROM test1 t1 JOIN test2 t2 ON t1.id_2 = t2.id WHERE t2.value BETWEEN 0.5 AND 0.51 ORDER BY t1.value1 LIMIT 100;


The glider chooses plan c for test1_value1_idx Index Scan and Nested Loop.
the
 the QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.33..2149.77 rows=100 width=36) (actual time=0.274..34.784 rows=100 loops=1)
-> Nested Loop (cost=4.33..171467.82 rows=7992 width=36) (actual time=34.753..0.271 rows=100 loops=1)
Join Filter: (t1.id_2 = t2.id)
-> Index Scan using test1_value1_idx on test1 t1 (cost=0.00..51457.05 rows=1000000 width=24) (actual time=0.022..10.338 rows=11873 loops=1)
-> Materialize (cost=4.33..10.80 rows=8 width=12) (actual time=0.000..0.001 rows=8 loops=11873)
-> Bitmap Heap Scan on test2 t2 (cost=4.33..10.76 rows=8 width=12) (actual time=0.035..0.046 rows=8 loops=1)
Recheck Cond: ((value >= 0.5::double precision) AND (value < = 0.51::double precision))
-> Bitmap Index Scan on test2_value_idx (cost=0.00..4.33 rows=8 width=0) (actual time=0.026..0.026 rows=8 loops=1)
Index Cond: ((value >= 0.5::double precision) AND (value < = 0.51::double precision))
Total runtime: 34.870 ms


Suppose we want to use another type of table join: HashJoin.
the
/*+ HashJoin(t1 t2) */ EXPLAIN ANALYZE SELECT * FROM test1 t1 JOIN test2 t2 ON t1.id_2 = t2.id WHERE t2.value BETWEEN 0.5 AND 0.51 ORDER BY t1.value1 LIMIT 100;


Glider to obey, adding inside Bitmap Index Scan on test2, and outside — sorting with Limit.
the
 the QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=20516.23 20516.48..rows=100 width=36) (actual time=..156.219 156.230 rows=100 loops=1)
-> Sort (cost=20516.23 20536.21..rows=7992 width=36) (actual time=..156.217 156.225 rows=100 loops=1)
Sort Key: t1.value1
Sort Method: top-N heapsort Memory: 32kB
-> Hash Join (cost=10.86..20210.78 rows=7992 width=36) (actual time=0.248..154.286 rows=7889 loops=1)
Hash Cond: (t1.id_2 = t2.id)
-> Seq Scan on test1 t1 (cost=0.00..16370.00 rows=1000000 width=24) (actual time=0.013..63.210 rows=1000000 loops=1)
-> Hash (cost=10.76..10.76 rows=8 width=12) (actual time=0.066..0.066 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on test2 t2 (cost=4.33..10.76 rows=8 width=12) (actual time=0.044..0.057 rows=8 loops=1)
Recheck Cond: ((value >= 0.5::double precision) AND (value < = 0.51::double precision))

Index Cond: ((value >= 0.5::double precision) AND (value < = 0.51::double precision))
Total runtime: 156.335 ms


If, for example, to set the connection type and MergeJoin IndexScan on the index test2_value_idx, the glider, again add needed sort and Limit.
the
/*+ MergeJoin(t1 t2) IndexScan (t2 test2_value_idx) */ EXPLAIN ANALYZE SELECT * FROM test1 t1 JOIN test2 t2 ON t1.id_2 = t2.id WHERE t2.value BETWEEN 0.5 AND 0.51 ORDER BY t1.value1 LIMIT 100;


the
 the QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=54410.09 54410.34..rows=100 width=36) (actual time=..446.031 446.041 rows=100 loops=1)
-> Sort (cost=54410.09 54430.07..rows=7992 width=36) (actual time=..446.029 446.032 rows=100 loops=1)
Sort Key: t1.value1
Sort Method: top-N heapsort Memory: 32kB
-> Merge Join (cost=54104.65..71.79 rows=7992 width=36) (actual time=..rows 12.501 444.501=7889 loops=1)
Merge Cond: (t1.id_2 = t2.id)
-> Index Scan using test1_id_2_idx on test1 t1 (cost=0.00..51460.24 rows=1000000 width=24) (actual time=0.033..377.392 rows=900401 loops=1)
-> Sort (cost=24.52..24.54 rows=8 width=12) (actual time=0.074..0.545 rows=6927 loops=1)
Sort Key: t2.id
Sort Method: quicksort Memory: 25kB
-> Index Scan using test2_value_idx on test2 t2 (cost=0.00..24.40 rows=8 width=12) (actual time=0.026..0.047 rows=8 loops=1)
Index Cond: ((value >= 0.5::double precision) AND (value < = 0.51::double precision))
Total runtime: 446.182 ms


You may notice that in all these examples, the use of hint, the situation only worsened. I wanted to hint at something that is two times to think before to use hints in real projects. Even if You have a plan, which in this case is faster, ask yourself the following questions:
the
    the
  • if You Set up the parameters of the airframe *_cost, effective_cache_size, geqo* etc. in accordance with available server resources?
  • the
  • what data You have got plan which is faster? In production the same distribution of data? Are you ready to rewrite the hints file when the distribution of data will change?
  • the Plan was executed faster when all you need was in the cache? And at the time of this query in production too will be in the cache?


Still, hints are very useful at least in two situations:
the
    the
  • Want a deeper understanding of the operation of the glider/executer'and to get answers to the questions "What if?".
  • the
  • Sometimes the glider is still very wrong. For example, when there is a strong correlation between table fields, which he is not able to take into account. Because of this, it turns out wrong assessment of selectivity of conditions, and can be a bad plan.

P. S. For the tip on the module thanks to Oleg Bartunov (aka zen)!
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Integration of PostgreSQL with MS SQL Server for those who want faster and deeper

Custom database queries in MODx Revolution

Parse URL in Zend Framework 2