Overview of the most important features of Postgres 9.3: materialized views

PostgreSQL 9.3 will come out with a pretty cool feature called materialized views. The feature was developed by Kevin Grittner and was committed not long ago:

commit 3bf3ab8c563699138be02f9dc305b7b77a724307
Author: Kevin Grittner
Date: Sun Mar 3 18:23:31 2013 -0600

Added materialized views

A materialized view has a rule, just like a regular view, and a heap and other physical properties, like a table. The rule is used only to populate the table; references in queries point to the materialized data.

This is a minimal implementation, but it should still be useful in many cases. Currently, data is populated only “on demand” by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. Future releases are expected to add incremental updates with various timings, and to develop a more refined concept of what counts as “fresh” data. At some point it may even be possible for queries to use a materialized view in place of references to the underlying tables, but that requires the features described above to be implemented first.

Most of the documentation work was done by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas and Marko Tiikkaja. Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.

What is a materialized view? In short, it is a mutant hybrid of a table and a normal view. A view is a projection of data from the given relations, without any storage of its own. A table is... a table!

A materialized view lies somewhere in between: it is a projection of table data with its own storage. It uses a query to retrieve its data, but the data is stored like an ordinary table. A materialized view can be updated with fresh data by re-executing the query it was created with. It can also be emptied (truncated), in which case it is left in an unscannable state. And since a materialized view has full-fledged storage of its own, it can be placed in a tablespace and can have its own indexes. Note that, at the time of writing, it can also be unlogged (translator's note: that is, its data is not written to the write-ahead log); this option was dropped before the final 9.3 release.
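As a rough sketch of the storage options just described, the following SQL creates a materialized view in an explicit tablespace and gives it an index of its own. The table `demo_orders`, the view `demo_orders_mv` and the index name are hypothetical, and `pg_default` merely stands in for whatever tablespace you would actually use.

```sql
-- Hypothetical source table, for illustration only.
DROP MATERIALIZED VIEW IF EXISTS demo_orders_mv;
DROP TABLE IF EXISTS demo_orders;
CREATE TABLE demo_orders (id int, amount numeric);
INSERT INTO demo_orders
  SELECT g, g * 10 FROM generate_series(1, 1000) AS g;

-- The materialized view gets its own heap, placed in the given tablespace
-- (pg_default is a stand-in; use a real tablespace name in practice).
CREATE MATERIALIZED VIEW demo_orders_mv
  TABLESPACE pg_default
  AS SELECT id, amount FROM demo_orders WHERE amount > 5000;

-- Because the data is physically stored, the view can be indexed directly.
CREATE INDEX demo_orders_mv_id_idx ON demo_orders_mv (id);
```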

This feature introduces four new SQL commands:

CREATE MATERIALIZED VIEW
ALTER MATERIALIZED VIEW
DROP MATERIALIZED VIEW
REFRESH MATERIALIZED VIEW

CREATE, ALTER and DROP are the usual DDL commands for manipulating the view definition. The most interesting one is REFRESH (its name was the subject of a long debate within the community). It updates the materialized view with fresh data by re-running its defining query. Note that REFRESH can also be used to empty (truncate) the materialized view, by running it with the WITH NO DATA option.
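A minimal walk through the four commands, assuming a hypothetical `demo_events` table; all names are illustrative only. `WITH NO DATA` at creation time defines the view without running its query, so it cannot be scanned until the first REFRESH.

```sql
-- Hypothetical table, for illustration only.
DROP MATERIALIZED VIEW IF EXISTS demo_events_mv;
DROP MATERIALIZED VIEW IF EXISTS demo_events_summary;
DROP TABLE IF EXISTS demo_events;
CREATE TABLE demo_events (kind text);
INSERT INTO demo_events VALUES ('click'), ('click'), ('view');

-- CREATE: define the view; WITH NO DATA skips running the query for now.
CREATE MATERIALIZED VIEW demo_events_mv AS
  SELECT kind, count(*) AS n FROM demo_events GROUP BY kind
  WITH NO DATA;

-- REFRESH: run the defining query and store its result.
REFRESH MATERIALIZED VIEW demo_events_mv;

-- ALTER: change properties of the view, here only its name.
ALTER MATERIALIZED VIEW demo_events_mv RENAME TO demo_events_summary;

-- REFRESH ... WITH NO DATA: discard the stored rows (the view becomes
-- unscannable), then repopulate it with a plain REFRESH.
REFRESH MATERIALIZED VIEW demo_events_summary WITH NO DATA;
REFRESH MATERIALIZED VIEW demo_events_summary;

-- DROP: remove it once it is no longer needed (left commented out here).
-- DROP MATERIALIZED VIEW demo_events_summary;
```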

Materialized views are useful in many situations: fast access to data that is expensive to obtain, such as data from a remote server or a file read on the Postgres server through file_fdw; periodically refreshed data (a caching layer); pre-sorted projections of large tables built with ORDER BY; running expensive periodic JOINs in the background; and so on.
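To illustrate the "expensive JOIN in the background" case, here is a sketch with two hypothetical tables, `demo_customers` and `demo_sales`: the join and aggregation are paid for once per REFRESH instead of on every read.

```sql
-- Hypothetical tables, for illustration only.
DROP MATERIALIZED VIEW IF EXISTS demo_sales_by_customer;
DROP TABLE IF EXISTS demo_sales;
DROP TABLE IF EXISTS demo_customers;
CREATE TABLE demo_customers (id int PRIMARY KEY, name text);
CREATE TABLE demo_sales (customer_id int REFERENCES demo_customers, amount numeric);
INSERT INTO demo_customers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO demo_sales VALUES (1, 10), (1, 15), (2, 7);

-- The join and aggregation run here and on each REFRESH, not on every read.
-- ORDER BY controls the order in which rows are written; queries still need
-- their own ORDER BY to guarantee output order.
CREATE MATERIALIZED VIEW demo_sales_by_customer AS
  SELECT c.name, sum(s.amount) AS total
  FROM demo_customers c
  JOIN demo_sales s ON s.customer_id = c.id
  GROUP BY c.name
  ORDER BY total DESC;

-- Re-run periodically (e.g. from a scheduled job) to pick up new sales.
REFRESH MATERIALIZED VIEW demo_sales_by_customer;
```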

I can already imagine some great combinations of procedures, data updates and background workers. Who said that automatically refreshing the data in a materialized view is impossible?

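One way to approach automatic refreshing is a helper that refreshes every materialized view and is called on a schedule (cron, or a background worker). `refresh_all_matviews()` is a hypothetical function, not something PostgreSQL ships. It ignores dependencies between materialized views (a view built on another one may be refreshed first), all refreshes run in the caller's transaction so their exclusive locks are held until it commits, and the caller must own the views.

```sql
-- Hypothetical helper (not part of PostgreSQL): refresh every materialized
-- view in the current database and return how many were refreshed.
CREATE OR REPLACE FUNCTION refresh_all_matviews() RETURNS integer AS $$
DECLARE
  mv record;
  refreshed integer := 0;
BEGIN
  FOR mv IN SELECT schemaname, matviewname FROM pg_matviews
            ORDER BY schemaname, matviewname LOOP
    -- format() with %I quotes the identifiers safely.
    EXECUTE format('REFRESH MATERIALIZED VIEW %I.%I',
                   mv.schemaname, mv.matviewname);
    refreshed := refreshed + 1;
  END LOOP;
  RETURN refreshed;
END;
$$ LANGUAGE plpgsql;
```

It could then be run periodically, for example with `psql -d mydb -c "SELECT refresh_all_matviews();"` from cron, where `mydb` is a placeholder database name.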
Now, let's see how it works:

postgres=# CREATE TABLE aa AS SELECT generate_series(1,1000000) AS a;
SELECT 1000000
postgres=# CREATE VIEW aav AS SELECT * FROM aa WHERE a <= 500000;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW aam AS SELECT * FROM aa WHERE a <= 500000;
SELECT 500000

The size of each relation:

postgres=# SELECT pg_relation_size('aa') AS tab_size, pg_relation_size('aav') AS view_size, pg_relation_size('aam') AS matview_size;
tab_size | view_size | matview_size
----------+-----------+--------------
36249600 | 0 | 18137088
(1 row)

The materialized view uses as much storage as it needs (here about 18 MB) to hold the data selected from the parent table (about 36 MB) when the view's query was executed.

Refreshing the materialized view is very easy:

postgres=# DELETE FROM aa WHERE a <= 500000;
DELETE 500000
postgres=# SELECT count(*) FROM aam;
count
--------
500000
(1 row)
postgres=# REFRESH MATERIALIZED VIEW aam;
REFRESH MATERIALIZED VIEW
postgres=# SELECT count(*) FROM aam;
count
-------
0
(1 row)

Changes in the parent table are reflected in the materialized view only after a REFRESH. Note that, at the time of writing, REFRESH takes an exclusive lock (sigh...).

The WITH NO DATA option of REFRESH switches the materialized view to an unscannable state:

postgres=# REFRESH MATERIALIZED VIEW aam WITH NO DATA;
REFRESH MATERIALIZED VIEW
postgres=# SELECT count(*) FROM aam;
ERROR: materialized view aam has not been populated.
HINT: Use the REFRESH MATERIALIZED VIEW command.

There is a new system view, pg_matviews, which contains information about the current state of materialized views.

postgres=# SELECT matviewname, isscannable FROM pg_matviews;
matviewname | isscannable
-------------+-------------
aam | f
(1 row)
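Checking the catalog before scanning avoids the "has not been populated" error. In the released 9.3 and later versions this column is called `ispopulated` (the session above shows the pre-release name `isscannable`). `demo_lazy_mv` is a hypothetical view used only for this sketch.

```sql
-- Hypothetical materialized view, created without data.
DROP MATERIALIZED VIEW IF EXISTS demo_lazy_mv;
CREATE MATERIALIZED VIEW demo_lazy_mv AS
  SELECT g AS n FROM generate_series(1, 10) AS g
  WITH NO DATA;

-- Scanning demo_lazy_mv now would raise "has not been populated";
-- the catalog reports the same fact without an error (ispopulated = f).
SELECT matviewname, ispopulated
FROM pg_matviews
WHERE matviewname = 'demo_lazy_mv';

-- After a REFRESH the view can be scanned and ispopulated becomes t.
REFRESH MATERIALIZED VIEW demo_lazy_mv;
SELECT matviewname, ispopulated
FROM pg_matviews
WHERE matviewname = 'demo_lazy_mv';
```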

DML statements cannot be run against a materialized view, since its data may not match the current contents of the parent table. A normal view, on the contrary, runs its query every time it is used, so the parent table can be modified through it (updatable views).

postgres=# INSERT INTO aam VALUES (1);
ERROR: cannot change materialized view aam
postgres=# UPDATE aam SET a = 5;
ERROR: cannot change materialized view aam
postgres=# DELETE FROM aam;
ERROR: cannot change materialized view aam
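For contrast, a sketch using hypothetical names: a simple plain view over `demo_items` accepts an INSERT (simple views became automatically updatable in 9.3), while the materialized view over the same table rejects DML and only sees the new row after REFRESH.

```sql
-- Hypothetical objects, for illustration only.
DROP MATERIALIZED VIEW IF EXISTS demo_items_mv;
DROP VIEW IF EXISTS demo_items_v;
DROP TABLE IF EXISTS demo_items;
CREATE TABLE demo_items (a int);
CREATE VIEW demo_items_v AS SELECT a FROM demo_items;
CREATE MATERIALIZED VIEW demo_items_mv AS SELECT a FROM demo_items;

-- A simple view is automatically updatable (9.3+): the row lands in demo_items.
INSERT INTO demo_items_v VALUES (1);

-- The same statement against the materialized view is rejected:
-- INSERT INTO demo_items_mv VALUES (1);  -- ERROR: cannot change materialized view

-- The materialized view sees the new row only after a refresh.
REFRESH MATERIALIZED VIEW demo_items_mv;
```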

Now a few words about the performance gains and losses you can get with materialized views (keeping in mind that you can also manage their indexes). For example, you can very easily speed up a SELECT query on a materialized view without touching the schema of the parent table at all:

postgres=# EXPLAIN ANALYZE SELECT * FROM aam WHERE a = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on aam (cost=0.00..8464.00 rows=1 width=4) (actual time=0.060..155.934 rows=1 loops=1)
Filter: (a = 1)
Rows Removed by Filter: 499999
Total runtime: 156.047 ms
(4 rows)
postgres=# CREATE INDEX aam_ind ON aam (a);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM aam WHERE a = 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using aam_ind on aam (cost=0.42..8.44 rows=1 width=4) (actual time=2.096..2.101 rows=1 loops=1)
Index Cond: (a = 1)
Heap Fetches: 1
Total runtime: 2.196 ms
(4 rows)

Note that the indexes and constraints of the parent table are not copied to the materialized view (materialized views cannot have constraints!). For example, a fast query using the table's primary key can turn into a painfully long sequential scan when run against the materialized view:

postgres=# CREATE TABLE bb (a int PRIMARY KEY);
CREATE TABLE
postgres=# INSERT INTO bb VALUES (generate_series(1,100000));
INSERT 0 100000
postgres=# EXPLAIN ANALYZE SELECT * FROM bb WHERE a = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using bb_pkey on bb (cost=0.29..8.31 rows=1 width=4) (actual time=0.078..0.080 rows=1 loops=1)
Index Cond: (a = 1)
Heap Fetches: 1
Total runtime: 0.159 ms
(4 rows)
postgres=# CREATE MATERIALIZED VIEW bbm AS SELECT * FROM bb;
SELECT 100000
postgres=# EXPLAIN ANALYZE SELECT * FROM bbm WHERE a = 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on bbm (cost=0.00..1776.00 rows=533 width=4) (actual time=0.144..41.873 rows=1 loops=1)
Filter: (a = 1)
Rows Removed by Filter: 99999
Total runtime: 41.935 ms
(4 rows)
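Since the primary key is not carried over, the fix is to index the materialized view itself. A sketch with hypothetical names mirroring the `bb`/`bbm` session: the unique index belongs to the materialized view, so it is kept (and rebuilt) across REFRESH.

```sql
-- Hypothetical table and materialized view mirroring bb/bbm above.
DROP MATERIALIZED VIEW IF EXISTS demo_pk_mv;
DROP TABLE IF EXISTS demo_pk;
CREATE TABLE demo_pk (a int PRIMARY KEY);
INSERT INTO demo_pk SELECT generate_series(1, 100000);
CREATE MATERIALIZED VIEW demo_pk_mv AS SELECT a FROM demo_pk;

-- demo_pk's primary key index is not copied, so add an equivalent
-- unique index on the materialized view itself and collect statistics.
CREATE UNIQUE INDEX demo_pk_mv_a_idx ON demo_pk_mv (a);
ANALYZE demo_pk_mv;

-- The index belongs to demo_pk_mv and survives REFRESH (it is rebuilt).
REFRESH MATERIALIZED VIEW demo_pk_mv;

-- Point lookups can now use the index instead of a sequential scan.
EXPLAIN SELECT a FROM demo_pk_mv WHERE a = 1;
```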

Such anti-patterns are certainly not recommended on production systems!

In general, materialized views are a wonderful feature, especially for applications that require caching. Enjoy!
Article based on information from habrahabr.ru
