Counting the number of found records in PostgreSQL

to work in a new project uses PostgreSQL. Because so far I have worked with MySQL, now have to study and discover Postgre. The first challenge that interested me was the replacement muskulistogo SQL_CALC_FOUND_ROWS. When you use this constant in MySQL, you can obtain the number of all the query finds entries even if the query limit'ω is indispensable in a paged output of search results when using "heavy" queries.
Immediately ready-made solutions could not be found. The forums just stated that the SQL_CALC_FOUND_ROWS in Postgre no. Some wrote that it is necessary to use count(*). And no more information. But also from MySQL I know that the search with a count () query is running almost 2 times slower than with SQL_CALC_FOUND_ROWS. I consulted those who use PostgreSQL, a day plagued google and the result received 4 replacement option SQL_CALC_FOUND_ROWS in PostgreSQL, one of which is quite comprehensible speed.

So, just imagine those four options, which will be discussed. Our target query checks the table entries that contain the text of the adf in the field `text`. Select id 20 records starting from $ 180,000 in order and found the number.
the Option 1. Taken from phpPgAdmin. I just looked at the code of this client for PostgreSQL and check the calculation they made when viewing the table data. Use 2 queries with subqueries. Comfort in the fact that it is not necessary to parse and change the original query to count the number of records found.
select count(id) from (select id from testing where text like '%adf%') as sub; 
select * from (select id from testing where text like '%adf%') as sub limit 20  offset  180000

the Option 2. The easiest option, which is usually yuzayut newcomers like MySQL and Postgres and other DBMS. 2 the request.
select count(id) from testing where text like '%adf%'; 
select id from testing where text like '%adf%' limit 20 offset 180000

the Option 3. max_posedon. This is an attempt emulation muskulistogo SQL_CALC_FOUND_ROWS in Postgres on logic. Really only works when sorting by id (in this case). Here substitutes the id of the last record in the sample, i.e. the record number of 180,000 + 20.
select id from testing where text like '%adf%' limit 20 offset 180000; 
select count(id) from testing where text like '%adf%' and id > 132629;

the Option 4. For user tips irc.freenode.org again max_posedon‘s and this answer on the PostgreSQL forum, which was buried deep in Google. Cursor is used.
DECLARE curs CURSOR FOR select id from testing where text like '%adf%'; 
MOVE FORWARD 180000 IN curs; 
20 FETCH FROM curs; 
MOVE FORWARD ALL IN curs;

+ function PQcmdTuples() API Postgres (or $count = pg_cmdtuples($result); in PHP).
the Please note that all 4 possible queries should be run in a single transaction, then they are faster. 4th option doesn't work at all, if you do not use a single transaction: lost the cursor.
the Now, about speeds. I have tested the speed of these four options. In General, the tests confirmed the expectations. But I note an important fact. All queries were run on PostgreSQL configuration by default, which is not optimized for performance. I just wasn't optimized servers. So the numbers may be slightly adjusted when you start with a “good” config. However, the essence will not change.
Test runs were carried out in PHP for 20 repetitions, 2 times for each option. Available php script, which ran the tests. Who cares, there are full statistics samples in Excel. Here will only publish a PivotTable:
the the the
var 1 var 2 var 3 var 4
Wed. time (MS) 647,41 648,25 450,64 370,67
relative to var 4 1,75 1,75 1,22

For comparison, the time of queries without using transactions:
the

    var 1: 1204 MS the

  • option 2: 689 MS,
  • the
  • var 3: 560 MS
  • the
  • 4 var works only within a transaction.

the Results. The fastest option 4 using the cursor. Its speed is due to the fact that “heavy”, the search query is executed only once. Further operations with the cursor. Similarly SQL_CALC_FOUND_ROWS works in MySQL. 20% of it is behind the option 3 — attempt emulation SQL_CALC_FOUND_ROWS in PostgreSQL. Options 1 and 2 operate at about the same speed and at 75% (more than 2/3!) inferior to the speed of the cursor query.
the PS for pg-guru. First, if you have this information will seem obvious, believe me — for new users of PostgreSQL it's not obvious, and to find this information is not easy. Second, looking forward to your comments about tests on tuning the performance of the configuration postgre or about your experience, and about other versions of the calculation.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Integration of PostgreSQL with MS SQL Server for those who want faster and deeper

Parse URL in Zend Framework 2

Custom database queries in MODx Revolution