Parallel Sequential Scan Is Committed
In earlier posts I suggested that parallel sequential scan might make it into PostgreSQL 9.5. That did not happen. However, I am pleased to report that the first version of parallel sequential scan has been committed to the PostgreSQL master branch, with the aim of shipping it in the upcoming PostgreSQL 9.6 release.
Article based on information from habrahabr.ru
Parallel query for PostgreSQL has long been a dream of mine, and I have worked on it for several years. The work began in the PostgreSQL 9.4 release cycle, where I added dynamic background workers and dynamic shared memory. It continued through the PostgreSQL 9.5 cycle, where I put in place much of the fundamental infrastructure for parallelism. I would like to tell you a little about the current commit and the work that remains.
But first, I would like to give credit where it is due. First, Amit Kapila made a huge contribution to completing this project. Both of us wrote a lot of code that became part of this feature, and that code is spread across many commits over the last few years. Both of us also wrote a lot of code that never made it into a commit. Second, I want to thank Noah Misch, who helped me in the early stages of the project, when I was still trying to get my head around the problems that had to be solved. Third, I would like to thank the PostgreSQL community, and in particular everyone who reviewed and tested patches, suggested improvements, and supported us in many other ways.
It is just as important to thank EnterpriseDB, and in particular its management, Tom Kincaid and Marc Linster. Without their support, Amit and I could not have devoted so much time to this project. The same goes for my team at EnterpriseDB, who patiently covered for me whenever other business issues needed attention. Thank you all.
Now, time for a demo:
rhaas=# \timing
Timing is on.
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-----+-----+----------+--------
(0 rows)

Time: 743.061 ms

rhaas=# set max_parallel_degree = 4;
SET
Time: 0.270 ms
rhaas=# select * from pgbench_accounts where filler like '%a%';
 aid | bid | abalance | filler
-----+-----+----------+--------
(0 rows)

Time: 213.412 ms
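To put the two timings side by side, here is a small back-of-the-envelope calculation. It is only a sketch: the numbers are the single measurements from the session above, and dividing by 4 assumes that only the four workers count, which ignores whatever share of the scan the leader process may do itself.

```python
# Timings copied from the psql session above, in milliseconds.
serial_ms = 743.061      # plain sequential scan
parallel_ms = 213.412    # same query with max_parallel_degree = 4
workers = 4              # assumption: count only the 4 workers, not the leader

speedup = serial_ms / parallel_ms
efficiency = speedup / workers   # 1.0 would mean perfect linear scaling

print(f"speedup: {speedup:.2f}x, efficiency per worker: {efficiency:.0%}")
```

On these figures the parallel run is roughly three and a half times faster, which is short of a perfect fourfold gain. A single run on one machine is of course not a benchmark.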
Here's the plan:
rhaas=# explain (costs off) select * from pgbench_accounts where filler like '%a%';
                 QUERY PLAN
---------------------------------------------
 Gather
   Number of Workers: 4
   ->  Parallel Seq Scan on pgbench_accounts
         Filter: (filler ~~ '%a%'::text)
(4 rows)
The Gather node launches a number of workers, and all of the workers execute the subplan in parallel. Because the subplan is a Parallel Seq Scan rather than an ordinary Seq Scan, the workers coordinate with each other so that each block in the relation is scanned only once. Each worker therefore produces a subset of the final result set, and the Gather node collects the results from all of them.
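The coordination described above can be pictured with a toy model. This is not PostgreSQL code and makes no claim about how the server implements it: threads stand in for worker processes, a lock-protected counter stands in for the shared scan position, and a queue stands in for whatever carries rows back to the collecting node. All names (`parallel_seq_scan`, `scan_counts`, and so on) are invented for the illustration.

```python
import threading
from queue import Queue


def parallel_seq_scan(blocks, predicate, num_workers):
    """Toy model: workers claim blocks from a shared cursor, then results are gathered."""
    next_block = 0                    # shared scan position (hypothetical)
    cursor_lock = threading.Lock()    # guards next_block
    results = Queue()                 # rows flowing back to the collecting step
    scan_counts = [0] * len(blocks)   # how many times each block was read

    def worker():
        nonlocal next_block
        while True:
            with cursor_lock:         # claim the next unscanned block, if any
                if next_block >= len(blocks):
                    return
                block_no = next_block
                next_block += 1
            scan_counts[block_no] += 1     # only the claiming worker touches this slot
            for row in blocks[block_no]:
                if predicate(row):         # each worker applies the filter itself
                    results.put(row)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    gathered = []                     # the "gather" step: merge every worker's subset
    while not results.empty():
        gathered.append(results.get())
    return gathered, scan_counts


# 25 blocks of 4 rows; every 7th row has a filler containing 'a'.
blocks = [
    [(b * 4 + i, "a" * 8 if (b * 4 + i) % 7 == 0 else "x" * 8) for i in range(4)]
    for b in range(25)
]
rows, counts = parallel_seq_scan(blocks, lambda r: "a" in r[1], num_workers=4)
print(len(rows), "matching rows; every block scanned once:", all(c == 1 for c in counts))
```

Note that the gathered rows arrive in whatever order the workers happened to produce them, so the toy result has to be sorted before it is compared with a plain sequential pass.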
One rather large limitation of the current implementation is that we generate Gather nodes only directly on top of Parallel Seq Scan nodes. This means the feature does not currently work for inheritance hierarchies (which are used to implement partitioned tables), because an Append node would sit between the two. Nor is it currently possible to push a join down into the workers. The executor infrastructure can run plans of either kind, but the planner is currently too stupid to generate them. I hope to fix this before the end of the 9.6 release cycle. As things stand, the feature pays off where adding an index cannot help and where having several workers evaluate the filter in parallel makes the scan faster.
Also, my experience so far is that adding a few workers usually helps a lot, but the benefit scales poorly to a large number of workers. More investigation is needed to understand why this happens and how to improve it. As you can see, even a few workers can improve performance quite a bit, so this is less pressing than the previous limitation. Still, I would like to improve it further, since CPU counts are growing all the time.
In conclusion, I would like to note that a number of loose ends still have to be tied up before I can call this feature fully complete, even in its basic form. There are probably bugs left as well. Testing is very much appreciated. Please report any problems you find to the pgsql-hackers mailing list at postgresql.org. Thank you.