How to write crooked queries with a suboptimal plan and make the DBMS think hard

Introduction
Let's talk about a few obvious things that are well covered in the man pages and the documentation, which people usually read only after stepping on a dozen rakes or shooting off a limb or two.
The parts are only loosely connected to each other, but all of them came up while solving various business problems and proved useful in one way or another.
Working with WITH
WITH is a construct that looks a lot like syntactic sugar without much meaning, and the uninitiated associate it with splitting a huge wall of code into separate methods in the spirit of Martin Fowler. The catch is that it is nothing like a method or function, especially when it comes to query optimization.
First, an apology to the reader: the text contains only the query fragments that matter, and full queries will not be posted. First, so as not to bore you with the details of the data structure, and second, so that I do not inadvertently post something proprietary. If the fragments turn out to be unreadable, please don't be too harsh, and do suggest how to improve them. Thank you.
How to do it.
The original piece of SQL from the main body of the query:
LEFT JOIN specifications_history AS specification_history
    ON specification_history.id = specification_detail.entity_history_id
    AND specification_history.specification_id = ANY(specification_parts.ids)
LEFT JOIN specification_revision_details AS specification_section_detail
    ON specification_section_detail.specification_revision_id = specification_revision.id
    AND specification_section_detail.entity_type = 1002
LEFT JOIN specification_sections_history AS specification_section_history
    ON specification_section_history.id = specification_section_detail.entity_history_id
LEFT JOIN specification_revision_details AS section_item_detail
    ON section_item_detail.specification_revision_id = specification_revision.id
    AND section_item_detail.entity_type = 1003
LEFT JOIN section_items_history AS section_item_history
    ON section_item_history.id = section_item_detail.entity_history_id
The 'refined' piece of the query:
WITH revision_products AS (
    SELECT DISTINCT specification_revision.id AS revision_id,
        specification_history.specification_id AS specification_id,
        section_item_history.product_id AS product_id
    FROM specification_revisions AS specification_revision
    INNER JOIN specification_revision_details AS specification_detail
        ON specification_detail.specification_revision_id = specification_revision.id
        AND specification_detail.entity_type = 1001
    INNER JOIN specifications_history AS specification_history
        ON specification_history.id = specification_detail.entity_history_id
    INNER JOIN specification_revision_details AS specification_section_detail
        ON specification_section_detail.specification_revision_id = specification_revision.id
        AND specification_section_detail.entity_type = 1002
    INNER JOIN specification_sections_history AS specification_section_history
        ON specification_section_history.id = specification_section_detail.entity_history_id
    INNER JOIN specification_revision_details AS section_item_detail
        ON section_item_detail.specification_revision_id = specification_revision.id
        AND section_item_detail.entity_type = 1003
    INNER JOIN section_items_history AS section_item_history
        ON section_item_history.id = section_item_detail.entity_history_id
    WHERE section_item_history.product_id IS NOT NULL
)
Here is what happened: a bunch of LEFT JOINs were moved from the main query body into WITH and turned into INNER JOINs. The fragment was given a euphonious name to improve the readability of the main body, and all the implementation details were tucked away. Clean code practice at its best. Readability really did improve: the main body was left with 5 joins instead of 10. The only problem is that the query execution time immediately went from 75 ms to 95 s. EXPLAIN showed some interesting things:
->  Unique  (cost=796821.66..848031.33 rows=5120967 width=12) (actual time=80769.666..94946.622 rows=315260 loops=1)
    ->  Sort  (cost=796821.66..809624.07 rows=5120967 width=12) (actual time=80769.663..90662.993 rows=37659600 loops=1)
        Sort Key: specification_revision_1.id, specification_history.specification_id, section_item_history.product_id
That is, someone took 37 million rows and briskly started sorting them in 1 GB of memory. Then the questions came:
"where did 37 million rows come from, when the largest of the tables has 1.5 million?"
"we did not change the algorithm, we only made the code more readable, so why did it hang?"
"it is declarative: we said what we want, not how to get it, so why did everything break?"
The answer: moving the joins from the main body into WITH did exactly what the documentation describes:
WITH Queries (Common Table Expressions)
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
Briefly and roughly: WITH queries are run once and are often not optimized together with the outer query; that is, the place where they are used does not affect their execution plan.
We moved a piece of the query into an independent part and forgot to add the important conditions from the WHERE clause, which trim the result set by orders of magnitude. As a result, the WITH query churned through the entire database and then handed this monster to the main body, which took a dozen rows from it.
In this specific case, WHERE contained a condition of the form "product_id = 1234", which set the main restriction on the data. If this condition had been moved into WITH, everything would have kept working at roughly the same speed. However, that can only be done when the right-hand side of the condition is a static value. If the ID is obtained, for example, during a recursive query, the condition cannot be moved into WITH, and splitting the query into pieces will be just as slow.
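To make the effect tangible, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in database. This is an assumption for illustration only: the original query ran on PostgreSQL, and SQLite's planner handles WITH differently, so only the row counts are meaningful here, not the timings. The tables revisions and items and the data sizes are hypothetical. The sketch counts how many rows the WITH body produces on its own and how many it produces once the product_id condition is moved inside it.

```python
import sqlite3

def build_db(n_products=2000, revisions_per_product=5):
    """Create a hypothetical schema: each product appears in a few revisions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE revisions (id INTEGER PRIMARY KEY);
        CREATE TABLE items (revision_id INTEGER, product_id INTEGER);
    """)
    rev_id = 0
    for product_id in range(1, n_products + 1):
        for _ in range(revisions_per_product):
            rev_id += 1
            conn.execute("INSERT INTO revisions VALUES (?)", (rev_id,))
            conn.execute("INSERT INTO items VALUES (?, ?)", (rev_id, product_id))
    return conn

# The WITH body as "refined" in the article: no restriction on product_id.
UNFILTERED = """
WITH revision_products AS (
    SELECT DISTINCT r.id AS revision_id, i.product_id
    FROM revisions AS r
    INNER JOIN items AS i ON i.revision_id = r.id
)
SELECT count(*) FROM revision_products
"""

# The same body with the main restriction moved inside WITH.
FILTERED = """
WITH revision_products AS (
    SELECT DISTINCT r.id AS revision_id, i.product_id
    FROM revisions AS r
    INNER JOIN items AS i ON i.revision_id = r.id
    WHERE i.product_id = ?
)
SELECT count(*) FROM revision_products
"""

def cte_row_counts(conn, product_id):
    """Return (rows built without the filter, rows built with the filter)."""
    unfiltered = conn.execute(UNFILTERED).fetchone()[0]
    filtered = conn.execute(FILTERED, (product_id,)).fetchone()[0]
    return unfiltered, filtered

conn = build_db()
unfiltered_rows, filtered_rows = cte_row_counts(conn, 1234)
print(unfiltered_rows, filtered_rows)
```

The first number is the size of the intermediate result a planner has to build when it cannot push the outer condition down into WITH; the second is what it builds when the condition is written inside.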
Conclusion:
- you need to read the documentation;
- not all development practices are equally useful in every area of development.
Visualizing EXPLAIN
I think everybody knows about explain.depesz.com. It nicely shows what is wrong with a query.

In fact, it is just color highlighting for the output of the EXPLAIN command, but it is very clear and especially helpful at the beginning, when you don't know what to look for... though who am I kidding, it helps later too; it is simply nice and convenient.
Here I would like to say a few words about each of the columns and explain how they relate to the execution result. Yes, all of this is written in the site's own help, but hardly anyone reads help until they get burned.
- #: just the sequence number of the operation during query execution;
- Exclusive: the time taken by the specific operation itself (in milliseconds);
- Inclusive: the time taken by the whole chain of operations (e.g. in the example above, to execute Unique you first have to do at least the Sort);
- Rows X: by what factor the planner missed when it predicted the number of rows the operation should return (yes, this matters for its subsequent decisions about how to carry out the query).
Tips for beginner optimizers
If everything is slow and you don't know where to start, here are a couple of tips. Take the colorized EXPLAIN (preferably with ANALYZE) from the previous section and look at it. Most often the problem (read: 80%+ of the run time) is concentrated in one of the operations in the execution plan. That is, use Exclusive/Inclusive to find the darkest, slowest spot. Again, the example above shows that the Unique operation takes 94 seconds out of the 95 seconds the whole query runs. There you can also see that almost all of that time is the Sort, which takes 90 seconds. Here the problem is visible as the number of rows, the sort algorithm, and the memory used. All that remains is to figure out "who is to blame and what to do". Only knowledge of the target database's data structure and of the requirements for the query result helps here. It may be enough to rearrange a couple of lines or add an extra condition, or you may need to rewrite the query completely, because in its original form the only thing it can do is be slow.
Also pay attention to large "Rows X" values. They indicate a mismatch between the predicted and the actual row counts, most often caused by insufficient statistics on the tables. This can lead to a suboptimal execution plan. For example: I want to select one row from a table with 1 million rows; if the planner decides that the result will contain not 1 row but ~200,000, it will ignore the index and do a full scan, since that is the optimal strategy for such a ratio of result rows to table size. Draw your own conclusions about the speed.
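The two checks above can be sketched in a few lines of Python, using the numbers from the Unique and Sort nodes of the plan shown earlier. The helper names exclusive_ms and rows_x are hypothetical, and the formulas are an assumption about how such columns are typically derived (inclusive time minus the children's inclusive time; the larger row count divided by the smaller), not a description of how explain.depesz.com computes them internally.

```python
def exclusive_ms(node_total_ms, children_total_ms):
    """Time spent in the node itself: its inclusive time minus its children's."""
    return node_total_ms - sum(children_total_ms)

def rows_x(estimated_rows, actual_rows):
    """Misestimate factor (>= 1) and its direction.

    "over" means the planner expected more rows than it got,
    "under" means it expected fewer. Zero counts are clamped to 1
    to avoid division by zero (an assumption of this sketch).
    """
    est = max(estimated_rows, 1)
    act = max(actual_rows, 1)
    if est >= act:
        return est / act, "over"
    return act / est, "under"

# Numbers taken from the plan fragment in the article.
unique_total_ms = 94946.622
sort_total_ms = 90662.993

# Unique has a single child (Sort), so its own work is the difference.
unique_exclusive = exclusive_ms(unique_total_ms, [sort_total_ms])

# Planner estimate vs. actual row counts for both nodes.
sort_factor, sort_direction = rows_x(5120967, 37659600)
unique_factor, unique_direction = rows_x(5120967, 315260)

print(f"Unique exclusive: {unique_exclusive:.3f} ms")
print(f"Sort rows x: {sort_factor:.1f} ({sort_direction})")
print(f"Unique rows x: {unique_factor:.1f} ({unique_direction})")
```

Read this way, Unique itself costs only about 4 seconds; the rest of its inclusive time belongs to the Sort underneath, which received several times more rows than the planner expected.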
Standard rakes
Here is what I have come across most often in practice as the cause of indecent query behavior:
- A lack of understanding of the data structure and joining data through obscure detours or, even better, joining unnecessary data. The most extreme case I saw was in MySQL; here is a slightly simplified example that conveys the essence of the problem:
SELECT ordered_products.* FROM products, products AS ordered_products GROUP BY ordered_products.id
On the one hand, we merely listed an extra table in FROM and never used it. On the other hand, we got an implicit join of two tables via CROSS JOIN and a correspondingly sized row set (true at least for MySQL 5.5). In my case the products table had 40K rows, and I never saw the query finish. As far as I know, Oracle can do join elimination, but in any case it is better not to rely on DBMS features and to use your head.
Bonus: how to do this in ActiveRecord and hang everything:
Product.joins(", (#{Product.table_name}) AS ordered_products").select('ordered_products.*').group("ordered_products.#{Product.primary_key}")
- A love for OUTER JOIN. They produce at least geometric growth of rows in intermediate results, and it can easily happen that for some input data the query slows down and the DBMS chokes on the amount of data. An extreme example was the query mentioned above (the one with WITH). It worked with a strict restriction on product_id. The same query worked fine with an array of 5-15 IDs, and the execution time grew linearly, but after that every additional ID in the array increased the query time 2-3 times. The problem was precisely the many OUTER JOINs, which multiplicatively increase the number of rows involved; at some point their number became outrageously large, and the execution plan was not fit to be shown to underage developers.
- Continuing the previous point: some people like to put FULL OUTER JOIN where LEFT/RIGHT is enough in most cases (tested on Habr residents who discussed the query from the previous articles about interviews). The problem is the same: generating redundant data and increasing resource consumption. On a personal note: I recently really needed FULL OUTER JOIN in production for the first time in 2 years... and was as happy as a child.
- Wonderful magic functions, for example in PostgreSQL, when instead of a couple of joins in a declarative style people try to do the same thing imperatively, using arrays and other data structures together with functions that transform them. Unfortunately I could not find an example, so you will have to take my word for it. I only remember such things flashing by now and then on Stack Overflow. The only good news is that they almost never end up among the top-voted answers.
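The first two rakes are easy to reproduce at a small scale. Below is a minimal sketch in Python with the standard sqlite3 module standing in for the real DBMS (an assumption; the article's cases were on MySQL and PostgreSQL). The products table reuses the name from the example above, while the details table and all row counts are hypothetical. It counts the rows produced by the implicit cross join before GROUP BY collapses them, and the rows produced when one detail table is joined several times by the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE details (revision_id INTEGER, entity_type INTEGER);
""")
conn.executemany("INSERT INTO products VALUES (?)",
                 [(i,) for i in range(1, 301)])

# Rake 1: an extra table in FROM is an implicit CROSS JOIN.
# The database has to process n * n rows ...
cross_rows = conn.execute(
    "SELECT count(*) FROM products, products AS ordered_products"
).fetchone()[0]

# ... even though GROUP BY returns only n of them to the caller.
grouped_rows = len(conn.execute(
    "SELECT ordered_products.* "
    "FROM products, products AS ordered_products "
    "GROUP BY ordered_products.id"
).fetchall())

# Rake 2: one revision with 10 detail rows of each of three entity types.
conn.executemany(
    "INSERT INTO details VALUES (1, ?)",
    [(entity_type,) for entity_type in (1001, 1002, 1003) for _ in range(10)],
)

# Joining the same table again and again by revision_id multiplies rows:
# 10 * 10 * 10 combinations from only 30 stored rows.
fanout_rows = conn.execute("""
    SELECT count(*)
    FROM details AS a
    LEFT JOIN details AS b
        ON b.revision_id = a.revision_id AND b.entity_type = 1002
    LEFT JOIN details AS c
        ON c.revision_id = a.revision_id AND c.entity_type = 1003
    WHERE a.entity_type = 1001
""").fetchone()[0]

print(cross_rows, grouped_rows, fanout_rows)
```

With 300 products the cross join already yields 90,000 intermediate rows for 300 returned ones; at the 40K rows mentioned above the same arithmetic gives 1.6 billion, which goes some way toward explaining why the query never finished.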
The End
Thank you all. If you have your own great examples of how to do it, please don't stay silent; there can never be too many "bad advice" items and "real-life examples". And the article, like a ticket in an exam, is just an excuse to talk.