Version: 0.2.0

Complex query model and execution tuning

Complex query execution model

Complex query execution model diagram

Analytical queries can be classified into simple and complex queries. Simple queries typically involve retrieving and aggregating data from a single table, as well as joining large and small tables, with fast query response times. Complex queries, on the other hand, involve large tables, multiple table associations, and complex logic processing, often with slow query response times and high resource consumption. ByConity is optimized for complex queries.

For simple queries, a two-stage execution model is used, with the partial stage executed on the computing node and the final stage executed on the service node. However, when executing complex queries involving multiple aggregations or joins, the flexibility of the two-stage execution model is limited, making query optimization challenging. To better support distributed queries and facilitate the execution of the execution plan generated by the optimizer, we have introduced complex queries that support multiple rounds of distributed execution mode.

The execution flow of a complex query is as follows:

Query SQL String is parsed into AST by parser
Rewrite and optimize AST to generate an execution plan
When the optimizer is enabled, an execution plan is generated by the optimizer.
Divide the execution plan into multiple PlanSegments
PlanSegment is an execution segment in the distributed execution process, which contains
The AST fragment required for execution, or a logical execution plan fragment composed of PlanNode
Node information for PlanSegment execution
The upstream and downstream information of PlanSegment, which includes the upstream input flow and the input flow required by the downstream
The engine's scheduler will form these PlanSegments into a DAG, and then distribute them to all nodes in the cluster according to topological sorting
After each node receives the PlanSegment, it starts to drive the execution of the PlanSegment
The PlanSegment containing the data source starts to read the data, and distributes the data to the downstream nodes according to certain shuffle rules
The PlanSegment containing the exchange input waits for upstream data, and if it needs to continue to shuffle, it will continue to send the data to each node
After multiple rounds of stages are completed, the results will be returned to the server

How to enable it

Enabling the optimizer will automatically follow the complex query execution model. It can be enabled by configuring enable_optimizer=1, or dialect_type='ANSI'.

Make sure statistics exist

Without statistics, the resulting query plan is suboptimal. show stats [<db_name>.]<table_name>

Analyze the time spent on each Step of the plan

The time-consuming of each Step can be displayed by explain analyze sql

Exchange parameter tuning

Exchange operators are responsible for data transfer between PlanSegments.

exchange_source_pipeline_threads affects the total number of threads executed by the pipeline, especially for non-source pipelines (referring to pipelines that directly read data from storage). Currently, the default setting is the number of CPU cores. There is currently no ideal recommended value. You can consider /2 or \ *2 Observation results. If the query memory usage is large, it can be adjusted down.

exchange_timeout_ms Exchange timeout time (ms), the default is 100s, if there is an Exchange-related timeout error, you can increase it appropriately

exchange_unordered_output_parallel_size affects the concurrency of exchange output data, the default is 8, and generally does not need to be adjusted. If the adjustment of exchange_source_pipeline_threads is relatively large, you can appropriately increase exchange_unordered_output_parallel_size to increase the upstream output capacity.

exchange_enable_block_compress Whether to enable exchange compression, it is enabled by default, if the CPU bottleneck is more obvious than the network, you can try to turn it off

Complex query model and execution tuning

Complex query execution model​

How to enable it​

Reference Documentation​

Complex query execution model

How to enable it

Reference Documentation