
Complex Query Tuning

Complex Query Execution Model

Analytical queries can be divided into simple queries and complex queries. Simple queries typically involve single-table retrieval and aggregation, or joins between large and small tables, and respond quickly. Complex queries, on the other hand, involve large tables, multi-table joins, and complex logical processing, which often leads to slow responses and high resource consumption. ByConity has optimized its design for complex queries.

Simple queries can adopt a two-stage execution model, with the partial stage executed on the compute nodes and the final stage executed on the service nodes. For complex queries involving multiple aggregations or joins, however, the two-stage model is far less flexible, which makes query optimization difficult. To better support distributed queries and the execution plans generated by the optimizer, we have introduced a complex query execution model that supports multiple rounds of distributed execution.

Diagram of Complex Query Execution Model

The execution flow of complex queries is as follows:

  1. The Query SQL String is parsed into an AST (Abstract Syntax Tree) by the parser.
  2. The AST undergoes rewriting and optimization to generate an execution plan.
  3. When the optimizer is enabled, it produces the optimized execution plan.
  4. The execution plan is divided into multiple PlanSegments.
  5. A PlanSegment represents an execution fragment in the distributed execution process, which includes:
    • The required AST fragment or a logical execution plan fragment composed of PlanNodes.
    • Node information for PlanSegment execution.
    • Upstream and downstream information for the PlanSegment, including the input streams received from upstream and the output streams required by downstream PlanSegments.
  6. The engine's scheduler constructs a DAG (Directed Acyclic Graph) from these PlanSegments and topologically sorts them before distributing them to all nodes in the cluster.
  7. Each node receives its PlanSegment and begins executing it.
  8. PlanSegments containing data sources start reading the data and distributing it to downstream nodes according to shuffle rules.
  9. PlanSegments with exchange inputs wait for data from upstream and, if necessary, continue shuffling the data to downstream nodes.
  10. After multiple stages are completed, the results are returned to the server.
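
As a rough illustration, a multi-join aggregation such as the hypothetical query below would typically be split into several PlanSegments: source segments that scan the two tables, intermediate segments that receive the shuffled data and perform the join and aggregation, and a final segment that returns the result to the server. Table, column, and database names here are made up for illustration.

  -- Hypothetical tables; under the complex query model this is executed as multiple PlanSegments.
  select c.region, sum(o.amount) as total_amount
  from test_db.orders as o
  join test_db.customers as c on o.customer_id = c.id
  group by c.region
  settings enable_optimizer = 1;  -- the optimizer must be enabled for the complex query model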

How to Enable

The complex query execution model is used automatically when the optimizer is enabled. To enable the optimizer, set enable_optimizer=1 or dialect_type='ANSI'.
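
For example, either setting can be applied to the session or to an individual query; the query below is only a placeholder.

  -- Session level: enable the optimizer, and with it the complex query execution model.
  set enable_optimizer = 1;
  -- Or switch the dialect, which also enables the optimizer.
  set dialect_type = 'ANSI';

  -- Query level:
  select count(*) from test_db.orders settings enable_optimizer = 1;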

  • Ensuring the Existence of Statistics

Without statistics, the generated query plan may not be optimal. You can use show stats [<db_name>.]<table_name> to check the statistics.
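
For example, with placeholder database and table names:

  -- Check whether statistics exist for the table.
  show stats test_db.orders;
  -- If they are missing, collect them first (ByConity's create stats statement, assuming your version provides it):
  -- create stats test_db.orders;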

  • Analyzing the Execution Time of Each Step

Running explain analyze <sql> displays the execution time of each step.
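
For example, to see where time is spent in a hypothetical join query:

  -- Shows the execution time of each step of the plan.
  explain analyze
  select c.region, count(*)
  from test_db.orders as o
  join test_db.customers as c on o.customer_id = c.id
  group by c.region;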

  • Tuning Exchange Parameters

The Exchange operator is responsible for data transmission between PlanSegments.

exchange_source_pipeline_threads affects the total number of threads used for pipeline execution, specifically for non-source pipelines (those that do not read data directly from storage). The default is the number of CPU cores. There is no universally ideal value; halving or doubling it and observing the results is a practical way to tune it. If the query has a high memory footprint, reducing this value may help.

exchange_timeout_ms is the Exchange timeout in milliseconds, with a default of 100000 (100 seconds). If Exchange-related timeout errors occur, consider increasing this value.

exchange_unordered_output_parallel_size controls the parallelism with which Exchange outputs data, with a default of 8. Generally there is no need to adjust it; however, if exchange_source_pipeline_threads is increased significantly, raising exchange_unordered_output_parallel_size as well can improve upstream output throughput.

exchange_enable_block_compress determines whether Exchange compresses data blocks; it is enabled by default. If CPU is a bigger bottleneck than the network, disabling it may be worth considering.
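
As a sketch, these settings can also be applied per query through a SETTINGS clause. The values below are illustrative starting points rather than recommendations, and the query itself is a placeholder.

  select c.region, sum(o.amount) as total_amount
  from test_db.orders as o
  join test_db.customers as c on o.customer_id = c.id
  group by c.region
  settings
      exchange_source_pipeline_threads = 16,        -- try halving/doubling from the CPU-core default
      exchange_timeout_ms = 200000,                 -- raise above the 100-second default if Exchange times out
      exchange_unordered_output_parallel_size = 16, -- raise alongside a large pipeline-thread increase
      exchange_enable_block_compress = 0;           -- disable compression when CPU, not the network, is the bottleneck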