Version: 1.0.x

Frequently Asked Questions

Deployment and Operations

Q1: After deploying and running with K8s kind, inserting data fails with an error: Code: 910. DB::Exception: Received from 127.0.0.1:9000. DB::Exception: No local available worker group for vw_write SQLSTATE: HY000. A: Troubleshooting steps: 1. Check whether all ByConity components are in the Ready state. 2. Check whether any workers are registered in the system.workers table.
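
For step 2, a minimal check can be run from the ByConity client; an empty result means no workers have registered with the server:

select * from system.workers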

Q2: I pulled the byconity-deploy repository and deployed a demo locally using kind. The byconity-tso-0 pod is stuck in Pending, while byconity-daemon-manager and byconity-server are constantly in CrashLoopBackOff. A: Use kubectl get pvc -n byconity to check storage usage. If usage is too high, try explicitly adjusting the storage configuration of some components, such as TSO, as described in the "Update your ByConity cluster" section of https://ByConity.github.io/docs/deployment/deploy-k8s.

Q3: Running docker run -it byconity/byconity-server:stable client on a Mac with an M2 chip results in a Segmentation fault. A: There are known issues with fdb (FoundationDB) running in Docker on Mac M2 chips. Based on experience with open-source ClickHouse, ByConity now builds images with the jemalloc parameter set to 0 during compilation. (Further testing and image updates are needed.)

Import and Export

Q1: When using the Kafka engine to import data, do I need to delete and recreate the materialized view if I want to add new fields, or can I just modify the query? If the query is *, do I still need to modify it? A: You can execute ALTER operations on both the base table and the Kafka table, but the materialized VIEW itself needs to be recreated.
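
A sketch of the typical sequence, with hypothetical table, view, and column names (base_table, kafka_src, mv_consumer, new_col); check the Kafka import documentation for the exact engine and view definitions used in your setup:

alter table base_table add column new_col String;
alter table kafka_src add column new_col String;
drop view mv_consumer;
create materialized view mv_consumer to base_table as select * from kafka_src;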

Q2: Where can I download the PartWriter tool provided by ByConity? A: The tool is available in the main branch of the code repository. (Documentation needs to be improved.)

Designing Databases and Tables

Q1: What is the difference between MergeTree and CnchMergeTree? A: Currently, ByConity only supports CnchMergeTree. (Documentation on database engines and table engines needs to be supplemented.)
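
For reference, a minimal CnchMergeTree table definition (database, table, and column names are illustrative):

create table test.events
(
    event_date Date,
    user_id UInt64,
    value String
)
engine = CnchMergeTree()
partition by toYYYYMM(event_date)
order by (event_date, user_id);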

Q2: Does ByConity support engines similar to Distributed and ReplicatedMergeTree in ClickHouse? I noticed that the system.zookeeper table does not exist. Does this mean that CnchMergeTree tables automatically distribute data across multiple nodes, and that querying a CnchMergeTree table is akin to querying a logically global table rather than just the data on the current machine? A: In CNCH, the traditional concepts of replicas and shards are no longer relevant; data is kept in shared remote storage (HDFS/S3), so a CnchMergeTree table is queried as a logically global table rather than per-node data.

Q3: Can the MergeTree engine from ClickHouse be used with ByConity? A: It is generally not used and is only employed for storing some local logs, such as system.query_log.

Q4: Can I use SQL to check the storage size occupied by each table in the database and the number of partitions in a certain table?

A: To check the storage size, row count, and number of parts in a specific partition:

select sum(bytes_on_disk), sum(rows_count), count(1) from system.cnch_parts where database = 'xxx' and table = 'xxx' and partition_id = 'xxx' and visible

To check the number of visible parts in a table:

select count(1) from system.cnch_parts where database = 'xxx' and table = 'xxx'  and visible
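
To get per-table totals across a database, plus a distinct-partition count, an aggregation over the same system.cnch_parts columns should work (a sketch; 'xxx' is a placeholder database name):

select table, sum(bytes_on_disk), sum(rows_count), count(distinct partition_id) from system.cnch_parts where database = 'xxx' and visible group by table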

Queries

Q1: Is there a summary of commonly used SQL statements for ByConity? A: A community-provided summary of commonly used SQL statements is available at: https://blog.csdn.net/cheng1483/article/details/132458760?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22132458760%22%2C%22source%22%3A%22cheng1483%22%7D&fromshare=blogdetail

Q2: Are there many differences between ByConity's query syntax and MySQL's? A: (Answer to be supplemented based on the actual differences.)

Cluster Management

Q1: To scale the number of nodes in ByConity, can I simply adjust the replica count of statefulset.apps/byconity-vw-vw-default and statefulset.apps/byconity-vw-vw-write? Are any additional configurations required? Does ByConity support lossless scaling, and can it be configured through HPA? A: You can directly adjust the replica count, and ByConity supports lossless scaling. For more details, refer to: https://ByConity.github.io/zh-cn/docs/deployment/deploy-k8s. HPA configuration is also supported.

Q2: ByConity supports multi-tenant resource control (virtual warehouses). How are users associated with virtual warehouses? Are there any user-based limitations? Or is it possible to limit the memory consumption of individual requests as in ClickHouse? (Documentation needs to be supplemented.) A: A virtual warehouse is associated with a table at creation time or specified at query time. For more information, refer to: https://ByConity.github.io/zh-cn/docs/basic-guide/virtual-warehouse-configuration. Currently, user-based limitations are only available at the server level, not at the virtual warehouse level.
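
A sketch of associating a table with virtual warehouses at creation time. The setting names cnch_vw_default and cnch_vw_write are assumptions here and should be verified against the linked virtual-warehouse documentation; the table and warehouse names are illustrative:

create table test.events
(
    event_date Date,
    user_id UInt64
)
engine = CnchMergeTree()
order by user_id
settings cnch_vw_default = 'vw_default', cnch_vw_write = 'vw_write';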

Q3: What is the difference between ByConity-worker-write and ByConity-worker? Are they installed on different nodes? A: ByConity-worker-write is specifically designed for write operations and is placed in the write computation group, while ByConity-worker is generally used for read operations.

Q4: Does ByConity have control over the usage of local disk space for write nodes? A: The size of the disk cache can be controlled through the disk_cache_strategies.simple.lru_max_size parameter. However, if there are many small files, the actual space usage may exceed the configured value. It is recommended to leave some additional space when configuring the disk cache size. We are working on optimizing the storage of small objects to address this issue. See issue 498 for more details: https://github.com/ByConity/ByConity/issues/498#issuecomment-1752426986

Q5: An instance has run out of disk space, and a node has exceeded the maximum number of attachable disks (16). A: This indicates that the fdb (FoundationDB) installation has exceeded the node's mounted-disk limit.

Storage

Q1: After configuring ByConity with S3, how long does it take for data to be written to S3 during a write operation? When querying, does ByConity read directly from S3, or does it first check the local disk cache? A: Data is written directly to S3; the local disk only stores temporary files. During a query, the system first checks the local disk cache and falls back to the remote storage if the data is not cached.

Q2: When configuring the AWS s3 endpoint, should I use https://xxx or s3://xxxx? To access an s3 bucket, do I need to combine the endpoint, bucket, and path? A: Refer to https://byconity.github.io/docs/admin-guides/cluster-configuration#storage_configuration. Yes, you would combine the endpoint, bucket, and path.

Q3: In the S3 configuration, what are ak_id and ak_secret? What is the type? What is the purpose of virtual_host_style? A: ak_id and ak_secret correspond to the S3 access_key and secret_key, respectively. The type should be set to bytes3. virtual_host_style can be set to true: most public clouds use the virtual-host request style, while MinIO defaults to path style unless configured otherwise.

Q4: Which HDFS C++ client does ByConity use? A: ByConity uses the HDFS C++ client from https://github.com/ByConity/libhdfs3-open.git.

Q5: How is the size of ByConity's disk cache determined? A: The size of the disk cache can be configured through the disk_cache_strategies parameter. Specifically, there is a maximum setting called lru_max_size. For more information and an example configuration, refer to https://ByConity.github.io/zh-cn/docs/basic-guide/cluster-configuration#disk_cache_strategies.

Q6: Are there any changes to the underlying data storage structure of ByConity compared to ClickHouse? How is data consistency between the local cache and remote storage ensured? A: ByConity stores data parts belonging to the same partition in separate files in HDFS/S3. Once written, these files are immutable, and any modifications to the data are achieved by adding delta parts. Therefore, there is no consistency issue at the part level between the local cache and remote storage. The only difference is whether the data is cached or not.

Performance Testing

Q1: Is there a comparative analysis of ByConity's join performance versus ClickHouse? A: Refer to the "Performance Testing" section of the documentation.

Other

Q1: Which version of ClickHouse is ByConity based on? Does it have the same basic features as ClickHouse? A: It is based on ClickHouse 21.8. Reference: https://github.com/ByConity/ByConity/pull/347

Q2: How do I use the ByConity CLI client? Is there any documentation? (Feedback: the documentation is outdated and needs to be updated.) A: You can use the following command: docker run -it byconity/byconity-server:stable client --host {} --port {}

Q3: Enabling the optimizer automatically uses the complex query execution model; it can be enabled by setting enable_optimizer=1 or dialect_type='ANSI'. Is this done manually, or can it be written in a configuration file? (Please update the configuration documentation to explain each setting.) A: You can manually modify the users.xml file, add these settings, and then restart ByConity.
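
For a quick test without editing users.xml, the same settings can also be applied at query level, assuming both behave as ordinary query settings (the query and table name are illustrative):

select count(1) from test.events settings enable_optimizer = 1, dialect_type = 'ANSI';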

Q4: Does ByConity support secondary indexes? If so, are the secondary index files also packaged into HDFS? A: Yes, ByConity supports secondary indexes.
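
A sketch of defining a secondary (data-skipping) index, assuming ClickHouse-style index DDL carries over to CnchMergeTree; the table, column, and index names are illustrative:

alter table test.events add index idx_value value type bloom_filter granularity 1;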

Q5: Does ByConity support ClickHouse dictionaries? Currently, only the MySQL table engine is mentioned in the documentation at https://ByConity.github.io/docs/basic-guide/data-import. A: Yes, ByConity supports dictionaries. Refer to the CNCH user guide for more information.
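
A sketch of a dictionary backed by a MySQL table, assuming ClickHouse 21.8-style CREATE DICTIONARY syntax carries over; all names and connection details are placeholders:

create dictionary test.user_dict
(
    user_id UInt64,
    user_name String
)
primary key user_id
source(mysql(port 3306 user 'reader' password 'secret' replica(host 'mysql-host' priority 1) db 'test' table 'users'))
layout(hashed())
lifetime(min 300 max 600);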

Q6: Does ByConity have a parameter similar to max_bytes_to_merge_at_min_space_in_pool in ClickHouse to set the minimum size of parts? A: Yes, ByConity has cnch_merge_max_total_rows_to_merge to limit the number of rows and max_bytes_to_merge_at_max_space_in_pool to limit the bytes.
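
A sketch of adjusting these limits per table; whether they are table-level settings or server-side configuration in your version should be verified, and the table name and values are illustrative:

alter table test.events modify setting cnch_merge_max_total_rows_to_merge = 5000000;
alter table test.events modify setting max_bytes_to_merge_at_max_space_in_pool = 161061273600;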

Q7: Does ByConity support projections? A: Yes, ByConity supports projections for both HDFS and S3.
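
A sketch of adding and materializing a projection, assuming ClickHouse-style projection DDL carries over; the table, columns, and projection name are illustrative:

alter table test.events add projection p_by_user (select user_id, count() group by user_id);
alter table test.events materialize projection p_by_user;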

Q8: How is the sharding_key defined? I see that ByConity has a parameter called optimize_skip_unused_shards, which defaults to false. If it is set to true, under what circumstances could there be risks or issues? From my understanding of this parameter, always setting it to true in the code logic seems better, so why is this configuration exposed? A:

Q9: The parameters in cnch_config.xml are not clearly explained. A: