
69 docs tagged with "Docs"


ANSI Compatibility

ByConity provides a rich set of SQL syntax through its ANSI SQL dialect. When using this dialect, SQL statements are parsed and validated by Apache Calcite and then sent to the servers for execution. Apache Calcite supports standard ANSI SQL; for details, refer to the BNF grammar at https://calcite.apache.org/docs/reference.html.
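
As a minimal sketch, switching a session to the ANSI dialect and running a standard query might look like the following. The `dialect_type` setting name, table, and columns here are illustrative assumptions:

```sql
-- Switch the session to the ANSI dialect so statements are parsed
-- and validated by Apache Calcite before execution (setting name assumed).
SET dialect_type = 'ANSI';

-- A standard ANSI SQL query; table and column names are hypothetical.
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
```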

Background and Technical Architecture

ByConity is an open-source data warehouse system designed for modern IT architecture changes and built on a cloud-native architecture. It provides excellent query and write performance while meeting data warehouse users' needs for elastic resource scaling, read/write separation, resource isolation, and strong data consistency.

Basic Database Operations

There are a few ways to get started with ByConity: you can deploy it via package deployment, the Docker wrapper, or Kubernetes. To get started quickly, we recommend the ByConity Playground with docker-compose or the Docker wrapper.

Bucket Table Best Practice Manual

In ByConity, when using a bucket table, the system organizes table data based on one or more columns or expressions specified by the user in the table creation statement. Rows with the same values are clustered together and assigned the same bucket number.
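
A minimal sketch of creating a bucket table, assuming a `CLUSTER BY ... INTO ... BUCKETS` clause (clause placement may vary) and a hypothetical `events` table:

```sql
-- Rows with the same user_id hash to the same bucket number,
-- so data for one user is clustered together.
CREATE TABLE events
(
    user_id UInt64,
    event_time DateTime,
    payload String
)
ENGINE = CnchMergeTree
PARTITION BY toDate(event_time)
CLUSTER BY user_id INTO 10 BUCKETS
ORDER BY (user_id, event_time);
```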

ByConity 0.2.0 S3 Storage Upgrade Checklist

Some S3 object keys and S3 metadata changed after S3's preview version (from pre-0.2.0 to 0.2.0), and we provide tools to migrate from the old version. Use this checklist only if you are running an old version of ByConity and storing data in S3.

Column Storage Design Principles

Typically, transactional databases use row storage to support transactions and highly concurrent reads and writes, while analytical databases use column storage to reduce IO and facilitate compression. ByConity, on the other hand, uses column storage to ensure read and write performance while supporting transactional consistency, and it is well-suited for large-scale data computation.

Data Types

The data types provided in ByConity are adapted from ClickHouse. Visit this page for more information on ClickHouse data types.
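
For illustration, a hypothetical table using several common ClickHouse-derived types might be declared like this:

```sql
CREATE TABLE type_demo
(
    id UInt64,                          -- unsigned 64-bit integer
    name String,                        -- variable-length string
    score Nullable(Float64),            -- nullable double-precision float
    tags Array(String),                 -- array of strings
    status Enum8('ok' = 1, 'err' = 2),  -- small enumeration
    created_at DateTime                 -- date and time
)
ENGINE = CnchMergeTree
ORDER BY id;
```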

Deployment Requirements

ByConity can run on most mainstream commercial servers. We recommend that ByConity deployments comply with the following requirements:

FoundationDB Installation

In this guide, I will set up a FoundationDB cluster on three physical machines, all running Debian. I refer to two official guides: Getting Started on Linux and Building a Cluster.

Functions

ByConity provides two SQL dialects: (1) ClickHouse and (2) ANSI.

Git WorkFlow

ByConity leverages GitHub for development. Every contributor and maintainer in ByConity must follow this workflow:

HDFS Installation

In this guide, I will set up HDFS on three machines: one for the NameNode and the other two for DataNodes. I refer to the official documents SingleCluster and ClusterSetup. I will install HDFS version 3.3.4, so I need Java 8, the recommended Java version for this Hadoop release.

Hive External Catalog

Besides creating tables with the CnchHive engine to access external Hive tables, ByConity also supports accessing external tables through an external catalog.
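
A hedged sketch of the idea; the property names, metastore URI, and three-part naming below are assumptions and may differ from the actual syntax:

```sql
-- Register an external Hive catalog pointing at a metastore.
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES
    type = 'hive',
    hive.metastore.uri = 'thrift://127.0.0.1:9083';

-- Query a Hive table through catalog.database.table naming.
SELECT * FROM hive_catalog.hive_db.hive_table LIMIT 10;
```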

Hive External Table

CnchHive is a table engine provided by ByConity that supports federated queries in the form of external tables, so users can accelerate data queries directly without importing data. CnchHive supports querying Hive tables stored on both HDFS and S3.
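
As a sketch, creating a CnchHive external table could look like the following; the metastore URI and the Hive database/table names are placeholders:

```sql
-- Columns must match the schema of the underlying Hive table.
CREATE TABLE hive_orders
(
    order_id UInt64,
    amount Float64,
    order_date Date
)
ENGINE = CnchHive('thrift://127.0.0.1:9083', 'hive_db', 'orders')
PARTITION BY order_date;

-- Queries run directly against the Hive data; no import is needed.
SELECT order_date, sum(amount) FROM hive_orders GROUP BY order_date;
```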

Main Principles and Concepts

This chapter introduces the main principles of ByConity and its query execution. ByConity's query execution process is shown in the figure below. First, ByConity obtains the metadata required for the query through the metadata service. Then it generates an efficient query plan through the optimizer according to the user's SQL and schedules it to the corresponding compute group to read the data and execute it. Finally, the result set is aggregated and sent back to the client.

Package Deployment

One way to deploy ByConity on physical machines is with a package manager.

Query Acceleration

The preload feature loads data from remote storage into the local disk cache to speed up upcoming queries. After the preload finishes, queries read data from the local disk rather than from remote storage.
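
A hedged sketch of triggering a preload; the exact statement and the `parts_preload_level` setting name are assumptions, and the table name is hypothetical:

```sql
-- Warm the local disk cache for a table so subsequent queries
-- read from local disk instead of remote storage.
ALTER DISK CACHE PRELOAD TABLE demo.events SYNC
SETTINGS parts_preload_level = 3;
```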

Query Optimizer

The optimizer is the core of a database system. An excellent optimizer can greatly improve query performance, especially in complex query scenarios, where it can bring improvements from several-fold to hundreds-fold.
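
For example, assuming an `enable_optimizer` session setting, you might inspect the optimized plan for a join query (tables are hypothetical):

```sql
-- Enable the query optimizer for this session (setting name is an assumption).
SET enable_optimizer = 1;

-- Inspect the plan chosen for a join-plus-aggregation query.
EXPLAIN
SELECT o.user_id, sum(o.amount)
FROM orders AS o
INNER JOIN users AS u ON o.user_id = u.id
GROUP BY o.user_id;
```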

Recommended Use Cases

ByConity builds on a large number of mature OLAP technologies, such as a column storage engine, MPP execution, intelligent query optimization, vectorized execution, codegen, indexing, and data compression, and is mainly used in OLAP query and computing scenarios. It performs very well in real-time data ingestion, aggregate queries over large wide tables, complex analysis and computation over massive data, and multi-table join scenarios.

Resource Manager

The Resource Manager (RM) component provides unified management and scheduling of ByConity's computing resources and is the core component for achieving resource elasticity and improving resource utilization.

Role-based Access Control (RBAC)

RBAC in ByConity is adapted from ClickHouse's RBAC in most aspects, apart from minor syntax differences and the underlying implementation, which are explained further below.
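
Since the syntax largely follows ClickHouse, a typical flow might look like this (role, user, and database names are illustrative):

```sql
-- Create a role and grant it read access to one database.
CREATE ROLE analyst;
GRANT SELECT ON sales.* TO analyst;

-- Create a user and assign the role.
CREATE USER alice IDENTIFIED WITH plaintext_password BY 'secret';
GRANT analyst TO alice;
```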

SQL Statements

The statements supported in ByConity are similar to those in ClickHouse, but it is still recommended to follow the ByConity manual to ensure proper use. Some of the examples below are referenced from the ClickHouse documentation but have been adapted and modified to work in ByConity.
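
As a small illustration of ClickHouse-style statements adapted to ByConity (CnchMergeTree is ByConity's cloud-native table engine; database and table names are hypothetical):

```sql
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.visits
(
    visit_id UInt64,
    url String,
    ts DateTime
)
ENGINE = CnchMergeTree
ORDER BY visit_id;

INSERT INTO demo.visits VALUES (1, 'https://byconity.github.io', now());

SELECT count() FROM demo.visits;
```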

Window

ByConity supports the standard syntax of window functions. A list of window-related features is explained below.
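
For example, a standard running-total window function (the table and columns are hypothetical):

```sql
-- Compute a per-user running total of amounts ordered by timestamp.
SELECT
    user_id,
    ts,
    amount,
    sum(amount) OVER (PARTITION BY user_id ORDER BY ts) AS running_total
FROM demo.payments;
```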