Deploy on Physical Machines
Currently, ByConity is installed on physical machines through package managers, for example Debian packages on Debian-based systems or rpm packages on CentOS. Because ByConity uses FoundationDB as its metadata store and HDFS as its data store, both must be deployed before ByConity itself. The steps are: install FoundationDB (FDB) first, then install HDFS, and finally deploy the ByConity software packages. Details are as follows.
Installing FoundationDB
In this section, we will set up a FoundationDB cluster on 3 physical machines, all using the Debian operating system. We refer to the official guides: Getting Started on Linux and Building a Cluster.
First, we need to download the binary files from the official download page for installation. If access is slow from within China, we provide a domestic download address. Download the server, monitor, and cli binary files, as well as the corresponding sha256 checksum files (we use version 7.1.25 as an example).
Create a foundationdb/bin folder locally and download the installation files:
curl -L -o fdbserver.x86_64 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbserver.x86_64
curl -L -o fdbserver.x86_64.sha256 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbserver.x86_64.sha256
curl -L -o fdbmonitor.x86_64 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbmonitor.x86_64
curl -L -o fdbmonitor.x86_64.sha256 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbmonitor.x86_64.sha256
curl -L -o fdbcli.x86_64 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbcli.x86_64
curl -L -o fdbcli.x86_64.sha256 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbcli.x86_64.sha256
- Domestic download address
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/fdbcli.x86_64
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/fdbmonitor.x86_64
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/fdbserver.x86_64
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/foundationdb-clients-7.1.25-1.el7.x86_64.rpm
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/foundationdb-clients_7.1.25-1_amd64.deb
After the download is complete, check the checksums:
$ sha256sum --binary fdbserver.x86_64
73b70a75464e64fd0a01a7536e110e31c3e6ce793d425aecfc40f0be9f0652b7 *fdbserver.x86_64
$ cat fdbserver.x86_64.sha256
73b70a75464e64fd0a01a7536e110e31c3e6ce793d425aecfc40f0be9f0652b7 fdbserver.x86_64
Rename the executable files, add executable permissions, and delete unnecessary files:
rm *.sha256
mv fdbcli.x86_64 fdbcli
mv fdbmonitor.x86_64 fdbmonitor
mv fdbserver.x86_64 fdbserver
chmod ug+x fdbcli fdbmonitor fdbserver
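Optionally, confirm that the downloaded binaries run on this machine (both fdbcli and fdbserver print their version with --version):
./fdbcli --version
./fdbserver --version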
Create folders to store configurations, data, and logs:
mkdir -p /<your_directory>/fdb_runtime/config
mkdir -p /<your_directory>/fdb_runtime/data
mkdir -p /<your_directory>/fdb_runtime/logs
Create a foundationdb.conf configuration file in the /<your_directory>/fdb_runtime/config/ folder with the following content:
$ cat /<your_directory>/fdb_runtime/config/foundationdb.conf
[fdbmonitor]
user = root
[general]
cluster-file = /<your_directory>/fdb_runtime/config/fdb.cluster
restart-delay = 60
[fdbserver]
command = /<your_directory>/foundationdb/bin/fdbserver
datadir = /<your_directory>/fdb_runtime/data/$ID
logdir = /<your_directory>/fdb_runtime/logs/
public-address = auto:$ID
listen-address = public
[fdbserver.4500]
class=stateless
[fdbserver.4501]
class=transaction
[fdbserver.4502]
class=storage
[fdbserver.4503]
class=stateless
Create a file named fdb.cluster in the same folder with the following content:
$ cat /<your_directory>/fdb_runtime/config/fdb.cluster
# Replace <your_ip_address> with the local IP address
clusterdsc:test@<your_ip_address>:4500
Install FDB as a systemd service. In the same folder, create a file named fdb.service with the following content:
$ cat /<your_directory>/fdb_runtime/config/fdb.service
[Unit]
Description=FoundationDB (KV storage for cnch metastore)
[Service]
Restart=always
RestartSec=30
TimeoutStopSec=600
ExecStart=/<your_directory>/foundationdb/bin/fdbmonitor --conffile /<your_directory>/fdb_runtime/config/foundationdb.conf --lockfile /<your_directory>/fdb_runtime/fdbmonitor.pid
[Install]
WantedBy=multi-user.target
Now that the configuration files are prepared, proceed to install FDB into systemd.
Copy the service file to the /etc/systemd/system/ directory:
cp fdb.service /etc/systemd/system/
Reload the service files to include the new service:
systemctl daemon-reload
Enable and start the service:
systemctl enable fdb.service
systemctl start fdb.service
Check the service and see if it's active:
$ systemctl status fdb.service
fdb.service - FoundationDB (KV storage for cnch metastore)
Loaded: loaded (/etc/systemd/system/fdb.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2023-01-17 18:35:42 CST; 20s ago
...
Now that the FDB service has been installed on one machine, repeat the same steps to install the FDB service on the other two machines.
After installation, you need to connect the three FDB services to form a cluster. Go back to the first node and use fdbcli to connect to FDB.
$ ./foundationdb/bin/fdbcli -C fdb_runtime/config/fdb.cluster
Using cluster file `fdb_runtime/config/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb>
Execute the following command to initialize the database:
configure new single ssd
Set all three machines as coordinators, replacing the addresses with your machine addresses:
coordinators <node_1_ip_address>:4500 <node_2_ip_address>:4500 <node_3_ip_address>:4500
Then exit fdbcli, and you'll find that the fdb.cluster file now has new content:
$ cat fdb_runtime/config/fdb.cluster
# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
clusterdsc:wwxVEcyLvSiO3BGKxjIw7Sg5d1UTX5ad@example1.host.com:4500,example2.host.com:4500,example3.host.com:4500
Copy this file to the other two machines, replacing the old files there (see the scp sketch below), then restart the fdb service on each of them:
systemctl restart fdb.service
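The copy step can be done with scp, for example (a sketch assuming the same directory layout on every node; replace the addresses with your own):
scp /<your_directory>/fdb_runtime/config/fdb.cluster root@<node_2_ip_address>:/<your_directory>/fdb_runtime/config/fdb.cluster
scp /<your_directory>/fdb_runtime/config/fdb.cluster root@<node_3_ip_address>:/<your_directory>/fdb_runtime/config/fdb.cluster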
After that, return to the first machine, connect to FDB using fdbcli again, and execute the following command to change the redundancy mode to double:
configure double
Then execute the status command in fdbcli to view the results. You should see something similar to the following:
fdb> status
Using cluster file `fdb_runtime/config/fdb.cluster'.
Configuration:
Redundancy mode - double
Storage engine - ssd-2
Coordinators - 3
Usable Regions - 1
This confirms that you have completed the installation of the FoundationDB server. You now have the fdb.cluster file, which we will use in the ByConity configuration.
Installing HDFS
Here, we will install HDFS on three machines, with one machine acting as the namenode and the other two as datanodes. For detailed instructions, refer to the official documentation: SingleCluster and ClusterSetup. We will install HDFS version 3.3.4 based on Java 8.
First, install Java on all three machines. There are many ways to install Java, but here we will use the following two commands:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
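You can verify the Java installation afterwards:
java -version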
Next, download a Hadoop distribution from the official website or use our provided domestic fast download link. Extract the archive and navigate to the directory:
$ curl -L -o hadoop-3.3.4.tar.gz https://dlcdn.apache.org/hadoop/common/stable/hadoop-3.3.4.tar.gz
$ tar xvf hadoop-3.3.4.tar.gz
$ ls
hadoop-3.3.4 hadoop-3.3.4.tar.gz
$ cd hadoop-3.3.4
- Domestic download address
https://release-bin.tos-cn-beijing.volces.com/hdfs/3.3.6/hadoop-3.3.6.tar.gz
Edit the etc/hadoop/hadoop-env.sh file to set the environment variables:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/<your_directory>/hdfs/hadoop-3.3.4
export HADOOP_LOG_DIR=/<your_directory>/hdfs/logs
Edit the etc/hadoop/core-site.xml file with the following content, replacing <your_name_node_ip_address> with the actual IP address of your namenode:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://<your_name_node_ip_address>:12001</value>
</property>
</configuration>
Now that the common setup for all three machines is complete, we will proceed with the specific configurations for the namenode and datanodes.
On the namenode, create a file containing the list of datanodes. For example, datanodes_list.txt should look like this:
$ cat /root/user_xyz/hdfs/datanodes_list.txt
<datanode_1_address>
<datanode_2_address>
Create a directory to store the namenode's runtime data:
mkdir -p /<your_directory>/hdfs/root_data_path_for_namenode
Edit the etc/hadoop/hdfs-site.xml file on the namenode with the following content:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///<your_directory>/hdfs/root_data_path_for_namenode</value>
</property>
<property>
<name>dfs.hosts</name>
<value>/<your_directory>/hdfs/datanodes_list.txt</value>
</property>
</configuration>
This completes the configuration for the namenode. Next, we will configure the two datanodes.
On each datanode, create a directory to store the runtime data:
mkdir -p /root/user_xyz/hdfs/root_data_path_for_datanode
Edit the etc/hadoop/hdfs-site.xml file on the datanodes with the following content:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>file:///<your_directory>/hdfs/root_data_path_for_datanode</value>
</property>
</configuration>
After completing the configuration, return to the namenode, navigate to the hadoop directory, format the file system, and start the namenode using the following commands:
bin/hdfs namenode -format
bin/hdfs --daemon start namenode
Then, on each of the two datanodes, navigate to the hadoop directory and start the datanode using the following command:
bin/hdfs --daemon start datanode
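Optionally, run the JDK's jps tool on each machine to confirm the daemons are up; it should list NameNode on the namenode and DataNode on each datanode:
jps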
Once the entire HDFS cluster is configured, we need to create a directory to store data. On the namenode, in the hadoop directory, execute the following commands:
bin/hdfs dfs -mkdir -p /user/clickhouse/
bin/hdfs dfs -chown clickhouse /user/clickhouse
bin/hdfs dfs -chmod -R 775 /user/clickhouse
Finally, check the status of the entire HDFS cluster and verify that the datanodes are active by running the following command on the namenode:
bin/hdfs dfsadmin -report
Installing the FoundationDB Client
The ByConity software packages depend on the FoundationDB client package, which is tightly coupled to the FoundationDB server version, so choose a client package that matches your server version. The FoundationDB client package can be downloaded from the official website. In this example, we download version 7.1.27 for Debian OS on amd64 machines.
curl -L -o foundationdb-clients_7.1.27-1_amd64.deb https://github.com/apple/foundationdb/releases/download/7.1.27/foundationdb-clients_7.1.27-1_amd64.deb
Execute the installation command:
sudo dpkg -i foundationdb-clients_7.1.27-1_amd64.deb
Deploying the ByConity Software Package
Next, we will deploy the ByConity software package, which can be found on our official downloads page. Alternatively, you can build the software package yourself by following this software package build guide.
Installing Software Packages
Install the components required by ByConity.
VERSION=0.4.3
ARCH=amd64
# First, install `byconity-common-static`, which is a dependency for all other packages.
dpkg -i byconity-common-static_${VERSION}_${ARCH}.deb
# Then, in the same way, install the ByConity Resource Manager, ByConity Server,
# ByConity Worker, ByConity Worker Write, and ByConity Daemon Manager.
# `byconity-resource-manager`, `byconity-daemon-manager`, and `byconity-tso` are lightweight services,
# so they can be installed on a shared machine along with other packages.
# However, for `byconity-server`, `byconity-worker`, and `byconity-worker-write`,
# we should install them on separate machines.
dpkg -i byconity-tso_${VERSION}_${ARCH}.deb
dpkg -i byconity-resource-manager_${VERSION}_${ARCH}.deb
dpkg -i byconity-server_${VERSION}_${ARCH}.deb
dpkg -i byconity-worker_${VERSION}_${ARCH}.deb
dpkg -i byconity-worker-write_${VERSION}_${ARCH}.deb
dpkg -i byconity-daemon-manager_${VERSION}_${ARCH}.deb
Preparing Configuration Files
Recommended Configuration File Structure Example
Typically, a configuration item is either specific to a single component or shared across the cluster (such as the HDFS/FDB connection parameters). The following layout for the configuration files shows how to support both dedicated and shared configurations at the same time.
- /etc/byconity-server/
- fdb.cluster
- byconity-tso.xml
- byconity-server.xml
- byconity-worker.xml
- byconity-worker-write.xml
- conf.d/
- xxx.xml
- fdb.cluster is the configuration file used to connect to the FoundationDB cluster.
- byconity-{tso,server,worker,worker-write}.xml are the configuration files used by the respective components.
- conf.d/ is a fixed folder name used to store shared configurations. All files inside it (with no restrictions on naming or number) are automatically merged into each component's configuration file, so common configurations such as <service_discovery/> and <hdfs_nnproxy/> are recommended to be stored here.
The cnch_config method is no longer recommended; its contents can be migrated directly to conf.d/.
In addition to XML, ByConity also supports YAML-format configuration files. When converting between the two, note that the XML format has an additional outermost tag, <yandex/>.
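For example, the same illustrative setting in both formats (timezone is only an example key):
In XML:
<yandex>
    <timezone>UTC</timezone>
</yandex>
In YAML (no extra outer tag):
timezone: UTC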
Editing Configuration Files
Some noteworthy configuration items are described below.
Create an XML file (any name) in conf.d/ to configure the service_discovery and hdfs_nnproxy tags.
ByConity components discover each other in one of three ways, specified with the mode tag: local, dns, or consul. In local mode, you need to specify the IP addresses or hostnames of all components in this configuration file by replacing the placeholder {your_xxx_address} (for example, {your_server_address}) with the component's actual IP address, such as 10.0.2.72.
In local mode, a service's address is obtained by other services mainly via host (service_discovery > xxx > node[] > host). Therefore, make sure the value of this item is usable for service discovery, e.g. an IP address or domain name that is reachable from the other machines.
The hdfs_nnproxy tag contains the address of the HDFS namenode.
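As a minimal, illustrative sketch (the file name common.xml is arbitrary; the complete node lists, ports, and other entries from the packaged default configuration are omitted here), a shared file in conf.d/ could look like this:
<yandex>
    <!-- HDFS namenode address; matches the port configured in core-site.xml above -->
    <hdfs_nnproxy>hdfs://<your_name_node_ip_address>:12001</hdfs_nnproxy>
    <service_discovery>
        <mode>local</mode>
        <server>
            <node>
                <host><your_server_address></host>
            </node>
        </server>
        <!-- similar node entries for the other components, as in the packaged defaults -->
    </service_discovery>
</yandex>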
Starting the Program
First, start TSO.
systemctl start byconity-tso
You can also check the status of TSO with the following command:
systemctl status byconity-tso
Note: When you install the package again (for example, during an upgrade), you do not need to execute the start command again.
Then start each component in the same way:
systemctl start byconity-server   # likewise for byconity-worker, byconity-worker-write, byconity-resource-manager, and byconity-daemon-manager
Check the startup status of each component:
systemctl status | grep byconity-
Install more worker nodes in the same way. Each worker node has a configuration item named WORKER_ID, which is set in the configuration file /etc/byconity-server/byconity-worker.xml or /etc/byconity-server/byconity-worker-write.xml. The worker id must be unique among worker nodes; its default value in the configuration file is empty. Name the WORKER_ID following the regex pattern \w*-\d+.
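For example, a worker's entry might be set as follows (worker-0 is only an illustrative name; the rest of the packaged byconity-worker.xml stays unchanged):
<!-- in /etc/byconity-server/byconity-worker.xml; each worker needs a unique id -->
<WORKER_ID>worker-0</WORKER_ID>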
- Check the status of the computation group
clickhouse client --port 9010
:) select * from system.workers