Deploy on Physical Machines
Currently, ByConity is installed on physical machines through package managers, for example Debian packages on Debian-based systems or rpm packages on CentOS. Because ByConity uses FoundationDB as its metadata store and HDFS as its data store, both must be deployed before ByConity itself. The steps are: install FoundationDB (FDB) first, then install HDFS, and finally deploy the ByConity software packages. Details are as follows.
Installing FoundationDB
In this section, we will set up a FoundationDB cluster on 3 physical machines, all using the Debian operating system. We refer to the official guides: Getting Started on Linux and Building a Cluster.
First, we need to download the binary files from the official download page for installation. If access is slow from within China, we provide a domestic download address. Download the server, monitor, and cli binary files, as well as the corresponding sha256 checksum files (we use version 7.1.25 as an example).
Create a foundationdb/bin folder locally and download the installation files:
curl -L -o fdbserver.x86_64 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbserver.x86_64
curl -L -o fdbserver.x86_64.sha256 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbserver.x86_64.sha256
curl -L -o fdbmonitor.x86_64 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbmonitor.x86_64
curl -L -o fdbmonitor.x86_64.sha256 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbmonitor.x86_64.sha256
curl -L -o fdbcli.x86_64 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbcli.x86_64
curl -L -o fdbcli.x86_64.sha256 https://github.com/apple/foundationdb/releases/download/7.1.25/fdbcli.x86_64.sha256
- Domestic download address
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/fdbcli.x86_64
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/fdbmonitor.x86_64
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/fdbserver.x86_64
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/foundationdb-clients-7.1.25-1.el7.x86_64.rpm
https://release-bin.tos-cn-beijing.volces.com/fdb/7.1.25/foundationdb-clients_7.1.25-1_amd64.deb
After the download is complete, check the checksums:
$ sha256sum --binary fdbserver.x86_64
73b70a75464e64fd0a01a7536e110e31c3e6ce793d425aecfc40f0be9f0652b7 *fdbserver.x86_64
$ cat fdbserver.x86_64.sha256
73b70a75464e64fd0a01a7536e110e31c3e6ce793d425aecfc40f0be9f0652b7 fdbserver.x86_64
Rename the executable files, add executable permissions, and delete unnecessary files:
rm *.sha256
mv fdbcli.x86_64 fdbcli
mv fdbmonitor.x86_64 fdbmonitor
mv fdbserver.x86_64 fdbserver
chmod ug+x fdbcli fdbmonitor fdbserver
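Optionally, confirm that the downloaded binaries run on this machine (both fdbcli and fdbserver print their version with --version):
./fdbcli --version
./fdbserver --version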
Create folders to store configurations, data, and logs:
mkdir -p /<your_directory>/fdb_runtime/config
mkdir -p /<your_directory>/fdb_runtime/data
mkdir -p /<your_directory>/fdb_runtime/logs
Create a foundationdb.conf configuration file in the /<your_directory>/fdb_runtime/config/ folder with the following content:
$ cat /<your_directory>/fdb_runtime/config/foundationdb.conf
[fdbmonitor]
user = root
[general]
cluster-file = /<your_directory>/fdb_runtime/config/fdb.cluster
restart-delay = 60
[fdbserver]
command = /<your_directory>/foundationdb/bin/fdbserver
datadir = /<your_directory>/fdb_runtime/data/$ID
logdir = /<your_directory>/fdb_runtime/logs/
public-address = auto:$ID
listen-address = public
[fdbserver.4500]
class=stateless
[fdbserver.4501]
class=transaction
[fdbserver.4502]
class=storage
[fdbserver.4503]
class=stateless
Create a file named fdb.cluster in the same folder with the following content:
$ cat /<your_directory>/fdb_runtime/config/fdb.cluster
# Replace <your_ip_address> with the local IP address
clusterdsc:test@<your_ip_address>:4500
Install FDB as a systemd service. In the same folder, create a file named fdb.service with the following content:
$ cat /<your_directory>/fdb_runtime/config/fdb.service
[Unit]
Description=FoundationDB (KV storage for cnch metastore)
[Service]
Restart=always
RestartSec=30
TimeoutStopSec=600
ExecStart=/<your_directory>/foundationdb/bin/fdbmonitor --conffile /<your_directory>/fdb_runtime/config/foundationdb.conf --lockfile /<your_directory>/fdb_runtime/fdbmonitor.pid
[Install]
WantedBy=multi-user.target
Now that the configuration files are prepared, proceed to install FDB into systemd.
Copy the service file to the /etc/systemd/system/ directory:
cp fdb.service /etc/systemd/system/
Reload the service files to include the new service:
systemctl daemon-reload
Enable and start the service:
systemctl enable fdb.service
systemctl start fdb.service
Check the service and see if it's active:
$ systemctl status fdb.service
fdb.service - FoundationDB (KV storage for cnch metastore)
Loaded: loaded (/etc/systemd/system/fdb.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2023-01-17 18:35:42 CST; 20s ago
...
Now that the FDB service has been installed on one machine, repeat the same steps to install the FDB service on the other two machines.
After installation, you need to connect the three FDB services to form a cluster. Go back to the first node and use fdbcli to connect to FDB.
$ ./foundationdb/bin/fdbcli -C fdb_runtime/config/fdb.cluster
Using cluster file `fdb_runtime/config/fdb.cluster'.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb>
Execute the following command to initialize the database:
configure new single ssd
Set all three machines as coordinators, replacing the addresses with your machine addresses:
coordinators <node_1_ip_address>:4500 <node_2_ip_address>:4500 <node_3_ip_address>:4500
Then exit fdbcli, and you'll find that the fdb.cluster file now has new content:
$ cat fdb_runtime/config/fdb.cluster
# DO NOT EDIT!
# This file is auto-generated, it is not to be edited by hand
clusterdsc:wwxVEcyLvSiO3BGKxjIw7Sg5d1UTX5ad@example1.host.com:4500,example2.host.com:4500,example3.host.com:4500
Copy this file to the other two machines, replacing the old files there (see the scp sketch below), then restart the fdb service on each of them:
systemctl restart fdb.service
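The copy step can be done with scp, for example (a sketch assuming the same directory layout on every node; replace the addresses with your own):
scp /<your_directory>/fdb_runtime/config/fdb.cluster root@<node_2_ip_address>:/<your_directory>/fdb_runtime/config/fdb.cluster
scp /<your_directory>/fdb_runtime/config/fdb.cluster root@<node_3_ip_address>:/<your_directory>/fdb_runtime/config/fdb.cluster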
After that, return to the first machine, connect to FDB using fdbcli again, and execute the following command to change the redundancy mode to double:
configure double
Then execute the status command in fdbcli to view the results. You should see something similar to the following:
fdb> status
Using cluster file `fdb_runtime/config/fdb.cluster'.
Configuration:
Redundancy mode - double
Storage engine - ssd-2
Coordinators - 3
Usable Regions - 1
This confirms that you have completed the installation of the FoundationDB server. You now have the fdb.cluster file, which we will use in the ByConity configuration.
Installing HDFS
Here, we will install HDFS on three machines, with one machine acting as the namenode and the other two as datanodes. For detailed instructions, refer to the official documentation: SingleCluster and ClusterSetup. We will install HDFS version 3.3.4 based on Java 8.
First, install Java on all three machines. There are many ways to install Java, but here we will use the following two commands:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
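You can verify the Java installation afterwards:
java -version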
Next, download a Hadoop distribution from the official website or use our provided domestic fast download link. Extract the archive and navigate to the directory:
$ curl -L -o hadoop-3.3.4.tar.gz https://dlcdn.apache.org/hadoop/common/stable/hadoop-3.3.4.tar.gz
$ tar xvf hadoop-3.3.4.tar.gz
$ ls
hadoop-3.3.4 hadoop-3.3.4.tar.gz
$ cd hadoop-3.3.4
- Domestic download address
https://release-bin.tos-cn-beijing.volces.com/hdfs/3.3.6/hadoop-3.3.6.tar.gz
Edit the etc/hadoop/hadoop-env.sh file to set the environment variables:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/<your_directory>/hdfs/hadoop-3.3.4
export HADOOP_LOG_DIR=/<your_directory>/hdfs/logs
Edit the etc/hadoop/core-site.xml file with the following content, replacing <your_name_node_ip_address> with the actual IP address of your namenode:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://<your_name_node_ip_address>:12001</value>
</property>
</configuration>
Now that the common setup for all three machines is complete, we will proceed with the specific configurations for the namenode and datanodes.
On the namenode, create a file containing the list of datanodes. For example, datanodes_list.txt should look like this:
$ cat /root/user_xyz/hdfs/datanodes_list.txt
<datanode_1_address>
<datanode_2_address>
Create a directory to store the namenode's runtime data:
mkdir -p /<your_directory>/hdfs/root_data_path_for_namenode
Edit the etc/hadoop/hdfs-site.xml file on the namenode with the following content:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///<your_directory>/hdfs/root_data_path_for_namenode</value>
</property>
<property>
<name>dfs.hosts</name>
<value>/<your_directory>/hdfs/datanodes_list.txt</value>
</property>
</configuration>
This completes the configuration for the namenode. Next, we will configure the two datanodes.
On each datanode, create a directory to store the runtime data:
mkdir -p /root/user_xyz/hdfs/root_data_path_for_datanode
Edit the etc/hadoop/hdfs-site.xml file on the datanodes with the following content:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>file:///<your_directory>/hdfs/root_data_path_for_datanode</value>
</property>
</configuration>
After completing the configuration, return to the namenode, navigate to the hadoop directory, format the file system, and start the namenode using the following commands:
bin/hdfs namenode -format
bin/hdfs --daemon start namenode
Then, on each of the two datanodes, navigate to the hadoop directory and start the datanode using the following command:
bin/hdfs --daemon start datanode
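Optionally, run the JDK's jps tool on each machine to confirm the daemons are up; it should list NameNode on the namenode and DataNode on each datanode:
jps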
Once the entire HDFS cluster is configured, we need to create a directory to store data. On the namenode, in the hadoop directory, execute the following commands:
bin/hdfs dfs -mkdir -p /user/clickhouse/
bin/hdfs dfs -chown clickhouse /user/clickhouse
bin/hdfs dfs -chmod -R 775 /user/clickhouse
Finally, check the status of the entire HDFS cluster and verify that the datanodes are active by running the following command on the namenode:
bin/hdfs dfsadmin -report
Installing the FoundationDB Client
The ByConity software packages depend on the FoundationDB client package, which is tightly coupled to the FoundationDB server version, so choose a client package that matches your server version. The FoundationDB client package can be downloaded from the official website. In this example, we download version 7.1.27 for Debian OS on amd64 machines.
curl -L -o foundationdb-clients_7.1.27-1_amd64.deb https://github.com/apple/foundationdb/releases/download/7.1.27/foundationdb-clients_7.1.27-1_amd64.deb
Execute the installation command:
sudo dpkg -i foundationdb-clients_7.1.27-1_amd64.deb
Deploying the ByConity Software Package
Next, we will deploy the ByConity software package, which can be found on our official downloads page. Alternatively, you can build the software package yourself by following this software package build guide.
Installing Software Packages
Install the components required by ByConity.
VERSION=0.4.3
ARCH=amd64
# First, install `byconity-common-static`, which is a dependency for all other packages.
dpkg -i byconity-common-static_${VERSION}_${ARCH}.deb
# Then, in the same way, install the ByConity Resource Manager, ByConity Server,
# ByConity Worker, ByConity Worker Write, and ByConity Daemon Manager.
# `byconity-resource-manager`, `byconity-daemon-manager`, and `byconity-tso` are lightweight services,
# so they can be installed on a shared machine along with other packages.
# However, for `byconity-server`, `byconity-worker`, and `byconity-worker-write`,
# we should install them on separate machines.
dpkg -i byconity-tso_${VERSION}_${ARCH}.deb
dpkg -i byconity-resource-manager_${VERSION}_${ARCH}.deb
dpkg -i byconity-server_${VERSION}_${ARCH}.deb
dpkg -i byconity-worker_${VERSION}_${ARCH}.deb
dpkg -i byconity-worker-write_${VERSION}_${ARCH}.deb
dpkg -i byconity-daemon-manager_${VERSION}_${ARCH}.deb
Preparing Configuration Files
Recommended Configuration File Structure Example
Typically, a configuration item is either specific to a single component or shared across the cluster (such as the HDFS/FDB connection parameters). The following layout for the configuration files shows how to support both dedicated and shared configurations at the same time.
- /etc/byconity-server/
- fdb.cluster
- byconity-tso.xml
- byconity-server.xml
- byconity-worker.xml
- byconity-worker-write.xml
- conf.d/
- xxx.xml
- fdb.cluster is the configuration file used to connect to the FoundationDB cluster.
- byconity-{tso,server,worker,worker-write}.xml are the configuration files used by the respective components.
- conf.d/ is a fixed folder name used to store shared configurations. All files inside it (with no restrictions on naming or number) are automatically merged into each component's configuration file, so common configurations such as <service_discovery/> and <hdfs_nnproxy/> are recommended to be stored here.
The cnch_config method is no longer recommended; its contents can be migrated directly to conf.d/.
In addition to XML, ByConity also supports YAML-format configuration files. When converting between the two, note that the XML format has an additional outermost tag, <yandex/>.
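For example, the same illustrative setting in both formats (timezone is only an example key):
In XML:
<yandex>
    <timezone>UTC</timezone>
</yandex>
In YAML (no extra outer tag):
timezone: UTC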
Editing Configuration Files
Some noteworthy configuration items are described below.
Create an XML file (any name) in conf.d/ to configure the service_discovery and hdfs_nnproxy tags.
ByConity components discover each other in one of three ways, specified with the mode tag: local, dns, or consul. In local mode, you need to specify the IP addresses or hostnames of all components in this configuration file by replacing the placeholder {your_xxx_address} (for example, {your_server_address}) with the component's actual IP address, such as 10.0.2.72.
In local mode, a service's address is obtained by other services mainly via host (service_discovery > xxx > node[] > host). Therefore, make sure the value of this item is usable for service discovery, e.g. an IP address or domain name that is reachable from the other machines.
The hdfs_nnproxy tag contains the address of the HDFS namenode.
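As a minimal, illustrative sketch (the file name common.xml is arbitrary; the complete node lists, ports, and other entries from the packaged default configuration are omitted here), a shared file in conf.d/ could look like this:
<yandex>
    <!-- HDFS namenode address; matches the port configured in core-site.xml above -->
    <hdfs_nnproxy>hdfs://<your_name_node_ip_address>:12001</hdfs_nnproxy>
    <service_discovery>
        <mode>local</mode>
        <server>
            <node>
                <host><your_server_address></host>
            </node>
        </server>
        <!-- similar node entries for the other components, as in the packaged defaults -->
    </service_discovery>
</yandex>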
Starting the Program
First, start TSO.
systemctl start byconity-tso
You can also check the status of TSO with the following command:
systemctl status byconity-tso
Note: When you install the package again (for example, during an upgrade), you do not need to execute the start command again.
Then start each component in the same way:
systemctl start byconity-server   # likewise for byconity-worker, byconity-worker-write, byconity-resource-manager, and byconity-daemon-manager
Check the startup status of each component:
systemctl status | grep byconity-
Install more worker nodes in the same way. Each worker node has a configuration item named WORKER_ID, which is set in the configuration file /etc/byconity-server/byconity-worker.xml or /etc/byconity-server/byconity-worker-write.xml. The worker id must be unique among worker nodes; its default value in the configuration file is empty. Name the WORKER_ID following the regex pattern \w*-\d+.
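For example, a worker's entry might be set as follows (worker-0 is only an illustrative name; the rest of the packaged byconity-worker.xml stays unchanged):
<!-- in /etc/byconity-server/byconity-worker.xml; each worker needs a unique id -->
<WORKER_ID>worker-0</WORKER_ID>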
- Check the status of the computation group
clickhouse client --port 9010
:) select * from system.workers