Version: 0.3.x

Data Direct Access

Importing from Hive External Tables

CnchHive is a table engine provided by ByConity that supports federated queries through external tables, so users can query Hive data directly and accelerate those queries without importing the data first.

Usage Examples:

  • Example 1: Creating an external table with the full set of Hive columns
-- Creating a Hive external table
CREATE TABLE t
(
    client_ip   String,
    request     String,
    status_code INT,
    object_size INT,
    date        String
)
ENGINE = CnchHive('psm', 'hive_database_name', 'hive_table_name')
PARTITION BY date;

-- Parameter Explanation:
-- psm: Hive Metastore service address (psm)
-- hive_database_name: Hive database name
-- hive_table_name: Hive table name

-- Querying the Hive external table
select * from t where xxx;
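
-- For illustration, a concrete query against the Example 1 table might look like the
-- following sketch; the date value and the 5xx filter are hypothetical, and only
-- columns declared above are used.
SELECT client_ip, count() AS errors
FROM t
WHERE date = '2024-01-01' AND status_code >= 500
GROUP BY client_ip
ORDER BY errors DESC
LIMIT 10;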
  • Example 2: Creating an external table with a subset of Hive columns
CREATE TABLE t
(
    client_ip String,
    request   String,
    date      String
)
ENGINE = CnchHive('psm', 'hive_database_name', 'hive_table_name')
PARTITION BY date;

-- Parameter Explanation:
-- psm: Hive Metastore service address (psm)
-- hive_database_name: Hive database name
-- hive_table_name: Hive table name

-- Querying the Hive external table
select * from t where xxx;
  • Example 3: Creating an external table for a Hive bucketed table
CREATE TABLE t
(
    client_ip   String,
    request     String,
    device_id   String,
    server_time String,
    date        String
)
ENGINE = CnchHive('psm', 'hive_database_name', 'hive_table_name')
PARTITION BY date
CLUSTER BY device_id INTO 65536 BUCKETS
ORDER BY server_time
SETTINGS cnch_vw_default = 'vw_default';

-- Parameter Explanation:
-- psm: Hive Metastore service address (psm)
-- hive_database_name: Hive database name
-- hive_table_name: Hive table name

-- Querying the Hive external table
select * from t where xxx;
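
-- As a sketch of how the bucketed table from Example 3 might be queried, the query
-- below filters on the CLUSTER BY column device_id (value hypothetical); restricting
-- the bucket column is what typically lets the engine read only the relevant buckets,
-- though the exact pruning behavior depends on the ByConity version.
SELECT server_time, request
FROM t
WHERE date = '2024-01-01' AND device_id = 'device-0001'
ORDER BY server_time
LIMIT 100;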

Notes:

  • Each column of the external table must correspond to a column of the Hive table by name and type, but the column order does not matter. You may map only a subset of the Hive table's columns, but all partition columns must be included.
  • Partition columns must be declared with the PARTITION BY clause and must also appear in the column list. For bucketed tables, the bucket column and the number of buckets must be specified with CLUSTER BY ... INTO ... BUCKETS; if there is an ORDER BY column, it must be specified as well.
  • The engine is specified as CnchHive, with the parameters psm, hive_database_name, and hive_table_name. Hive views are not supported.
  • Two settings are available: cnch_vw_default specifies the virtual warehouse (vw), and max_read_row_group_threads specifies the number of concurrent reads of Parquet row groups.
  • The supported column type mappings are shown in the following table; a sketch table definition that exercises several of these types follows it.
Hive Column Type   CnchHive Column Type   Description
INT / INTEGER      INT / INTEGER
BIGINT             BIGINT
TIMESTAMP          DateTime
STRING             String
VARCHAR            FixedString            Internally converted to FixedString
CHAR               FixedString            Internally converted to FixedString
DOUBLE             DOUBLE
FLOAT              FLOAT
DECIMAL            DECIMAL
MAP                Map
ARRAY              Array
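
As a minimal sketch of how these mappings might appear in practice, the definition below assumes a hypothetical Hive table whose columns use TIMESTAMP, VARCHAR, DECIMAL, MAP, and ARRAY types; the table name, column names, and the FixedString/Decimal lengths are illustrative only.

-- Hypothetical external table exercising several of the type mappings above
CREATE TABLE hive_typed_example
(
    event_time DateTime,            -- Hive TIMESTAMP
    user_code  FixedString(16),     -- Hive VARCHAR(16), converted internally to FixedString
    amount     Decimal(18, 2),      -- Hive DECIMAL(18, 2)
    tags       Map(String, String), -- Hive MAP<STRING, STRING>
    items      Array(String),       -- Hive ARRAY<STRING>
    date       String               -- Hive partition column, must always be included
)
ENGINE = CnchHive('psm', 'hive_database_name', 'hive_table_name')
PARTITION BY date;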

Note: Schema changes in Hive tables are not synchronized automatically; the Hive external table must be recreated in ByConity. Currently only the Parquet storage format is supported, and INSERT and ALTER operations are not supported.
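
Because schema changes are not picked up automatically, adopting a new Hive column means dropping and recreating the external table, roughly as sketched below; the added user_agent column is hypothetical.

-- Hypothetical: the Hive table gained a new STRING column user_agent
DROP TABLE IF EXISTS t;

CREATE TABLE t
(
    client_ip  String,
    request    String,
    user_agent String,   -- newly added column
    date       String
)
ENGINE = CnchHive('psm', 'hive_database_name', 'hive_table_name')
PARTITION BY date;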

Parameter Settings

  • cnch_vw_default: Specifies the virtual warehouse (vw) to use.
  • max_read_row_group_threads: Specifies the number of concurrent reads of Parquet row groups.
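
As a hedged sketch of where these settings are applied: cnch_vw_default goes in the table-level SETTINGS clause, as in Example 3, while the query-level SETTINGS clause is shown for max_read_row_group_threads under the assumption that it can also be supplied per query.

-- Table-level setting (as in Example 3)
CREATE TABLE t
(
    client_ip String,
    date      String
)
ENGINE = CnchHive('psm', 'hive_database_name', 'hive_table_name')
PARTITION BY date
SETTINGS cnch_vw_default = 'vw_default';

-- Assumption: max_read_row_group_threads can also be set per query
SELECT count()
FROM t
WHERE date = '2024-01-01'
SETTINGS max_read_row_group_threads = 8;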