Skip to main content
Version: 0.2.0

Export Data

Document Type: Tutorial

Document format: tutorial purpose, pre-preparation, step-by-step explanation of principles & examples, and related document recommendations;

Summary:

  1. Which file systems or databases are supported to export the data of the table, and how to export
  2. Whether to support the export of query results, how to export

ByConity supports redirecting query results to files on the client's SELECT side, where files can be local files or files on HDFS. For example:

SELECT <expr_list> INTO OUTFILE file_name FORMAT format_name

Among them, file_name and format_name are both string types.

Export to local

For example, consider this table:

CREATE TABLE test.select_outfile_testtable
(
`a` Int8,
`b` String,
`c` Int8,
`date` Date
)
ENGINE = CnchMergeTree()
PARTITION by toYYYYMM(date)
ORDER BY tuple()

insert some data:

 INSERT INTO test.select_outfile_testtable VALUES (1, 'a', 1,'2022-11-10'), (2, 'b', 2,'2022-11-11'), (3, 'c', 3,'2022-11-12'), (4, 'd', 4,'2022-11-13');

Export the SELECT result to a local file (take CSV as an example):

SELECT * FROM test.select_outfile_testtable INTO OUTFILE 'select.csv' FORMAT CSV

result:

➜  ~ cat select.csv
1,"a",1,"2022-11-10"
2,"b",2,"2022-11-11"
3,"c",3,"2022-11-12"
4,"d",4,"2022-11-13"

export to HDFS

use INTO OUTFILE clause, example

SELECT <expr_list> INTO OUTFILE 'hdfs://ip:port/user/tiger/test/test.gz ' FORMAT format_name

Where hdfs://ip:port/user/tiger/test/test.gz is the uploaded HDFS address, the format is:

  1. hdfs:// is a fixed prefix, indicating that it is uploaded to HDFS; ip is the hdfs namenode service address.
  2. /user/tiger/test/test.gz is the path of the output file. It should be noted that the hdfs directory path must be available.

At the same time, it supports exporting to HDFS in common formats, such as CSV, Parquet, JSON, etc. In order to reduce the amount of network transmission, gzip compression is enabled by default.