Export Data
Document Type: Tutorial
Document format: tutorial purpose, pre-preparation, step-by-step explanation of principles & examples, and related document recommendations;
Summary:
- Which file systems or databases are supported to export the data of the table, and how to export
- Whether to support the export of query results, how to export
ByConity supports redirecting query results to files on the client's SELECT side, where files can be local files or files on HDFS. For example:
SELECT <expr_list> INTO OUTFILE file_name FORMAT format_name
Among them, file_name and format_name are both string types.
Export to local
For example, consider this table:
CREATE TABLE test.select_outfile_testtable
(
`a` Int8,
`b` String,
`c` Int8,
`date` Date
)
ENGINE = CnchMergeTree()
PARTITION by toYYYYMM(date)
ORDER BY tuple()
insert some data:
INSERT INTO test.select_outfile_testtable VALUES (1, 'a', 1,'2022-11-10'), (2, 'b', 2,'2022-11-11'), (3, 'c', 3,'2022-11-12'), (4, 'd', 4,'2022-11-13');
Export the SELECT result to a local file (take CSV as an example):
SELECT * FROM test.select_outfile_testtable INTO OUTFILE 'select.csv' FORMAT CSV
result:
➜ ~ cat select.csv
1,"a",1,"2022-11-10"
2,"b",2,"2022-11-11"
3,"c",3,"2022-11-12"
4,"d",4,"2022-11-13"
export to HDFS
use INTO OUTFILE clause, example
SELECT <expr_list> INTO OUTFILE 'hdfs://ip:port/user/tiger/test/test.gz ' FORMAT format_name
Where hdfs://ip:port/user/tiger/test/test.gz is the uploaded HDFS address, the format is:
- hdfs:// is a fixed prefix, indicating that it is uploaded to HDFS; ip is the hdfs namenode service address.
- /user/tiger/test/test.gz is the path of the output file. It should be noted that the hdfs directory path must be available.
At the same time, it supports exporting to HDFS in common formats, such as CSV, Parquet, JSON, etc. In order to reduce the amount of network transmission, gzip compression is enabled by default.