Data Type
Document Type: Descriptive
Summary:
- What data types does ByConity support?
- Compared with ClickHouse, which data types are lacking in
Difference Summary
ByConity lacks Date32 type, Boolean type, Geo type (requires allow_experimental_geo_types =1 to use)
ByConity's Map has three modes of KV/Byte/CompactByte, among which the KV mode is compatible with the open source ClickHouse.
Integer Types
Fixed-length integers, including signed or unsigned integers.
When creating a table, you can set type parameters for integers (eg. TINYINT(8), SMALLINT(16), INT(32), BIGINT(64)), But ByConity ignores them.
Integer Range
- Int8 — [-128 : 127]
- Int16 — [-32768 : 32767]
- Int32 — [-2147483648 : 2147483647]
- Int64 — [-9223372036854775808 : 9223372036854775807]
- Int128 — [-170141183460469231731687303715884105728 : 170141183460469231731687303715884105727]
- Int256 — [-57896044618658097711785492504343953926634992332820282019728792003956564819968 : 57896044618658097711785492504343953926634992332820282019728792003956564819967]
Alias:
- Int8 — TINYINT, BOOL, BOOLEAN, INT1.
- Int16 — SMALLINT, INT2.
- Int32 — INT, INT4, INTEGER.
- Int64 — BIGINT.
Unsigned Integer Range
- UInt8 — [0 : 255]
- UInt16 — [0 : 65535]
- UInt32 — [0 : 4294967295]
- UInt64 — [0 : 18446744073709551615]
- UInt128 — [0 : 340282366920938463463374607431768211455]
- UInt256 — [0 : 115792089237316195423570985008687907853269984665640564039457584007913129639935]
UUID
A Universally Unique Identifier (UUID) is a 16-byte number that identifies a record. See Wikipedia.
Examples of UUID type values are as follows:
ec282515-a492-46d8-808c-8f3bd899b1b1
If no value is specified for the UUID column when inserting a new record, the UUID value will be filled with zeros:
00000000-0000-0000-0000-000000000000
How to generate
To generate UUID values, ByConity provides the generateuidv4 function.
Usage Example
Example 1
This example demonstrates creating a table with a column of type UUID and inserting a value into the table.
CREATE TABLE t_uuid (x UUID, y String) ENGINE=Memory
INSERT INTO t_uuid SELECT generateUUIDv4(), 'Example 1'
SELECT * FROM t_uuid
┌────────────────────────────────────x─┬─y─────────┐
│ 136091bf-474c-496c-a743-bcd531e12563 │ Example 1 │
└──────────────────────────────────────┴───────────┘
Example 2
In this example, no value for the UUID column is specified when inserting a new record.
INSERT INTO t_uuid (y) VALUES ('Example 2')
SELECT * FROM t_uuid
┌────────────────────────────────────x─┬─y─────────┐
│ 136091bf-474c-496c-a743-bcd531e12563 │ Example 1 │
│ 00000000-0000-0000-0000-000000000000 │ Example 2 │
└──────────────────────────────────────┴───────────┘
Limits
UUID data type only supports string data type also supports functions(for example, min, max, and count).
Arithmetic operations do not support UUID data types(for example, abs)or aggregate functions such as sum and avg.
floating point type
The type is the same as the following C type:
- Float32 - float
- Float64 - double
We recommend that you store data as integers whenever possible. For example, converting a fixed-precision number to an integer value, such as a currency amount or page load time expressed in milliseconds
Use floating point numbers
- Calculations with floating point numbers may cause rounding errors.
SELECT 1 - 0.9
┌───────minus(1, 0.9)─┐
│ 0.09999999999999998 │
└─────────────────────┘
- The result of the calculation depends on the calculation method (processor type and architecture of the computer system)
- Results of floating point calculations may be values such as infinity (INF) and «Not a Number» (NaN). This should be taken into account when calculating with floating point numbers.
- When reading floating-point numbers line by line, the floating-point result may not be the most recent value displayed by the machine.
NaN and Inf
Compared to standard SQL, ByConity supports the following classes of floating point numbers:
- Inf – positive infinity
SELECT 0.5 / 0
┌─divide(0.5, 0)─┐
│ inf │
└────────────────┘
- -Inf – negative infinity
SELECT -0.5 / 0
┌─divide(-0.5, 0)─┐
│ -inf │
└─────────────────┘
- NaN – not a number
SELECT 0 / 0
┌─divide(0, 0)─┐
│ nan │
└──────────────┘
You can see more rules about NaN sorting in ORDER BY clause
Decimal
Including Decimal(P,S),Decimal32(S),Decimal64(S),Decimal128(S)and other types
Signed fixed-point numbers that maintain precision during addition, subtraction, and multiplication operations. For division, the least significant digit is discarded (no rounding).
parameters
- P - Precision. Valid range: [1:38], determines how many decimal digits (including fractions) can be.
- S - Scale. Valid range: [0:P], determines the number of decimal places included in the fractional part of the number.
The following examples are synonymous for different Decimal representations of P parameter values: -P from [1:9] - for Decimal32(S) -P from [10:18] - for Decimal64(S) -P from [ 19:38] - for Decimal128(S)
range of decimal values
- Decimal32(S) - ( -1 10^(9 - S),110^(9-S) )
- Decimal64(S) - ( -1 10^(18 - S),110^(18-S) )
- Decimal128(S) - ( -1 10^(38 - S),110^(38-S) )
For example, Decimal32(4) can represent values from -99999.9999 to 99999.9999 in steps of 0.0001.
Internal representation
The data is stored in a signed integer with the same bit width as itself. The actual range of this number in memory will be higher than the above range, and the corresponding check will be done when converting from String to decimal number.
Since modern CPUs don't support 128-bit numbers, operations on Decimal128 are emulated by software. So the operation speed of Decimal128 is significantly slower than Decimal32/Decimal64.
Operation and result type
Binary operations on Decimal result in a wider result type (regardless of the order of the arguments).
- Decimal64(S1) <op> Decimal32(S2) -> Decimal64(S)
- Decimal128(S1) <op> Decimal32(S2) -> Decimal128(S)
- Decimal128(S1) <op> Decimal64(S2) -> Decimal128(S)
Rules for precision changes:
- Addition, subtraction: S = max(S1, S2).
- Multiplication: S = S1 + S2.
- Division: S = S1.
For similar operations between Decimal and integers, the result is a decimal of the same size as the argument.
Functions between Decimal and Float32/Float64 are undefined. To perform such operations you can use: toDecimal32, toDecimal64, toDecimal128 or toFloat32, toFloat64 requires explicit conversion of one of the arguments. Note that the result will lose precision and type conversions are expensive operations.
Some functions on Decimal return results as Float64 (for example, var or stddev). For some of them, intermediate calculations happen in Decimal . For such functions, the results for the same data in Float64 and Decimal may differ, although the result types are the same.
overflow check
Numerical overflow may occur when performing operations on Decimal types. Excess digits in fractions are discarded (not rounded). Too many digits in an integer will cause an exception.
SELECT toDecimal32(2, 4) AS x, x / 3
┌──────x─┬─divide(toDecimal32(2, 4), 3)─┐
│ 2.0000 │ 0.6666 │
└────────┴──────────────────────────────┘
SELECT toDecimal32(4.2, 8) AS x, x * x
DB::Exception: Scale is out of bounds.
SELECT toDecimal32(4.2, 8) AS x, 6 * x
DB::Exception: Decimal math overflow.
Checking for overflow can result in slower calculations. Overflow checking can be disabled by setting decimal_check_overflow if overflow is known to be impossible, in which case overflow will lead to incorrect results:
SET decimal_check_overflow = 0;
SELECT toDecimal32(4.2, 8) AS x, 6 * x
┌──────────x─┬─multiply(6, toDecimal32(4.2, 8))─┐
│ 4.20000000 │ -17.74967296 │
└────────────┴──────────────────────────────────┘
Overflow checking happens not only on arithmetic operations, but also on comparison operations:
SELECT toDecimal32(1, 8) < 100
DB::Exception: Can't compare.
Date time related types
Date
Stored in two bytes, representing date values from 1970-01-01 (unsigned) to the present. Allows storage of upper threshold constants defined from the Unix epoch up to the compile phase (currently the upper limit is 2106, but 2105 will eventually be fully supported). The minimum output is 1970-01-01.
Range of values:[1970-01-01, 2149-06-06].
There is no time zone information stored in the date.
Example
CREATE TABLE date_test
(
`timestamp` Date,
`event_id` UInt8
)
ENGINE = Memory;
INSERT INTO date_test VALUES ('2019-12-20', 1);
SELECT * FROM date_test;
┌──timestamp─┬─event_id─┐
│ 2019-12-20 │ 1 │
└────────────┴──────────┘
DateTime
Timestamp type. Store Unix timestamps in four bytes (unsigned). Allows storing values in the same range as the date type. The minimum value is 1970-01-01 00:00:00. Timestamp type values are accurate to the second (no leap seconds).
Range of values: [1970-01-01 00:00:00, 2106-02-07 06:28:15].
Time zone
Timestamps are converted from text (broken into components) to binary and back, using the system timezone when the client or server was started. In text format, information about daylight saving time is lost.
By default, clients use the server's time zone when connecting to the service. You can set to use client time by enabling the client command line option --use_client_time_zone.
So when dealing with textual dates (e.g. when saving text dumps), keep in mind that there may be ambiguity during daylight savings changes, and there may be problems matching data if timezones change.
DateTime64
This type allows storing a time value at an instant in the form of a date plus a time, with defined sub-second precision
Timescale size (precision): 10(-precision) seconds
Grammar:
DateTime64(precision, [timezone])
Internally, this type stores data as ticks since the Linux epoch (1970-01-01 00:00:00UTC) as an Int64 type. The resolution of the timescale is determined by the precision parameter. In addition, the DateTime64 type can store time zone information like any other data column, and the time zone affects how values of the DateTime64 type are displayed in text format, and how time data specified as a string ('2020 -01-01 05:00:01.000'). The time zone is not stored in the rows of the table (nor in the resultset), but in the metadata of the columns. For details, please refer to DateTime data type.
Range of values: [1900-01-01 00:00:00, 2299-12-31 23:59:59.99999999] (Note: the maximum value is 8).
Example
1. Create a table with a DateTime64 type column and insert data into it:
CREATE TABLE dt
(
`timestamp` DateTime64(3, 'Asia/Istanbul'),
`event_id` UInt8
)
ENGINE = Memory
INSERT INTO dt Values (1546300800000, 1), ('2019-01-01 00:00:00', 2)
SELECT * FROM dt
┌───────────────timestamp─┬─event_id─┐
│ 2019-01-01 03:00:00.000 │ 1 │
│ 2019-01-01 00:00:00.000 │ 2 │
└─────────────────────────┴──────────┘
- When inserting a datetime as an integer, it is treated as an appropriately scaled Unix timestamp (UTC). 1546300800000 (with a precision of 3) means '2019-01-01 00:00:00' UTC. However, because the timestamp column specifies Asia/Istanbul (UTC+ 3) the time zone, when output as a string, it will appear as '2019-01-01 03:00:00'
- When inserting a string as a datetime, it will be given time zone information. '2019-01-01 00:00:00' will be considered to be in Asia/Istanbul timezone and stored as 1546290000000.
2. Filter the value of DateTime64 type
SELECT * FROM dt WHERE timestamp = toDateTime64('2019-01-01 00:00:00', 3, 'Asia/Istanbul')
┌───────────────timestamp─┬─event_id─┐
│ 2019-01-01 00:00:00.000 │ 2 │
└─────────────────────────┴──────────┘
Unlike DateTime, values of type DateTime64 are not automatically converted from values of type String
3. Get the time zone information of DateTime64 type value:
SELECT toDateTime64(now(), 3, 'Asia/Istanbul') AS column, toTypeName(column) AS x
┌──────────────────column─┬─x──────────────────────────────┐
│ 2019-10-16 04:12:04.000 │ DateTime64(3, 'Asia/Istanbul') │
└─────────────────────────┴────────────────────────────────┘
4. Time zone conversion
SELECT
toDateTime64(timestamp, 3, 'Europe/London') as lon_time,
toDateTime64(timestamp, 3, 'Asia/Istanbul') as mos_time
FROM dt
┌───────────────lon_time──┬────────────────mos_time─┐
│ 2019-01-01 00:00:00.000 │ 2019-01-01 03:00:00.000 │
│ 2018-12-31 21:00:00.000 │ 2019-01-01 00:00:00.000 │
└─────────────────────────┴─────────────────────────┘
Enumeration type(Enum8,Enum16)
Includes Enum8 and Enum16 types. Enum holds the correspondence of 'string'= integer. In ByConity, all operations with Enum data types are performed on values containing integers, even though the user is using string constants. This is more efficient in terms of performance than using the String data type.
- Enum8 is described with 'String'= Int8 pairs.
- Enum16 is described with 'String'= Int16 pairs.
Example usage
Create a column with an enum Enum8('hello' = 1, 'world' = 2) type:
CREATE TABLE t_enum
(
x Enum8('hello' = 1, 'world' = 2)
)
ENGINE = Memory
This x column can only store the values listed in the type definition: 'hello' or 'world'. If you try to save any other value, ByConity throws an exception.
:) INSERT INTO t_enum VALUES ('hello'), ('world'), ('hello')
INSERT INTO t_enum VALUES
Ok.
3 rows in set. Elapsed: 0.002 sec.
:) insert into t_enum values('a')
INSERT INTO t_enum VALUES
Exception on client:
Code: 49. DB::Exception: Unknown element 'a' for type Enum8('hello' = 1, 'world' = 2)
ByConity outputs string values from Enum when you query data from the table.
SELECT * FROM t_enum
┌─x─────┐
│ hello │
│ world │
│ hello │
└───────┘
If you need to see the value of the corresponding row, you must convert the Enum value to an integer type.
SELECT CAST(x, 'Int8') FROM t_enum
┌─CAST(x, 'Int8')─┐
│ 1 │
│ 2 │
│ 1 │
└─────────────────┘
To create enum values in queries, you also need to use CAST.
SELECT toTypeName(CAST('a', 'Enum8(\'a\' = 1, \'b\' = 2)'))
┌─toTypeName(CAST('a', 'Enum8(\'a\' = 1, \'b\' = 2)'))─┐
│ Enum8('a' = 1, 'b' = 2) │
└──────────────────────────────────────────────────────┘
Rules and Usage
Each value ranges from -128 ... 127 for Enum8 types and -32768 ... 32767 for Enum16 types. All strings or numbers must be distinct. Empty strings are allowed. If an Enum type is specified (during table definition), the numbers can be in any order. However, the order is not important.
Neither strings nor numbers in Enum can be NULL.
Enum is included in the nullable type. So if you create a table with this query
CREATE TABLE t_enum_nullable
(
x Nullable( Enum8('hello' = 1, 'world' = 2) )
)
ENGINE = TinyLog
Not only 'hello' and 'world' can be stored, but NULL can also be stored.
INSERT INTO t_enum_nullable Values('hello'),('world'),(NULL)
In memory, Enum columns are stored in the same way as Int8 or Int16 for the corresponding numeric value.
When reading as text, ByConity parses the value into a string and then searches for the corresponding string in the set of enumeration values. If not found, an exception is thrown. When reading the text format, it will find the corresponding value according to the read string. If not found, an exception is thrown.
When written as text, ByConity parses the value into a string and writes it. An exception is thrown if the column data contains garbage data (not numbers from a valid collection). Enum types are read and written in binary in the same way as Int8 and Int16 types.
The implicit default is the numerically smallest value.
In ORDER BY, GROUP BY, IN, DISTINCT, etc., Enum behaves the same as the corresponding number. For example, sort numerically. Enums work the same way for equality and comparison operators as they do for underlying numeric values.
Enum values cannot be compared to numbers. Enums can be compared to constant strings. An exception will be thrown if the string being compared to is not a valid Enum value. You can use the IN operator to determine whether an Enum exists in an Enum collection, where the Enum in the collection needs to be represented by a string.
Most operations with numbers and strings do not work with Enums; for example, Enum types cannot be added to a number. However, Enum has a native toString function which returns its string value.
Enum values can be converted to numeric types using the toT function, where T is a numeric type. If T happens to correspond to the underlying numeric type of Enum, this conversion is zero-cost.
The Enum type can be ALTERed without cost to modify the value of the corresponding collection. Members of Enum can be added or deleted by ALTER operation (deletion is safe as long as the value is not used in the table). As a safety precaution, changing a previously used Enum member will throw an exception.
Through the ALTER operation, you can convert Enum8 to Enum16, and vice versa, just like Int8 to Int16.
String type
String
Strings can be of any length. It can contain an arbitrary set of bytes, including null bytes. Therefore, the String type can replace VARCHAR, BLOB, CLOB, etc. types in other DBMSs.
Encoding
ByConity has no concept of encoding. Strings can be arbitrary sets of bytes, stored and output as they are. For storing text, we recommend using UTF-8 encoding. At least, if your terminal uses UTF-8 (recommended), then reading and writing doesn't require any conversion. Similarly, for different encoding text ByConity will have different functions for processing strings. For example, the length function can calculate the length of the byte array contained in the string, but the lengthUTF8 function assumes that the string is encoded in UTF-8, and calculates the length of the Unicode characters contained in the string.
FixString
A string of fixed length N (N must be a strictly positive natural number).
You can declare a column to be of type FixedString using the following syntax:
<column_name> FixedString(N)
where N represents a natural number.
The FixedString type is efficient when the length of the data is exactly N bytes. In other cases, this may reduce efficiency.
Examples of values that can be efficiently stored in columns of type FixedString:
- IP address in binary representation (IPv6 uses FixedString(16))
- language code (ru_RU, en_US … )
- currency code (USD, RUB … )
- hash value in binary representation (MD5 uses FixedString(16), SHA256 uses FixedString(32))
Please use the UUID data type to store UUID values.
When inserting data into ByConity,
- If the string contains less than `N' bytes, the end of the string will be padded with null bytes.
- If the string contains more bytes than N, a Too large value for FixedString(N) exception will be thrown.
Byconity does not remove null bytes at the end of strings when doing data lookups. If you use a WHERE clause, you will need to manually add null bytes to match the value of FixedString. The following example illustrates how to use the WHERE clause with FixedString.
Consider a table with a FixedString(2) column:
┌─name──┐
│ b │
└───────┘
The query statement SELECT * FROM FixedStringTable WHERE a = 'b' returns no results. Please use null bytes to fill the filter.
SELECT * FROM FixedStringTable
WHERE a = 'b\0'
┌─a─┐
│ b │
└───┘
This method is different from MySQL's CHAR type (in MySQL, strings are padded with spaces and spaces are removed on output).
Note that the length of FixedString(N) is a constant. A string consisting only of null characters, the return value of the function length is N, and The return value of the function empty is 1.
Nested data structures
Nested(Name1 Type1, Name2 Type2, …)
Nested data structures are similar to nested tables. Parameters (column names and types) for nested data structures are similar to CREATE queries. Each table can contain any number of rows of nested data structures.
Example:
CREATE TABLE nested_test (s String, nest Nested(x UInt8, y UInt32)) ENGINE = Memory;
INSERT INTO nested_test VALUES ('Hello', [1,2], [10,20]), ('World', [3,4,5], [30,40,50]), ('Goodbye', [], []);
SELECT * FROM nested_test;
┌─s───────┬─nest.x──┬─nest.y─────┐
│ Hello │ [1,2] │ [10,20] │
│ World │ [3,4,5] │ [30,40,50] │
│ Goodbye │ [] │ [] │
└─────────┴─────────┴────────────┘
So you can simply think of nested data structures as multi-column arrays where all columns are the same length.
SELECT queries can only specify the name of an entire nested data structure when using ARRAY JOIN. For more information, refer to «ARRAY JOIN clause». Example:
SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest;
┌─s─────┬─nest.x─┬─nest.y─┐
│ Hello │ 1 │ 10 │
│ Hello │ 2 │ 20 │
│ World │ 3 │ 30 │
│ World │ 4 │ 40 │
│ World │ 5 │ 50 │
└───────┴────────┴────────┘
A SELECT cannot be performed on an entire nested data structure. Only the columns that are part of it can be explicitly listed.
For INSERT queries, the column arrays in all nested data structures can be passed in individually (provided they are separate column arrays). During insertion, the system checks that they have the same length.
For DESCRIBE queries, columns in nested data structures are listed separately in the same way.
ALTER queries have very limited operations on nested data structures.
Tuple
A tuple, where each element has a separate type.
Tuples cannot be stored in tables (except memtables). They can be used for temporary column grouping. In queries, IN expressions and lambda functions with specific parameters can be used to group temporary columns. For more information, see IN operator and Higher-order functions.
A tuple can be the result of a query. In this case, values in parentheses are comma-separated for text formats other than JSON. In JSON format, tuples are output as arrays (in square brackets).
Create tuple
Functions can be used to create tuples:
tuple(T1, T2, ...)
Example of creating a tuple:
:) SELECT tuple(1,'a') AS x, toTypeName(x)
SELECT
(1, 'a') AS x,
toTypeName(x)
┌─x───────┬─toTypeName(tuple(1, 'a'))─┐
│ (1,'a') │ Tuple(UInt8, String) │
└─────────┴───────────────────────────┘
1 rows in set. Elapsed: 0.021 sec.
Data types in tuples
When dynamically creating tuples, ByConity will automatically assign the smallest expressible type to each parameter of the tuple. If the parameter is NULL, then the corresponding element of this tuple is nullable.
Example of automatic data type detection:
SELECT tuple(1, NULL) AS x, toTypeName(x)
SELECT
(1, NULL) AS x,
toTypeName(x)
┌─x────────┬─toTypeName(tuple(1, NULL))──────┐
│ (1,NULL) │ Tuple(UInt8, Nullable(Nothing)) │
└──────────┴─────────────────────────────────┘
1 rows in set. Elapsed: 0.002 sec.
Low cardinality type
Convert other data types to dictionary-encoded types.
Grammar
LowCardinality(data_type)
Parameter
- data_type — String, FixedString, Date, DateTime, including numeric types, but excluding Decimal. For some data types, LowCardinality is not efficient, please refer to allow_suspicious_low_cardinality_types setting description.
describe
LowCardinality is a concept that changes the way data is stored and processed. ByConity will perform dictionary coding on the column where LowCardinality is located. For many applications, processing dictionary-encoded data can significantly increase SELECT query speed.
The efficiency of using the LowCarditality data type depends on the diversity of the data. If a dictionary contains less than 10000 distinct values, then ByConity allows for more efficient data storage and processing. On the contrary, if the dictionary is more than 10000, the efficiency will be worse.
When using character types, consider using LowCardinality instead of Enum. LowCardinality is generally more flexible and efficient.
Example
Create a column of type LowCardinality:
CREATE TABLE lc_t
(
`id` UInt16,
`strings` LowCardinality(String)
)
ENGINE = Memory
ORDER BY id
area
define
A Domain type is an implementation-specific type that is always binary compatible with an existing base type while adding some extra features that can be used while maintaining the disk data unchanged. Currently ByConity does not support custom domain types.
If you can use an underlying type that is binary-compatible with a Domain type in one place, then you can also use a Domain type in the same place, for example:
- use the Domain type as the type of the column in the table
- Read/write data to columns of Domain type
- If an underlying type that is binary-compatible with the Domain can be indexed, then the Domain type can also be indexed
- Pass the Domain type as a parameter to the function to use
- other
Additional Features for Domains
- When executing SHOW CREATE TABLE or DESCRIBE TABLE, the corresponding columns are always displayed as Domain type names
- Input data in INSERT INTO domain_table(domain_column) VALUES(…) is always entered in a more human-friendly format
- In SELECT domain_column FROM domain_table data is always output in a more human-friendly format
- In INSERT INTO domain_table FORMAT CSV ..., load external source data in a more user-friendly format
Domains Type Restrictions
- Cannot convert an index of underlying type to an index of Domain type via ALTER TABLE.
- A value of type string cannot be implicitly converted to a value of type Domain when inserting data from another column or table.
- Cannot add constraints on values stored as Domain type.
ipv4
IPv4 is a Domain type that is binary compatible with the UInt32 type and is used to store the value of an IPv4 address. It provides more compact binary storage while supporting a more readable input and output format.
Basic usage
CREATE TABLE hits (url String, from IPv4) ENGINE = Memory
DESCRIBE TABLE hits;
┌─name─┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┐
│ url │ String │ │ │ │ │
│ from │ IPv4 │ │ │ │ │
└──────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┘
Also you can use IPv4 type column as primary key:
CREATE TABLE hits (url String, from IPv4) ENGINE = Memory;
When writing and querying, the IPv4 type can recognize a more readable input and output format:
INSERT INTO hits (url, from) VALUES ('https://wikipedia.org', '116.253.40.133')('https://byconity.com', '183.247.232.58')('https://byconity.com/docs/en/', '116.106.34.242');
SELECT * FROM hits;
┌─url────────────────────────────────┬───────────from─┐
│ https://byconity.com/docs/en/ │ 116.106.34.242 │
│ https://wikipedia.org │ 116.253.40.133 │
│ https://byconity.com │ 183.247.232.58 │
└────────────────────────────────────┴────────────────┘
At the same time it provides a more compact binary storage format:
SELECT toTypeName(from), hex(from) FROM hits LIMIT 1;
┌─toTypeName(from)─┬─hex(from)─┐
│ IPv4 │ B7F7E83A │
└──────────────────┴───────────┘
Not implicitly convertible to types other than UInt32. If you want to convert a value of type IPv4 to a string, you can use IPv4NumToString() to do the conversion as shown:
SELECT toTypeName(s), IPv4NumToString(from) as s FROM hits LIMIT 1;
┌─toTypeName(IPv4NumToString(from))─┬─s──────────────┐
│ String │ 183.247.232.58 │
└───────────────────────────────────┴────────────────┘
Or you can use CAST to convert it to UInt32 type:
SELECT toTypeName(i), CAST(from as UInt32) as i FROM hits LIMIT 1;
┌─toTypeName(CAST(from, 'UInt32'))─┬──────────i─┐
│ UInt32 │ 3086477370 │
└──────────────────────────────────┴────────────┘
ipv6
IPv6 is a Domain type that maintains binary compatibility with the FixedString(16) type, which is used to store the value of an IPv6 address. It provides more compact binary storage while supporting a more readable input and output format.
Basic Use
CREATE TABLE hits (url String, from IPv6) ENGINE = Memory;
DESCRIBE TABLE hits;
┌─name─┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┐
│ url │ String │ │ │ │ │
│ from │ IPv6 │ │ │ │ │
└──────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┘
Also you can use IPv6 type column as primary key:
CREATE TABLE hits (url String, from IPv6) ENGINE = Memory;
When writing and querying, the IPv6 type can recognize a more readable input and output format:
INSERT INTO hits (url, from) VALUES ('https://wikipedia.org', '2a02:aa08:e000:3100::2')('https://byconity.com', '2001:44c8:129:2632:33:0:252:2')('https://byconity.com/docs/en/', '2a02:e980:1e::1');
SELECT * FROM hits;
┌─url────────────────────────────────┬─from──────────────────────────┐
│ https://byconity.com │ 2001:44c8:129:2632:33:0:252:2 │
│ https://byconity.com/docs/en/ │ 2a02:e980:1e::1 │
│ https://wikipedia.org │ 2a02:aa08:e000:3100::2 │
└────────────────────────────────────┴───────────────────────────────┘
At the same time it provides a more compact binary storage format:
SELECT toTypeName(from), hex(from) FROM hits LIMIT 1;
┌─toTypeName(from)─┬─hex(from)────────────────────────┐
│ IPv6 │ 200144C8012926320033000002520002 │
└──────────────────┴──────────────────────────────────┘
Not implicitly convertible to types other than FixedString(16). If you want to convert a value of type IPv6 to a string, you can use IPv6NumToString() to do the conversion as shown:
SELECT toTypeName(s), IPv6NumToString(from) as s FROM hits LIMIT 1;
┌─toTypeName(IPv6NumToString(from))─┬─s─────────────────────────────┐
│ String │ 2001:44c8:129:2632:33:0:252:2 │
└───────────────────────────────────┴───────────────────────────────┘
Or use CAST to convert it to FixedString(16):
SELECT toTypeName(i), CAST(from as FixedString(16)) as i FROM hits LIMIT 1;
┌─toTypeName(CAST(from, 'FixedString(16)'))─┬─i───────┐
│ FixedString(16) │ ��� │
└───────────────────────────────────────────┴─────────┘
Map(key, value)
Map(key, value) 可以存储 key:value 键值对类型的数据。
parameter
- key — the key of the key-value pair, the type can be: String, Integer, LowCardinality, Or FixedString.
- value — the value of the key-value pair, the type can be: String, Integer, Array, LowCardinality, or [FixedString](https://clickhouse.com/docs/en/sql-reference/data- types/fixedstring).
There is a lot of content, please refer to Map function combing (WIP) CE version part.
Array(T)
An array of elements of type T.
T can be any type, including array types. However, multidimensional arrays are not recommended, and byconity has limited support for multidimensional arrays. For example, you cannot store multidimensional arrays in a MergeTree table.
create array
You can use the array function to create arrays:
array(T)
You can also use square brackets:
[]
Example of creating an array:
SELECT array(1, 2) AS x, toTypeName(x)
┌─x─────┬─toTypeName(array(1, 2))─┐
│ [1,2] │ Array(UInt8) │
└───────┴─────────────────────────┘
SELECT [1, 2] AS x, toTypeName(x)
┌─x─────┬─toTypeName([1, 2])─┐
│ [1,2] │ Array(UInt8) │
└───────┴────────────────────┘
Use data types
ByConity will automatically detect the array elements, and calculate the smallest data type to store these elements according to the elements. If there is a NULL or a null-literal type element, then the element type of the array will become [nullable](https://clickhouse.com/docs/en/sql -reference/data-types/nullable).
If ByConity cannot determine the data type, it will raise an exception. This happens when trying to create an array containing both strings and numbers (SELECT array(1, 'a')).
Example of automatic data type detection:
SELECT array(1, 2, NULL) AS x, toTypeName(x)
┌─x──────────┬─toTypeName(array(1, 2, NULL))─┐
│ [1,2,NULL] │ Array(Nullable(UInt8)) │
└────────────┴───────────────────────────────┘
ByConity will throw an exception if you try to create an array of incompatible data types:
SELECT array(1, 'a');
SELECT [1, 'a']
Query id: b3b81f4d-a082-4b2f-a63a-784edf94998e
0 rows in set. Elapsed: 1.231 sec.
Received exception from server (version 21.8.7):
Code: 386. DB::Exception: Received from 0.0.0.0:30000. DB::Exception: There is no supertype for types UInt8, String because some of them are String/FixedString and some of them are not: While processing [1, 'a'] SQLSTATE: HY000.
Nullable(typename)
Allow special markup (NULL) to indicate «missing value», which can be stored together with the normal value of TypeName . For example, a column of type Nullable(Int8) can store a value of type Int8, while a row with no value will store NULL.
For TypeName, composite data type arrays and tuples cannot be used. Compound data types can contain Nullable type values, for example Array(Nullable(Int8)).
Fields of type Nullable cannot be included in table indexes.
NULL is the default value for any Nullable type unless specified otherwise in the ByConity server configuration.
Storage Features
To store Nullable type values in table columns, ByConity uses separate files with NULL masks in addition to normal files with values. Entries in the mask file allow ByConity to distinguish between NULL for each table row and the default value for the corresponding data type. A Nullable column consumes additional storage compared to a similar normal file due to the new file being appended.
!!! Points to Note "Points to Note" Using Nullable almost always has a negative impact on performance, keep this in mind when designing your database
Entries in the mask file allow ByConity to distinguish between «NULL» and default values for the corresponding data types of each table row. Due to the extra file, «Nullable» columns consume more storage space than normal columns
AggregateFunction(name, types_of_arguments…)
The intermediate state of the aggregation function can be obtained by adding the -State suffix to the name of the aggregation function. At the same time, when you need to access the final state data of this type, you need to use the same aggregation function name plus -Merge suffix to get the final state data.
AggregateFunction — parameterized data type.
parameter
- aggregate function name
If the function has multiple parameter lists, specify values from the other parameter lists here.
- Types of aggregate function arguments
Example
CREATE TABLE t
(
column1 AggregateFunction(uniq, UInt64),
column2 AggregateFunction(anyIf, String, UInt8),
column3 AggregateFunction(quantiles(0.5, 0.9), UInt64)
) ENGINE = ...
uniq in the above, anyIf (any + if) and quantile are aggregate functions supported in ByConity.
user's guidance
Data writing
When you need to write data, you need to include the data in the INSERT SELECT statement, and for AggregateFunction type data, you need to use the corresponding function with -State as the suffix deal with.
Function Usage Example
uniqState(UserID)
quantilesState(0.5, 0.9)(SendTiming)
Unlike the uniq and quantiles functions that return the final value of the aggregated result, the functions suffixed with -State always return the intermediate state of the data of type AggregateFunction.
For SELECT, the AggregateFunction type is always present in a specific binary form in all output formats. For example, you can use a SELECT statement to dump a function's state data into TabSeparated format while using an INSERT statement to dump the data back.
data query
When querying data from AggregatingMergeTree tables, for fields of type AggregateFunction, you need to use the same aggregation function suffixed with -Merge to aggregate the data. For fields that are not of type AggregateFunction, include them in the GROUP BY clause.
Aggregation functions suffixed with -Merge can combine multiple intermediate states of AggregateFunction type into the final aggregation result.
For example, the following two queries always return the same result:
SELECT uniq(UserID) FROM table
SELECT uniqMerge(state) FROM (SELECT uniqState(UserID) AS state FROM table GROUP BY RegionID)
Example of use
See the description of CnchAggregatingMergeTree
SimpleAggregateFunction
The SimpleAggregateFunction(name, types_of_arguments…) data type stores the current value of the aggregate function, and does not store its full state like AggregateFunction. This optimization can be applied to functions with the following properties: The result of applying the function f to the row set S1 UNION ALL S2 can be obtained by applying f to parts of the row set separately, and then Then apply f to the result to get: f(S1 UNION ALL S2) = f(f(S1) UNION ALL f(S2)). This property guarantees that the partial aggregation result is sufficient to compute the merged result, so we don't have to store and process any additional data.
The following aggregate functions are supported:
- any
- anyLast
- min
- max
- sum
- sumWithOverflow
- groupBitAnd
- groupBitOr
- groupBitXor
- groupArrayArray
- groupUniqArrayArray
- sumMap
- minMap
- maxMap
- argMin
- argMax
!!! note "Note" SimpleAggregateFunction(func, Type) has the same value appearance and storage method as Type, so you don't need to apply with -Merge/*-State * suffix function.
`SimpleAggregateFunction` outperforms `AggregateFunction` with the same aggregate function.
parameter
- The name of the aggregate function.
- The type of the aggregate function parameter.
example
CREATE TABLE simple (id UInt64, val SimpleAggregateFunction(sum, Double)) ENGINE=CnchAggregatingMergeTree ORDER BY id;
Special data types
Values of special data types can neither be stored in tables nor output in results, but can be used in intermediate results of queries.
Interval type
A family of data types representing time and date intervals. INTERVAL The result type of the operation.
!!! warning "Warning" Interval data type values cannot be stored in the table.
structure:
- The time interval as an unsigned integer value.
- The type of time interval.
Supported interval types:
- SECOND
- MINUTE
- HOUR
- DAY
- WEEK
- MONTH
- QUARTER
- YEAR
For each interval type, there is a separate data type. For example, the DAY interval corresponds to the IntervalDay data type:
SELECT toTypeName(INTERVAL 4 DAY)
┌─toTypeName(toIntervalDay(4))─┐
│ IntervalDay │
└──────────────────────────────┘
Instructions for use
You can use date and datetime values of type Interval are used in arithmetic operations on values of type Interval. For example, you can add 4 days to the current time:
SELECT now() as current_date_time, current_date_time + INTERVAL 4 DAY
┌───current_date_time─┬─plus(now(), toIntervalDay(4))─┐
│ 2019-10-23 10:58:45 │ 2019-10-27 10:58:45 │
└─────────────────────┴───────────────────────────────┘
Intervals of different types cannot be combined. You cannot use a time interval such as 4 DAY 1 HOUR. Specify the interval in units less than or equal to the smallest unit of the time interval, for example, the time interval 1 day and an hour can be expressed as 25 HOUR or 90000 SECOND.
You cannot perform arithmetic operations on values of type Interval, but you can add different types of time intervals to values of Date or DateTime data types, for example:
SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR
┌───current_date_time─┬─plus(plus(now(), toIntervalDay(4)), toIntervalHour(3))─┐
│ 2019-10-23 11:16:28 │ 2019-10-27 14:16:28 │
└─────────────────────┴────────────────────────────────────────────────────────┘
The following query will result in an exception:
select now() AS current_date_time, current_date_time + (INTERVAL 4 DAY + INTERVAL 3 HOUR)
Received exception from server (version 19.14.1):
Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: Wrong argument types for function plus: if one argument is Interval, then another must be Date or DateTime..
expression
Used to represent lambdas in higher-order functions.
gather
Can be used in the right half of an IN expression.
Nothing
The sole purpose of this data type is to represent situations where a value is not expected. So you cannot create a value of type Nothing.
For example, the literal NULL has type Nullable(Nothing). See Nullable for details.
The Nothing type can also be used to represent empty arrays:
:) SELECT toTypeName(array())
SELECT toTypeName([])
┌─toTypeName(array())─┐
│ Array(Nothing) │
└─────────────────────┘
1 rows in set. Elapsed: 0.062 sec.