重难点DML语句
Clickhouse的语法并不复杂,但是有些查询语句不仅比较难理解,而且出现的频率还极高。
能够用好它们,做大数据的OLAP基本上没啥难度。用不好,面对一大堆数据可能会觉得无从下手。
这里就来盘点盘点。
WITH子句
WITH子句用于在查询中定义一个临时表,也称为子查询,它将复杂查询分解为更小的、可重复使用的部分,提高查询的可读性和易用性。
它也能够让变量像流
一样不断往下传递。
WITH
funcA(...) AS a,
funcB(...a) AS b,
funcC(...b) AS c,
funcD(...a, c) AS d,
funcE(...d, b) AS e,
......
WITH
的语法。
-- 语法
WITH [RECURSIVE] TABLE_NAME [(COLUMN_LIST)] AS
(
SUB_QUERY
)
-- 或者
STATEMENT WITH [CUBE | ROLLUP | TOTALS]
RECURSIVE:可选,表示子查询可以是递归的。
TABLE_NAME:临时表的名称,用于在(主和子)查询中引用。
COLUMN_LIST:可选,表示定义在子查询中的临时表的列。如果未指定列,则将根据子查询的结果自动创建列。
SUB_QUERY:是实际的子查询语句,定义临时表的数据和逻辑。
STATEMENT:表示完整的查询语句。
[CUBE | ROLLUP | TOTALS]:表示对
GROUP BY
子句的扩展方式。
严格来说,WITH模型
和WITH语法
是两种东西,但总归是同一个关键字,所以也把它放在这里了。
1. 第一种用法:定义变量
:) WITH pow(2, 2) AS cal
SELECT pow(cal, 2);
┌─pow(cal, 2)─┐
│ 16 │
└─────────────┘
:) WITH 10 AS Start
SELECT number FROM system.numbers
WHERE number > Start
LIMIT 3;
┌─number─┐
│ 11 │
│ 12 │
│ 13 │
└────────┘
:) CREATE TABLE t_test
(
name String
) ENGINE = MergeTree()
ORDER BY name;
:) INSERT INTO t_test VALUES
('zhangsan'),
('lisi'),
('wanglin');
:) WITH 'wanglin' AS w, 'zhangsan' AS z
SELECT * FROM t_test
WHERE name = w OR name = z;
┌─name────┐
│ wanglin │
└─────────┘
2. 第二种用法:调用函数
:) WITH toYear(now()) AS year
SELECT year;
┌─year─┐
│ 2024 │
└──────┘
-- 对比一下原始数据
:) SELECT database, data_uncompressed_bytes
FROM system.columns
ORDER BY data_uncompressed_bytes DESC
LIMIT 10;
┌─database─┬─data_uncompressed_bytes─┐
│ tutorial │ 697963246 │
│ tutorial │ 697885258 │
│ tutorial │ 641999783 │
│ tutorial │ 204160797 │
│ tutorial │ 178036672 │
│ tutorial │ 160192350 │
│ tutorial │ 154038338 │
│ tutorial │ 150759693 │
│ tutorial │ 141982368 │
│ tutorial │ 141982368 │
└──────────┴─────────────────────────┘
-- 得到全部数据库各自所占用空间的大小
:) WITH SUM(data_uncompressed_bytes) AS bytes
SELECT database, formatReadableSize(bytes) AS format
FROM system.columns
GROUP BY database
ORDER BY bytes DESC;
┌─database───────────┬─format────┐
│ tutorial │ 6.73 GiB │
│ system │ 54.37 MiB │
│ default │ 0.00 B │
│ INFORMATION_SCHEMA │ 0.00 B │
│ information_schema │ 0.00 B │
└────────────────────┴───────────┘
3. 第三种用法:定义子查询
WITH
定义的子查询只能返回单个结果而非聚合结果,否则会抛出异常。
而且子查询返回的结果是元组类型。
-- 对比一下原始数据
:) SELECT database, data_uncompressed_bytes
FROM system.columns
ORDER BY data_uncompressed_bytes DESC
LIMIT 10;
┌─database─┬─data_uncompressed_bytes─┐
│ tutorial │ 697963246 │
│ tutorial │ 697885258 │
│ tutorial │ 641999783 │
│ tutorial │ 204160797 │
│ tutorial │ 178036672 │
│ tutorial │ 160192350 │
│ tutorial │ 154038338 │
│ tutorial │ 150759693 │
│ tutorial │ 141982368 │
│ tutorial │ 141982368 │
└──────────┴─────────────────────────┘
-- 得到所有数据库占用总空间大小的百分比
:) WITH (
SELECT SUM(data_uncompressed_bytes) FROM system.columns
) AS total_bytes
SELECT database, SUM(data_uncompressed_bytes) / total_bytes * 100 AS database_disk_usage
FROM system.columns
GROUP BY database
ORDER BY database_disk_usage DESC;
┌─database───────────┬─database_disk_usage─┐
│ tutorial │ 99.14569466462764 │
│ system │ 0.8543053353723684 │
│ default │ 0 │
│ INFORMATION_SCHEMA │ 0 │
│ information_schema │ 0 │
└────────────────────┴─────────────────────┘
也可以在子查询中重复使用WITH
,实现子查询的嵌套。
实现这种嵌套要由内而外
:先写内层,再从内向外扩展。
-- 将数据库占用空间的百分比数值四舍五入
:) WITH (
round(database_disk_usage)
) AS database_disk_usage_v1
SELECT database, database_disk_usage, database_disk_usage_v1
FROM (
WITH (
SELECT SUM(data_uncompressed_bytes) FROM system.columns
) AS total_bytes
SELECT database, SUM(data_uncompressed_bytes) / total_bytes * 100 AS database_disk_usage
FROM system.columns
GROUP BY database
ORDER BY database_disk_usage DESC
);
┌─database───────────┬─database_disk_usage─┬─database_disk_usage_v1─┐
│ tutorial │ 98.51346690839512 │ 99 │
│ system │ 1.486533091604877 │ 1 │
│ default │ 0 │ 0 │
│ INFORMATION_SCHEMA │ 0 │ 0 │
│ information_schema │ 0 │ 0 │
└────────────────────┴─────────────────────┴────────────────────────┘
4. WITH的模式:with X AS( ? )
和with ( ? ) AS X
有如下示例数据。
:) CREATE TABLE order_details
(
skuid UInt32,
price Decimal(10, 2),
createtime Datetime
) ENGINE = MergeTree()
ORDER BY createtime;
:) INSERT INTO order_details VALUES
(1, 199.00, '2024-01-01 01:00:00'),
(2, 240.00, '2024-01-02 02:00:00'),
(3, 320.00, '2024-01-04 03:00:00'),
(4, 249.00, '2024-01-06 04:00:00'),
(5, 1299.00, '2024-01-02 05:00:00'),
(6, 2499.00, '2024-01-01 06:00:00'),
(7, 3000.00, '2024-01-03 07:00:00'),
(8, 52.00, '2024-01-05 08:00:00'),
(9, 18.00, '2024-01-04 09:00:00'),
(10, 1599.00, '2024-01-04 11:00:00'),
(11, 15000.00, '2024-01-02 12:00:00'),
(12, 89.00, '2024-01-06 13:00:00'),
(13, 67.00, '2024-01-07 14:00:00'),
(14, 138.00, '2024-01-03 15:00:00'),
(15, 74.00, '2024-01-03 16:00:00'),
(16, 680.00, '2024-01-01 17:00:00'),
(17, 512.00, '2024-01-02 18:00:00'),
(18, 880.00, '2024-01-07 19:00:00'),
(19, 700.00, '2024-01-06 20:00:00'),
(20, 950.00, '2024-01-06 21:00:00');
SELECT * FROM order_details;
┌─skuid─┬─price─┬──────────createtime─┐
│ 1 │ 199 │ 2024-01-01 01:00:00 │
│ 6 │ 2499 │ 2024-01-01 06:00:00 │
│ 16 │ 680 │ 2024-01-01 17:00:00 │
│ 2 │ 240 │ 2024-01-02 02:00:00 │
│ 5 │ 1299 │ 2024-01-02 05:00:00 │
│ 11 │ 15000 │ 2024-01-02 12:00:00 │
│ 17 │ 512 │ 2024-01-02 18:00:00 │
│ 7 │ 3000 │ 2024-01-03 07:00:00 │
│ 14 │ 138 │ 2024-01-03 15:00:00 │
│ 15 │ 74 │ 2024-01-03 16:00:00 │
│ 3 │ 320 │ 2024-01-04 03:00:00 │
│ 9 │ 18 │ 2024-01-04 09:00:00 │
│ 10 │ 1599 │ 2024-01-04 11:00:00 │
│ 8 │ 52 │ 2024-01-05 08:00:00 │
│ 4 │ 249 │ 2024-01-06 04:00:00 │
│ 12 │ 89 │ 2024-01-06 13:00:00 │
│ 19 │ 700 │ 2024-01-06 20:00:00 │
│ 20 │ 950 │ 2024-01-06 21:00:00 │
│ 13 │ 67 │ 2024-01-07 14:00:00 │
│ 18 │ 880 │ 2024-01-07 19:00:00 │
└───────┴───────┴─────────────────────┘
with X AS( ? )
此时,X
代表结果集。
-- 按每日总销售额的倒序输出结果
:) WITH order_details_v1 AS (
SELECT toDate(createtime) AS selltime, SUM(price) AS total_sell
FROM order_details
GROUP BY selltime
ORDER BY total_sell DESC
)
SELECT * FROM order_details_v1;
┌───selltime─┬─total_sell─┐
│ 2024-01-02 │ 17051 │
│ 2024-01-01 │ 3378 │
│ 2024-01-03 │ 3212 │
│ 2024-01-06 │ 1988 │
│ 2024-01-04 │ 1937 │
│ 2024-01-07 │ 947 │
│ 2024-01-05 │ 52 │
└────────────┴────────────┘
with ( ? ) AS X
此时,X
代表某个值。
-- 查出销售额最高的那一天的销售记录,并按倒序输出
:) WITH (
SELECT toDate(createtime) AS selltime
FROM order_details
GROUP BY selltime
ORDER BY SUM(price) DESC
LIMIT 1
) AS order_details_v1
SELECT * FROM order_details
WHERE toDate(createtime) = order_details_v1
ORDER BY createtime DESC
LIMIT 10;
┌─skuid─┬─price─┬──────────createtime─┐
│ 17 │ 512 │ 2024-01-02 18:00:00 │
│ 11 │ 15000 │ 2024-01-02 12:00:00 │
│ 5 │ 1299 │ 2024-01-02 05:00:00 │
│ 2 │ 240 │ 2024-01-02 02:00:00 │
└───────┴───────┴─────────────────────┘
5. WITH模型
:) CREATE TABLE t_with_moduel
(
id UInt8,
visit UInt8,
province String,
city String,
area String
) ENGINE = MergeTree()
ORDER BY id;
:) INSERT INTO t_with_moduel VALUES
(1, 15, '湖北', '武汉', '武昌'),
(2, 88, '湖北', '襄樊', '樊城'),
(3, 24, '湖南', '长沙', '岳麓'),
(4, 47, '陕西', '西安', '未央'),
(5, 71, '陕西', '咸阳', '咸阳'),
(6, 39, '北京', '北京', '海淀'),
(7, 56, '北京', '北京', '朝阳'),
(8, 113, '云南', '昆明', '盘龙'),
(9, 91, '云南', '大理', '太和街道'),
(10, 62, '海南', '海口', '美兰'),
(11, 53, '海南', '三亚', '天涯'),
(12, 85, '西藏', '拉萨', '城关'),
(13, 75, '广东', '广州', '越秀'),
(14, 59, '重庆', '重庆', '九龙坡'),
(15, 8, '江苏', '南京', '雨花台');
比较下面几条查询结果的差异。
:) SELECT province, city, area, sum(visit)
FROM t_with_moduel
GROUP BY province, city, area;
-- WITH CUBE相当于
-- GROUP BY province, city, area
-- UNION ALL
-- GROUP BY province, city
-- UNION ALL
-- GROUP BY province, area
-- UNION ALL
-- GROUP BY province
-- UNION ALL
-- GROUP BY city, area
-- UNION ALL
-- GROUP BY city
-- UNION ALL
-- GROUP BY area
:) SELECT province, city, area, sum(visit)
FROM t_with_moduel
GROUP BY province, city, area
WITH CUBE;
-- WITH ROLLUP相当于
-- GROUP BY province, city, area
-- UNION ALL
-- GROUP BY province, city
-- UNION ALL
-- GROUP BY province
:) SELECT province, city, area, sum(visit)
FROM t_with_moduel
GROUP BY province, city, area
WITH ROLLUP;
-- WITH TOTALS和GROUP BY province, city, area等价
:) SELECT province, city, area, sum(visit)
FROM t_with_moduel
GROUP BY province, city, area
WITH TOTALS;
JOIN子句
Clickhouse的JOIN
子句由三部分组成:连接精度
+ 连接类型
+ JOIN关键字
。

先创建两张表用于测试。
:) CREATE TABLE join_table_1
(
id UInt8,
name String,
time Datetime
) ENGINE = MergeTree()
ORDER BY id;
:) INSERT INTO join_table_1 VALUES
(1, 'a', '2024-01-01 10:00:00'),
(2, 'b', '2024-01-01 11:00:00'),
(3, 'c', '2024-01-01 12:00:00'),
(4, 'd', '2024-01-01 13:00:00'),
(5, 'e', '2024-01-01 14:00:00');
┌─id─┬─name─┬────────────────time─┐
│ 1 │ a │ 2024-01-01 10:00:00 │
│ 2 │ b │ 2024-01-01 11:00:00 │
│ 3 │ c │ 2024-01-01 12:00:00 │
│ 4 │ d │ 2024-01-01 13:00:00 │
│ 5 │ e │ 2024-01-01 14:00:00 │
└────┴──────┴─────────────────────┘
:) CREATE TABLE join_table_2
(
id UInt8,
score UInt32,
time Datetime
) ENGINE = MergeTree()
ORDER BY id;
:) INSERT INTO join_table_2 VALUES
(1, 100, '2024-01-01 15:00:00'),
(1, 200, '2024-01-01 16:00:00'),
(2, 300, '2024-01-01 17:00:00'),
(2, 400, '2024-01-01 18:00:00'),
(6, 500, '2024-01-01 19:00:00');
┌─id─┬─score─┬────────────────time─┐
│ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 6 │ 500 │ 2024-01-01 19:00:00 │
└────┴───────┴─────────────────────┘
1. 连接精度
连接精度决定了JOIN
子句在连接数据时所使用的策略,目前支持ALL
、ANY
和ASOF
三种类型。
ALL
如果左表内的一行数据在右表中有多行数据与之匹配,则返回右表中 全部的 数据。
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
ALL INNER JOIN join_table_2 AS b ON a.id = b.id;
--上面的SQL查语句和下面的效果等同
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a, join_table_2 AS b
WHERE a.id = b.id;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
这其实就是在MySQL查询中用的比较多的全内连接查询。
ANY
如果左表内的一行数据在右表中有多行数据与之匹配,则返回右表中 仅第一行的 数据。
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
ANY INNER JOIN join_table_2 AS b ON a.id = b.id;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
ASOF
ASOF是一种模糊查询,它允许在连接键之后再定义一个模糊连接的匹配条件(此条件要求不等式,大于或者小于)。
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
ASOF INNER JOIN join_table_2 AS b ON a.id = b.id AND a.time >= b.time;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
2. 连接类型
连接类型和传统RDBMS种的查询操作保持一致。
INNER
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
INNER JOIN join_table_2 AS b ON a.id = b.id;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
LEFT OUTER JOIN
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
LEFT OUTER JOIN join_table_2 AS b ON a.id = b.id;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 0 │ 0 │ 1970-01-01 08:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 0 │ 0 │ 1970-01-01 08:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 0 │ 0 │ 1970-01-01 08:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
RIGHT OUTER JOIN
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
RIGHT OUTER JOIN join_table_2 AS b ON a.id = b.id;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 0 │ │ 1970-01-01 08:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
FULL JOIN
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
FULL OUTER JOIN join_table_2 AS b ON a.id = b.id;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 0 │ 0 │ 1970-01-01 08:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 0 │ 0 │ 1970-01-01 08:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 0 │ 0 │ 1970-01-01 08:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 0 │ │ 1970-01-01 08:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
CROSS
:) SELECT
a.id AS aid,
a.name AS aname,
a.time AS atime,
b.id AS bid,
b.score AS score,
b.time AS btime
FROM join_table_1 AS a
CROSS JOIN join_table_2 AS b;
┌─aid─┬─aname─┬───────────────atime─┬─bid─┬─score─┬───────────────btime─┐
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 1 │ a │ 2024-01-01 10:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 2 │ b │ 2024-02-01 11:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 3 │ c │ 2024-03-01 12:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 4 │ d │ 2021-04-01 13:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 1 │ 100 │ 2024-01-01 15:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 1 │ 200 │ 2024-01-01 16:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 2 │ 300 │ 2024-01-01 17:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 2 │ 400 │ 2024-01-01 18:00:00 │
│ 5 │ e │ 2024-05-01 14:00:00 │ 6 │ 500 │ 2024-01-01 19:00:00 │
└─────┴───────┴─────────────────────┴─────┴───────┴─────────────────────┘
3. 注意事项
JOIN
操作时一定要把数据量小的表放在右边。因为无论哪种JOIN
,右表中的每一条记录都会到左表中查找该记录是否存在,所以右表必须是小表。Clickhouse作为数据仓库层或数据挖掘层服务,尽量不要在其中执行
JOIN
操作。JOIN
操作也没有缓存机制,每执行一次JOIN
操作都会生成一次全新的查询计划,哪怕是执行完全相同的SQL语句也是如此。
ARRAY JOIN子句
ARRAY JOIN
子句允许在数据表的内部与数组ARRAY
类型的字段进行JOIN
操作,从而将数组字段展开为多个行,以便在查询结果中分别处理每个数组元素——这就是它为什么叫ARRAY JOIN
的原因。
ARRAY JOIN
支持的操作包括。
ARRAY JOIN
:也可以写成INNER ARRAY JOIN
,它会排除包含空数组的结果。LEFT ARRAY JOIN
:会把空数组也显示在结果之中,而空数组的值为数组元素类型的默认值(0、空字符串或NULL)。
创建一张带有数组类型的表,并插入一些测试数据。
:) CREATE TABLE array_join_table
(
id UInt8,
country String,
provinces Array(String),
ranks Array(UInt32)
) ENGINE = MergeTree()
ORDER BY id;
:) INSERT INTO array_join_table VALUES
(1, 'China', ['Hubei', 'Shanxi', 'Guangdong'], [1, 2, 1]),
(2, 'America', ['NewYork', 'Alaska', 'Arizona'], [2, 1, 3]),
(3, 'Franch', ['Somme', 'Paris'], [10, 20]),
(4, 'English', [], [10]),
(4, 'Russia', [], [15]);
┌─id─┬─country─┬─provinces──────────────────────┬─ranks───┐
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ [1,2,1] │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ [2,1,3] │
│ 3 │ Franch │ ['Somme','Paris'] │ [10,20] │
│ 4 │ English │ [] │ [10] │
│ 4 │ Russia │ [] │ [15] │
└────┴─────────┴────────────────────────────────┴─────────┘
1. 简单表结构的ARRAY JOIN
使用INNER ARRAY JOIN
子句查询和展开数组数据(ARRAY JOIN
默认就是INNER ARRAY JOIN
)。
:) SELECT id, country, province
FROM array_join_table
ARRAY JOIN provinces AS province;
┌─id─┬─country─┬─province──┐
│ 1 │ China │ Hubei │
│ 1 │ China │ Shanxi │
│ 1 │ China │ Guangdong │
│ 2 │ America │ NewYork │
│ 2 │ America │ Alaska │
│ 2 │ America │ Arizona │
│ 3 │ Franch │ Somme │
│ 3 │ Franch │ Paris │
└────┴─────────┴───────────┘
每个数组元素都被分别作为一行返回,其中id
和country
的值与原始表中的值相同,而province
和city
列的值则是数组元素的值。
ARRAY JOIN
不会把空数组显示在结果中,而LEFT ARRAY JOIN
会。
:) SELECT id, country, provinces, province
FROM array_join_table
LEFT ARRAY JOIN provinces AS province;
┌─id─┬─country─┬─provinces──────────────────────┬─province──┐
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ Hubei │
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ Shanxi │
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ Guangdong │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ NewYork │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ Alaska │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ Arizona │
│ 3 │ Franch │ ['Somme','Paris'] │ Somme │
│ 3 │ Franch │ ['Somme','Paris'] │ Paris │
│ 4 │ English │ [] │ │
│ 4 │ Russia │ [] │ │
└────┴─────────┴────────────────────────────────┴───────────┘
把没有数据的English
和Russia
也给查了出来。
即使同时对多个数组字段执行ARRAY JOIN
操作,也不会产生笛卡尔积,而是按行合并。
:) SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─id─┬─country─┬─provinces──────────────────────┬─flat_province─┬─ranks───┬─flat_rank─┬─mapped────┬─flat_mapped─┐
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ Hubei │ [1,2,1] │ 1 │ [1,4,1] │ 1 │
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ Shanxi │ [1,2,1] │ 2 │ [1,4,1] │ 4 │
│ 1 │ China │ ['Hubei','Shanxi','Guangdong'] │ Guangdong │ [1,2,1] │ 1 │ [1,4,1] │ 1 │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ NewYork │ [2,1,3] │ 2 │ [4,1,9] │ 4 │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ Alaska │ [2,1,3] │ 1 │ [4,1,9] │ 1 │
│ 2 │ America │ ['NewYork','Alaska','Arizona'] │ Arizona │ [2,1,3] │ 3 │ [4,1,9] │ 9 │
│ 3 │ Franch │ ['Somme','Paris'] │ Somme │ [10,20] │ 10 │ [100,400] │ 100 │
│ 3 │ Franch │ ['Somme','Paris'] │ Paris │ [10,20] │ 20 │ [100,400] │ 400 │
│ 4 │ English │ [] │ │ [10] │ 10 │ [100] │ 100 │
│ 4 │ Russia │ [] │ │ [15] │ 15 │ [225] │ 225 │
└────┴─────────┴────────────────────────────────┴───────────────┴─────────┴───────────┴───────────┴─────────────┘
provinces
、ranks
和mapped
的顺序不影响结果。
可以很清楚地看到,provinces
、ranks
和mapped
并没有产生庞大的笛卡尔积,而是做了合并处理。
而且给数组指定了别名:provinces
的别名是flat_province
,ranks
的别名是flat_rank
,而mapped
的别名是flat_mapped
。
数组元素可以通过别名访问,但数组本身则通过原始名称访问,这样既便于阅读,也便于处理。
2. 与外部数据的ARRAY JOIN
可以使用别名与外部数组执行ARRAY JOIN
。
:) SELECT
country, provinces, outer_join
FROM array_join_table
ARRAY JOIN [1, 2, 3] AS outer_join;
┌─country─┬─provinces──────────────────────┬─outer_join─┐
│ China │ ['Hubei','Shanxi','Guangdong'] │ 1 │
│ China │ ['Hubei','Shanxi','Guangdong'] │ 2 │
│ China │ ['Hubei','Shanxi','Guangdong'] │ 3 │
│ America │ ['NewYork','Alaska','Arizona'] │ 1 │
│ America │ ['NewYork','Alaska','Arizona'] │ 2 │
│ America │ ['NewYork','Alaska','Arizona'] │ 3 │
│ Franch │ ['Somme','Paris'] │ 1 │
│ Franch │ ['Somme','Paris'] │ 2 │
│ Franch │ ['Somme','Paris'] │ 3 │
│ English │ [] │ 1 │
│ English │ [] │ 2 │
│ English │ [] │ 3 │
│ Russia │ [] │ 1 │
│ Russia │ [] │ 2 │
│ Russia │ [] │ 3 │
└─────────┴────────────────────────────────┴────────────┘
:) SELECT
country, provinces, outer_join
FROM array_join_table
LEFT ARRAY JOIN [1, 2, 3] AS outer_join;
┌─country─┬─provinces──────────────────────┬─outer_join─┐
│ China │ ['Hubei','Shanxi','Guangdong'] │ 1 │
│ China │ ['Hubei','Shanxi','Guangdong'] │ 2 │
│ China │ ['Hubei','Shanxi','Guangdong'] │ 3 │
│ America │ ['NewYork','Alaska','Arizona'] │ 1 │
│ America │ ['NewYork','Alaska','Arizona'] │ 2 │
│ America │ ['NewYork','Alaska','Arizona'] │ 3 │
│ Franch │ ['Somme','Paris'] │ 1 │
│ Franch │ ['Somme','Paris'] │ 2 │
│ Franch │ ['Somme','Paris'] │ 3 │
│ English │ [] │ 1 │
│ English │ [] │ 2 │
│ English │ [] │ 3 │
│ Russia │ [] │ 1 │
│ Russia │ [] │ 2 │
│ Russia │ [] │ 3 │
└─────────┴────────────────────────────────┴────────────┘
不管是ARRAY JOIN
还是LEFT ARRAY JOIN
,在和外部数组执行计算的时候,都会显示空数组。
3. 嵌套数据结构的ARRAY JOIN
ARRAY JOIN
也可以用于嵌套数据结构。
:) CREATE TABLE array_join_nested
(
id UInt32,
name String,
subject Nested (
course String,
score UInt32
)
) ENGINE = MergeTree()
ORDER BY (id, name);
:) INSERT INTO array_join_nested VALUES
(1, 'a', ['语文', '数学', '英语'], [95, 97, 99]),
(2, 'b', ['语文', '数学', '英语'], [94, 96, 98]),
(3, 'c', [], []);
┌─id─┬─name─┬─subject.course─────────┬─subject.score─┐
│ 1 │ a │ ['语文','数学','英语'] │ [95,97,99] │
│ 2 │ b │ ['语文','数学','英语'] │ [94,96,98] │
│ 3 │ c │ [] │ [] │
└────┴──────┴────────────────────────┴───────────────┘
在嵌套的数据类型中执行ARRAY JOIN
或者LEFT ARRAY JOIN
。
:) SELECT name, subject.course, subject.score
FROM array_join_nested
LEFT ARRAY JOIN subject.course;
┌─name─┬─subject.course─┬─subject.score─┐
│ a │ 语文 │ 95 │
│ a │ 数学 │ 97 │
│ a │ 英语 │ 99 │
│ b │ 语文 │ 94 │
│ b │ 数学 │ 96 │
│ b │ 英语 │ 98 │
│ c │ │ 0 │
└──────┴────────────────┴───────────────┘
换成ARRAY JOIN
无非就是把空数组去掉了而已。
这里没有继续深入了解嵌套数据类型中继续有数组字段时ARRAY JOIN
的情况,例如下面的表结构。
:) CREATE TABLE array_join_nested_array
(
id UInt32,
name String,
subject Nested (
course String,
score Array(UInt32)
)
) ENGINE = MergeTree()
ORDER BY (id, name);
实际生产环境中这种情况肯定会出现,它和简单表结构的ARRAY JOIN
其实没什么本质上的区别,照着套路来就行。
4. 实际应用场景
有如下的部门营业数据。
:) CREATE TABLE array_join_dept_report
(
deptname String,
budgetname String,
types String,
money Decimal(16, 2),
createtime Datetime
) ENGINE = MergeTree()
ORDER BY deptname;
:) INSERT INTO array_join_dept_report VALUES
('部门1', '主营收入', 'type1', 3600.00, '2024-01-01 09:00:00'),
('部门1', '所得税', 'type1', 300.00, '2024-01-01 09:00:00'),
('部门1', '成本', 'type1', 2300.00, '2024-01-01 09:00:00'),
('部门1', '折旧和摊销', 'type1', 450.00, '2024-01-01 09:00:00'),
('部门1', '发货套数', 'type1', 99.00, '2024-01-01 09:00:00'),
('部门1', '主营收入', 'type2', 50500.00, '2024-01-01 09:00:00'),
('部门1', '所得税', 'type2', 12000.00, '2024-01-01 09:00:00'),
('部门1', '成本', 'type2', 9000.00, '2024-01-01 09:00:00'),
('部门1', '折旧和摊销', 'type2', 4500.00, '2024-01-01 09:00:00'),
('部门1', '发货套数', 'type2', 900.00, '2024-01-01 09:00:00'),
('部门1', '主营收入', '公共', 15000000, '2024-01-01 09:00:00'),
('部门2', '主营收入', 'type1', 13000.00, '2024-01-01 09:00:00'),
('部门2', '所得税', 'type1', 2000.00, '2024-01-01 09:00:00'),
('部门2', '成本', 'type1', 1000.00, '2024-01-01 09:00:00'),
('部门2', '折旧和摊销', 'type1', 1600.00, '2024-01-01 09:00:00'),
('部门2', '发货套数', 'type1', 700.00, '2024-01-01 09:00:00'),
('部门2', '主营收入', 'type2', 9800.00, '2024-01-01 09:00:00'),
('部门2', '所得税', 'type2', 500.00, '2024-01-01 09:00:00'),
('部门2', '成本', 'type2', 300.00, '2024-01-01 09:00:00'),
('部门2', '折旧和摊销', 'type2', 680.00, '2024-01-01 09:00:00'),
('部门2', '发货套数', 'type2', 200.00, '2024-01-01 09:00:00'),
('部门2', '主营收入', '公共', 20000000, '2024-01-01 09:00:00');
┌─deptname─┬─budgetname─┬─types─┬────money─┬──────────createtime─┐
│ 部门1 │ 主营收入 │ type1 │ 3600 │ 2024-01-01 09:00:00 │
│ 部门1 │ 所得税 │ type1 │ 300 │ 2024-01-01 09:00:00 │
│ 部门1 │ 成本 │ type1 │ 2300 │ 2024-01-01 09:00:00 │
│ 部门1 │ 折旧和摊销 │ type1 │ 450 │ 2024-01-01 09:00:00 │
│ 部门1 │ 发货套数 │ type1 │ 99 │ 2024-01-01 09:00:00 │
│ 部门1 │ 主营收入 │ type2 │ 50500 │ 2024-01-01 09:00:00 │
│ 部门1 │ 所得税 │ type2 │ 12000 │ 2024-01-01 09:00:00 │
│ 部门1 │ 成本 │ type2 │ 9000 │ 2024-01-01 09:00:00 │
│ 部门1 │ 折旧和摊销 │ type2 │ 4500 │ 2024-01-01 09:00:00 │
│ 部门1 │ 发货套数 │ type2 │ 900 │ 2024-01-01 09:00:00 │
│ 部门1 │ 主营收入 │ 公共 │ 15000000 │ 2024-01-01 09:00:00 │
│ 部门2 │ 主营收入 │ type1 │ 13000 │ 2024-01-01 09:00:00 │
│ 部门2 │ 所得税 │ type1 │ 2000 │ 2024-01-01 09:00:00 │
│ 部门2 │ 成本 │ type1 │ 1000 │ 2024-01-01 09:00:00 │
│ 部门2 │ 折旧和摊销 │ type1 │ 1600 │ 2024-01-01 09:00:00 │
│ 部门2 │ 发货套数 │ type1 │ 700 │ 2024-01-01 09:00:00 │
│ 部门2 │ 主营收入 │ type2 │ 9800 │ 2024-01-01 09:00:00 │
│ 部门2 │ 所得税 │ type2 │ 500 │ 2024-01-01 09:00:00 │
│ 部门2 │ 成本 │ type2 │ 300 │ 2024-01-01 09:00:00 │
│ 部门2 │ 折旧和摊销 │ type2 │ 680 │ 2024-01-01 09:00:00 │
│ 部门2 │ 发货套数 │ type2 │ 200 │ 2024-01-01 09:00:00 │
│ 部门2 │ 主营收入 │ 公共 │ 20000000 │ 2024-01-01 09:00:00 │
└──────────┴────────────┴───────┴──────────┴─────────────────────┘
现在的需求是:求出各个部门中各个type(不包括公共)的营业收入,所得税,成本,折旧和摊销,发货套数,产值(收入 × 1.13),毛利(收入 - 成本),单套收入(收入/发货套数),利润总额(收入 - 成本 - 折旧和摊销 - 所得税),净利润(利润总额 - 所得税),要求输出的数据中带有指标名称。
-- 注意 '' 和 `` 写法的区别,否则执行报错
:) SELECT deptname, types, arr, arr.1 AS budgetname, arr.2 AS money
FROM
(
SELECT deptname, types, [
('主营收入', sumIf(ifNull(money, 0), budgetname = '主营收入') AS `营业收入`),
('所得税', sumIf(money, budgetname = '所得税') AS `所得税`),
('成本', sumIf(money, budgetname = '成本') AS `成本`),
('折旧和摊销', sumIf(money, budgetname = '折旧和摊销') AS `折旧和摊销`),
('发货套数', sumIf(money, budgetname = '发货套数') AS `发货套数`),
('产值', toDecimal64(`营业收入` * 1.13, 2) AS `产值`),
('毛利', toDecimal64(`营业收入` - `成本`, 2) AS `毛利`),
('单套收入', toDecimal64(if(`发货套数` != 0, `营业收入` / `发货套数`, 0), 2) AS `单套收入`),
('利润总额', toDecimal64(`营业收入` - `成本` - `折旧和摊销` - `所得税`, 2) AS `利润总额`),
('净利润', toDecimal64(`利润总额` - `所得税`, 2) AS `净利润`)
] AS arr
FROM array_join_dept_report
WHERE toYear(createtime) = 2024 AND types != '公共'
GROUP BY deptname, types
) ARRAY JOIN arr;
┌─deptname─┬─types─┬─arr─────────────────┬─budgetname─┬────money─┐
│ 部门1 │ type2 │ ('主营收入',50500) │ 主营收入 │ 50500 │
│ 部门1 │ type2 │ ('所得税',12000) │ 所得税 │ 12000 │
│ 部门1 │ type2 │ ('成本',9000) │ 成本 │ 9000 │
│ 部门1 │ type2 │ ('折旧和摊销',4500) │ 折旧和摊销 │ 4500 │
│ 部门1 │ type2 │ ('发货套数',900) │ 发货套数 │ 900 │
│ 部门1 │ type2 │ ('产值',57064.99) │ 产值 │ 57064.99 │
│ 部门1 │ type2 │ ('毛利',41500) │ 毛利 │ 41500 │
│ 部门1 │ type2 │ ('单套收入',56.11) │ 单套收入 │ 56.11 │
│ 部门1 │ type2 │ ('利润总额',25000) │ 利润总额 │ 25000 │
│ 部门1 │ type2 │ ('净利润',13000) │ 净利润 │ 13000 │
│ 部门2 │ type2 │ ('主营收入',9800) │ 主营收入 │ 9800 │
│ 部门2 │ type2 │ ('所得税',500) │ 所得税 │ 500 │
│ 部门2 │ type2 │ ('成本',300) │ 成本 │ 300 │
│ 部门2 │ type2 │ ('折旧和摊销',680) │ 折旧和摊销 │ 680 │
│ 部门2 │ type2 │ ('发货套数',200) │ 发货套数 │ 200 │
│ 部门2 │ type2 │ ('产值',11073.99) │ 产值 │ 11073.99 │
│ 部门2 │ type2 │ ('毛利',9500) │ 毛利 │ 9500 │
│ 部门2 │ type2 │ ('单套收入',49) │ 单套收入 │ 49 │
│ 部门2 │ type2 │ ('利润总额',8320) │ 利润总额 │ 8320 │
│ 部门2 │ type2 │ ('净利润',7820) │ 净利润 │ 7820 │
│ 部门1 │ type1 │ ('主营收入',3600) │ 主营收入 │ 3600 │
│ 部门1 │ type1 │ ('所得税',300) │ 所得税 │ 300 │
│ 部门1 │ type1 │ ('成本',2300) │ 成本 │ 2300 │
│ 部门1 │ type1 │ ('折旧和摊销',450) │ 折旧和摊销 │ 450 │
│ 部门1 │ type1 │ ('发货套数',99) │ 发货套数 │ 99 │
│ 部门1 │ type1 │ ('产值',4067.99) │ 产值 │ 4067.99 │
│ 部门1 │ type1 │ ('毛利',1300) │ 毛利 │ 1300 │
│ 部门1 │ type1 │ ('单套收入',36.36) │ 单套收入 │ 36.36 │
│ 部门1 │ type1 │ ('利润总额',550) │ 利润总额 │ 550 │
│ 部门1 │ type1 │ ('净利润',250) │ 净利润 │ 250 │
│ 部门2 │ type1 │ ('主营收入',13000) │ 主营收入 │ 13000 │
│ 部门2 │ type1 │ ('所得税',2000) │ 所得税 │ 2000 │
│ 部门2 │ type1 │ ('成本',1000) │ 成本 │ 1000 │
│ 部门2 │ type1 │ ('折旧和摊销',1600) │ 折旧和摊销 │ 1600 │
│ 部门2 │ type1 │ ('发货套数',700) │ 发货套数 │ 700 │
│ 部门2 │ type1 │ ('产值',14689.99) │ 产值 │ 14689.99 │
│ 部门2 │ type1 │ ('毛利',12000) │ 毛利 │ 12000 │
│ 部门2 │ type1 │ ('单套收入',18.57) │ 单套收入 │ 18.57 │
│ 部门2 │ type1 │ ('利润总额',8400) │ 利润总额 │ 8400 │
│ 部门2 │ type1 │ ('净利润',6400) │ 净利润 │ 6400 │
└──────────┴───────┴─────────────────────┴────────────┴──────────┘
通过ARRAY JOIN
将子查询中造出的arr
二维数组展开为一维数组,然后budgetname
和money
列数据再从一维数组中读取。
这里面也用到了另外几个函数:sumIf()
、ifNull()
、toDecimal64()
和toYear()
。
现在如果还想看:每个dept的各个type主营收入占所有type主营收入的百分比(这需要求有点绕),该怎么实现?
:) SELECT deptname, types, `主营收入`, `全部收入`,
if(`全部收入` != 0, `主营收入` / `全部收入`, 0) AS `收入占比`
FROM
(
SELECT DISTINCT deptname, types,
sum(toDecimal64(money, 9)) OVER(PARTITION BY deptname, types) AS `主营收入`,
sum(toDecimal64(money, 9)) OVER(PARTITION BY deptname) AS `全部收入`
FROM array_join_dept_report
WHERE toYear(createtime)= 2024 AND types != '公共' AND budgetname = '主营收入'
);
┌─deptname─┬─types─┬─主营收入─┬─全部收入──┬────收入占比─┐
│ 部门1 │ type1 │ 3600 │ 54100 │ 0.066543438 │
│ 部门1 │ type2 │ 50500 │ 54100 │ 0.933456561 │
│ 部门2 │ type1 │ 13000 │ 22800 │ 0.570175438 │
│ 部门2 │ type2 │ 9800 │ 22800 │ 0.429824561 │
└──────────┴───────┴──────────┴──────────┴─────────────┘
这里也用到了sum()
、OVER()
和PARTITION BY
的组合,以实现对dept和types的分区求和。
LIMIT BY 和 OFFSET FETCH
这两个理解起来相对上面的简单,但却不容易记住。
1. LIMIT BY
LIMIT BY
和LIMIT
是不同的。
LIMIT n BY expr
作用于字段expr
。LIMIT n
是作用于整个结果集。
它们可以同时使用。
LIMIT BY
可以满足分组求取前TopN的需求。
:) CREATE TABLE limit_by_table
(
userid UInt64,
orderid UInt64,
amount Decimal(10, 2)
) ENGINE = MergeTree()
ORDER BY userid;
:) INSERT INTO limit_by_table VALUES
(1, 1, 99.00),
(1, 2, 58.00),
(1, 3, 249.00),
(2, 1, 37.00),
(2, 2, 16.00),
(3, 1, 599.00),
(3, 2, 70.00),
(3, 3, 103.00);
┌─userid─┬─orderid─┬─amount─┐
│ 1 │ 1 │ 99 │
│ 1 │ 2 │ 58 │
│ 1 │ 3 │ 249 │
│ 2 │ 1 │ 37 │
│ 2 │ 2 │ 16 │
│ 3 │ 1 │ 599 │
│ 3 │ 2 │ 70 │
│ 3 │ 3 │ 103 │
└────────┴─────────┴────────┘
按订单号分组,并且每个订单号只取前2条数据。
:) SELECT * FROM limit_by_table
ORDER BY orderid
LIMIT 2 BY orderid
┌─userid─┬─orderid─┬─amount─┐
│ 1 │ 1 │ 99 │
│ 2 │ 1 │ 37 │
│ 1 │ 2 │ 58 │
│ 2 │ 2 │ 16 │
│ 1 │ 3 │ 249 │
│ 3 │ 3 │ 103 │
└────────┴─────────┴────────┘
按照orderid分组,每个分组只取了前2条数据,如果用LIMIT
,就是整个结果集只取2条。
如果把LIMIT BY
和LIMIT
结合起来用,就是这样的效果。
:) SELECT * FROM limit_by_table
ORDER BY orderid
LIMIT 1 BY orderid
LIMIT 2
┌─userid─┬─orderid─┬─amount─┐
│ 1 │ 1 │ 99 │
│ 1 │ 2 │ 58 │
└────────┴─────────┴────────┘
每个分组里面只取第一条,总共只取两个分组。
2. OFFSET FETCH
它的语法如下。
OFFSET offset_row_count {ROW | ROWS}] [FETCH {FIRST | NEXT} fetch_row_count {ROW | ROWS} {ONLY | WITH TIES}]
OFFSET
:指定在结果集中返回数据之前要跳过的行数。FETCH
:指定结果中包含的最大行数。ONLY
:返回在OFFSET
之后的行。WITH TIES
:保留与结果集中最后一行的ORDER BY
字段值相同的行。
至于{ROW | ROWS}
和{FIRST | NEXT}
当中取ROW
还是ROWS
,取FIRST
还是NEXT
都不影响结果。
:) CREATE TABLE offset_fetch_table
(
a UInt64,
b UInt64,
) ENGINE = MergeTree()
ORDER BY a;
:) INSERT INTO offset_fetch_table VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(1, 5),
(1, 6),
(2, 7),
(2, 8),
(2, 9),
(2, 10),
(2, 11),
(2, 12),
(2, 13),
(1, 14),
(1, 15);
┌─a─┬──b─┐
│ 1 │ 1 │
│ 1 │ 2 │
│ 1 │ 3 │
│ 1 │ 4 │
│ 1 │ 5 │
│ 1 │ 6 │
│ 1 │ 7 │
│ 1 │ 8 │
│ 1 │ 9 │
│ 1 │ 10 │
│ 1 │ 11 │
│ 1 │ 12 │
│ 1 │ 13 │
│ 1 │ 14 │
│ 1 │ 15 │
└───┴────┘
如果只想取从第4行开始的3行,那么下面两条SQL查询语句效果是一样的。
:) SELECT * FROM offset_fetch_table
ORDER BY a
OFFSET 3 ROW FETCH FIRST 3 ROW ONLY;
:) SELECT * FROM offset_fetch_table
ORDER BY a
LIMIT 3
OFFSET 3;
┌─a─┬─b─┐
│ 1 │ 4 │
│ 1 │ 5 │
│ 1 │ 6 │
└───┴───┘
试一下WITH TIES
的效果。
:) SELECT * FROM offset_fetch_table
ORDER BY a
OFFSET 3 ROW FETCH FIRST 3 ROW WITH TIES;
┌─a─┬──b─┐
│ 1 │ 4 │
│ 1 │ 5 │
│ 1 │ 6 │
│ 1 │ 14 │
│ 1 │ 15 │
└───┴────┘
因为最后一行的排序字段a
的值为1
,因此通过WITH TIES
,就返回了剩下的与a
的值相同的行。
EXPLAIN
用于显示SQL执行计划,它是进行系统调优的重要参考依据。在这一点上,大部分数据库都是一致的。
它的语法虽然看起来好像很复杂,其实非常简单,因为真正有用的只有一行。
-- EXPLAIN语法
EXPLAIN [AST | SYNTAX | QUERY TREE | PLAN | PIPELINE | ESTIMATE | TABLE OVERRIDE] [setting = value, ...]
可以在SQL语句之前仅仅指定EXPLAIN
而什么都不带,以之前的ARRAY JOIN
为例来说明,在SQL语句之前增加EXPLAIN
后可以看到的东西。
:) EXPLAIN SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Project names + Projection [split])) │
│ ArrayJoin (ARRAY JOIN) │
│ Expression ((Projection + (DROP unused columns before ARRAY JOIN + (ARRAY JOIN actions + Change column names to column identifiers)))) │
│ ReadFromMergeTree (default.array_join_table) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
上面只显示了最基本的SQL语句执行情况,如果想看更多的,可以加上一下选项。
AST
:显示抽象语法树。SYNTAX
:显示优化后的SQL查询语句。QUERY TREE
:显示查询的执行计划树。PLAN
:EXPLAIN
默认就会带上它,所以执行EXPLAIN
和EXPLAIN PLAN
的效果是完全一样的。PIPELINE
:显示查询的执行管道。ESTIMATE
:显示处理查询时要从表中读取的估计行数、标记数和part数,必须基于MergeTree表引擎才能使用它。TABLE OVERRIDE
:显示通过函数访问的表结构上的重写结果,它还进行一些验证,例如,如果重写会导致某种失败则抛出异常。
1. AST
当将EXPLAIN
或EXPLAIN PLAN
换成EXPLAIN AST
后,结果就不一样了。
:) EXPLAIN AST SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain───────────────────────────────────────────┐
│ SelectWithUnionQuery (children 1) │
│ ExpressionList (children 1) │
│ SelectQuery (children 2) │
│ ExpressionList (children 8) │
│ Identifier id │
│ Identifier country │
│ Identifier provinces │
│ Identifier flat_province │
│ Identifier ranks │
│ Identifier flat_rank │
│ Function arrayMap (alias mapped) (children 1) │
│ ExpressionList (children 2) │
│ Function lambda (children 1) │
│ ExpressionList (children 2) │
│ Function tuple (children 1) │
│ ExpressionList (children 1) │
│ Identifier x │
│ Function multiply (children 1) │
│ ExpressionList (children 2) │
│ Identifier x │
│ Identifier x │
│ Identifier ranks │
│ Identifier flat_mapped │
│ TablesInSelectQuery (children 2) │
│ TablesInSelectQueryElement (children 1) │
│ TableExpression (children 1) │
│ TableIdentifier array_join_table │
│ TablesInSelectQueryElement (children 1) │
│ ArrayJoin (children 1) │
│ ExpressionList (children 3) │
│ Identifier provinces (alias flat_province) │
│ Identifier ranks (alias flat_rank) │
│ Identifier mapped (alias flat_mapped) │
└───────────────────────────────────────────────────┘
上面的结果显示了语法树,比较复杂,现在还看不懂😢
2. AST
把AST
换成SYNTAX
后的结果。
:) EXPLAIN SYNTAX SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain──────────────────────────────────────────┐
│ SELECT │
│ id, │
│ country, │
│ provinces, │
│ flat_province, │
│ ranks, │
│ flat_rank, │
│ arrayMap(x -> (x * x), ranks) AS mapped, │
│ flat_mapped │
│ FROM array_join_table │
│ LEFT ARRAY JOIN │
│ provinces AS flat_province, │
│ ranks AS flat_rank, │
│ arrayMap(x -> (x * x), ranks) AS flat_mapped │
└──────────────────────────────────────────────────┘
除了格式上漂亮了一些,把最后一行的mapped AS flat_mapped
给替换成了arrayMap(x -> (x * x), ranks) AS flat_mapped
。
所以如果想写出优美且标准的Clickhouse SQL
查询,在查询前带上EXPLAIN SYNTAX
是个好习惯。
3. QUERY TREE
它有三个选项。
run_passes
:在转储查询树之前运行所有查询树的passes
,默认值为1
(打开)。dump_passes
:在转储查询树之前,先转储有关已使用的passes
的信息,默认值为0
(关闭)。passes
:指定要运行的passes
次数,如果设置为-1,则运行所有过程,默认值为-1
。
可以观察不同参数下不同结果的区别。
:) EXPLAIN QUERY TREE run_passes = 1, dump_passes = 1, passes = 1 SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Pass 1 QueryAnalysis - Resolve type for each query expression. Replace identifiers, matchers with query expressions. Perform constant folding. Evaluate scalar subqueries. │
2. │ QUERY id: 0 │
3. │ PROJECTION COLUMNS │
4. │ id UInt8 │
5. │ country String │
6. │ provinces Array(String) │
7. │ flat_province String │
8. │ ranks Array(UInt32) │
9. │ flat_rank UInt32 │
10. │ mapped Array(UInt64) │
11. │ flat_mapped UInt64 │
12. │ PROJECTION │
13. │ LIST id: 1, nodes: 8 │
14. │ COLUMN id: 2, column_name: id, result_type: UInt8, source_id: 3 │
15. │ COLUMN id: 4, column_name: country, result_type: String, source_id: 3 │
16. │ COLUMN id: 5, column_name: provinces, result_type: Array(String), source_id: 3 │
17. │ COLUMN id: 6, column_name: flat_province, result_type: String, source_id: 7 │
18. │ COLUMN id: 8, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
19. │ COLUMN id: 9, column_name: flat_rank, result_type: UInt32, source_id: 7 │
20. │ FUNCTION id: 10, function_name: arrayMap, function_type: ordinary, result_type: Array(UInt64) │
21. │ ARGUMENTS │
22. │ LIST id: 11, nodes: 2 │
23. │ LAMBDA id: 12 │
24. │ ARGUMENTS │
25. │ LIST id: 13, nodes: 1 │
26. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
27. │ EXPRESSION │
28. │ FUNCTION id: 15, function_name: multiply, function_type: ordinary, result_type: UInt64 │
29. │ ARGUMENTS │
30. │ LIST id: 16, nodes: 2 │
31. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
32. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
33. │ COLUMN id: 17, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
34. │ COLUMN id: 18, column_name: flat_mapped, result_type: UInt64, source_id: 7 │
35. │ JOIN TREE │
36. │ ARRAY_JOIN id: 7, is_left: 1 │
37. │ TABLE EXPRESSION │
38. │ TABLE id: 3, alias: __table2, table_name: default.array_join_table │
39. │ JOIN EXPRESSIONS │
40. │ LIST id: 19, nodes: 3 │
41. │ COLUMN id: 20, alias: flat_province, column_name: flat_province, result_type: String, source_id: 7 │
42. │ EXPRESSION │
43. │ COLUMN id: 21, column_name: provinces, result_type: Array(String), source_id: 3 │
44. │ COLUMN id: 22, alias: flat_rank, column_name: flat_rank, result_type: UInt32, source_id: 7 │
45. │ EXPRESSION │
46. │ COLUMN id: 23, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
47. │ COLUMN id: 24, alias: flat_mapped, column_name: flat_mapped, result_type: UInt64, source_id: 7 │
48. │ EXPRESSION │
49. │ FUNCTION id: 10, function_name: arrayMap, function_type: ordinary, result_type: Array(UInt64) │
50. │ ARGUMENTS │
51. │ LIST id: 11, nodes: 2 │
52. │ LAMBDA id: 12 │
53. │ ARGUMENTS │
54. │ LIST id: 13, nodes: 1 │
55. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
56. │ EXPRESSION │
57. │ FUNCTION id: 15, function_name: multiply, function_type: ordinary, result_type: UInt64 │
58. │ ARGUMENTS │
59. │ LIST id: 16, nodes: 2 │
60. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
61. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
62. │ COLUMN id: 17, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
:) EXPLAIN QUERY TREE run_passes = 1, dump_passes = 1, passes = 10 SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Pass 1 QueryAnalysis - Resolve type for each query expression. Replace identifiers, matchers with query expressions. Perform constant folding. Evaluate scalar subqueries. │
2. │ Pass 2 GroupingFunctionsResolvePass - Resolve GROUPING functions based on GROUP BY modifiers │
3. │ Pass 3 AutoFinalOnQueryPass - Automatically applies final modifier to table expressions in queries if it is supported and if user level final setting is set │
4. │ Pass 4 RemoveUnusedProjectionColumnsPass - Remove unused projection columns in subqueries. │
5. │ Pass 5 FunctionToSubcolumns - Rewrite function to subcolumns, for example tupleElement(column, subcolumn) into column.subcolumn │
6. │ Pass 6 ConvertLogicalExpressionToCNFPass - Convert logical expression to CNF and apply optimizations using constraints │
7. │ Pass 7 RewriteSumFunctionWithSumAndCountPass - Rewrite sum(column +/- literal) into sum(column) and literal * count(column) │
8. │ Pass 8 CountDistinct - Optimize single countDistinct into count over subquery │
9. │ Pass 9 UniqToCount - Rewrite uniq and its variants(except uniqUpTo) to count if subquery has distinct or group by clause. │
10. │ Pass 10 RewriteArrayExistsToHas - Rewrite arrayExists(func, arr) functions to has(arr, elem) when logically equivalent │
11. │ QUERY id: 0 │
12. │ PROJECTION COLUMNS │
13. │ id UInt8 │
14. │ country String │
15. │ provinces Array(String) │
16. │ flat_province String │
17. │ ranks Array(UInt32) │
18. │ flat_rank UInt32 │
19. │ mapped Array(UInt64) │
20. │ flat_mapped UInt64 │
21. │ PROJECTION │
22. │ LIST id: 1, nodes: 8 │
23. │ COLUMN id: 2, column_name: id, result_type: UInt8, source_id: 3 │
24. │ COLUMN id: 4, column_name: country, result_type: String, source_id: 3 │
25. │ COLUMN id: 5, column_name: provinces, result_type: Array(String), source_id: 3 │
26. │ COLUMN id: 6, column_name: flat_province, result_type: String, source_id: 7 │
27. │ COLUMN id: 8, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
28. │ COLUMN id: 9, column_name: flat_rank, result_type: UInt32, source_id: 7 │
29. │ FUNCTION id: 10, function_name: arrayMap, function_type: ordinary, result_type: Array(UInt64) │
30. │ ARGUMENTS │
31. │ LIST id: 11, nodes: 2 │
32. │ LAMBDA id: 12 │
33. │ ARGUMENTS │
34. │ LIST id: 13, nodes: 1 │
35. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
36. │ EXPRESSION │
37. │ FUNCTION id: 15, function_name: multiply, function_type: ordinary, result_type: UInt64 │
38. │ ARGUMENTS │
39. │ LIST id: 16, nodes: 2 │
40. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
41. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
42. │ COLUMN id: 17, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
43. │ COLUMN id: 18, column_name: flat_mapped, result_type: UInt64, source_id: 7 │
44. │ JOIN TREE │
45. │ ARRAY_JOIN id: 7, is_left: 1 │
46. │ TABLE EXPRESSION │
47. │ TABLE id: 3, alias: __table2, table_name: default.array_join_table │
48. │ JOIN EXPRESSIONS │
49. │ LIST id: 19, nodes: 3 │
50. │ COLUMN id: 20, alias: flat_province, column_name: flat_province, result_type: String, source_id: 7 │
51. │ EXPRESSION │
52. │ COLUMN id: 21, column_name: provinces, result_type: Array(String), source_id: 3 │
53. │ COLUMN id: 22, alias: flat_rank, column_name: flat_rank, result_type: UInt32, source_id: 7 │
54. │ EXPRESSION │
55. │ COLUMN id: 23, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
56. │ COLUMN id: 24, alias: flat_mapped, column_name: flat_mapped, result_type: UInt64, source_id: 7 │
57. │ EXPRESSION │
58. │ FUNCTION id: 10, function_name: arrayMap, function_type: ordinary, result_type: Array(UInt64) │
59. │ ARGUMENTS │
60. │ LIST id: 11, nodes: 2 │
61. │ LAMBDA id: 12 │
62. │ ARGUMENTS │
63. │ LIST id: 13, nodes: 1 │
64. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
65. │ EXPRESSION │
66. │ FUNCTION id: 15, function_name: multiply, function_type: ordinary, result_type: UInt64 │
67. │ ARGUMENTS │
68. │ LIST id: 16, nodes: 2 │
69. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
70. │ COLUMN id: 14, column_name: x, result_type: UInt32, source_id: 12 │
71. │ COLUMN id: 17, column_name: ranks, result_type: Array(UInt32), source_id: 3 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
上面执行了两次SQL查询,从第1行
到第10行
就能看到参数passes
设置不同值的不同结果,其他部分都一样。
可以继续尝试调整run_passes
、dump_passes
和passes
这三个参数不同组合看看效果,此处略过。
4. PLAN
虽然是EXPLAIN
默认自带的参数,但它其实也有五个不同的选项。
- header:打印计划中各个步骤的 header 说明,默认值为
0
(关闭)。
当header = 1
,Header
键会作为列数组添加到步骤中。
[
{
"Plan": {
"Node Type": "Expression",
"Header": [
{
"Name": "1",
"Type": "UInt8"
},
{
"Name": "plus(2, dummy)",
"Type": "UInt16"
}
],
"Plans": [
{
"Node Type": "SettingQuotaAndLimits",
"Header": [
{
"Name": "dummy",
"Type": "UInt8"
}
],
"Plans": [
{
"Node Type": "ReadFromStorage",
"Header": [
{
"Name": "dummy",
"Type": "UInt8"
}
]
}
]
}
]
}
}
]
- description:打印计划中各个步骤的描述,默认值为
1
(开启)。
当description = 1
,Description
关键字会加入到内容中。
{
"Node Type": "ReadFromStorage",
"Description": "SystemOne"
}
- indexes:显示所使用的索引、过滤部分的数量以及每个应用索引的过滤颗粒的数量。默认值为
0
(关闭),仅MergeTree表引擎支持。
当indexes = 1
,Indexes
关键字会加入到内容中。
"Node Type": "ReadFromMergeTree",
"Indexes": [
{
"Type": "MinMax",
"Keys": ["y"],
"Condition": "(y in [1, +inf))",
"Parts": 5/4,
"Granules": 12/11
},
......
]
- actions:打印计划中各个步骤的详细信息,默认值为
0
(关闭)。
当actions = 1
,Actions
关键字会加入到内容中。
......
"Actions": [
{
"Node Type": "Column",
"Result Type": "UInt8",
"Result Type": "Column",
"Column": "Const(UInt8)",
"Arguments": [],
"Removed Arguments": [],
"Result": 0
}
......
- json:以JSON格式将查询计划步骤打印为一行,默认值为
0
。建议使用TSVRaw格式,以避免不必要的转义。
当json = 1
,输出结果会像这样组织内容。
[
{
"Plan": {
"Node Type": "Union",
"Plans": [
{
"Node Type": "Expression",
"Plans": [
{
"Node Type": "SettingQuotaAndLimits",
"Plans": [
{
"Node Type": "ReadFromStorage"
}
]
}
]
},
{
"Node Type": "Expression",
"Plans": [
{
"Node Type": "SettingQuotaAndLimits",
"Plans": [
{
"Node Type": "ReadFromStorage"
}
]
}
]
}
]
}
}
]
下面是具体的查询语句输出结果。
:) EXPLAIN PLAN header = 1, description = 1, indexes = 1, actions = 1 , json = 1 SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ [
{
"Plan": {
"Node Type": "Expression",
"Description": "(Project names + Projection [split])",
"Header": [
{
"Name": "id",
"Type": "UInt8"
},
{
"Name": "country",
"Type": "String"
},
{
"Name": "provinces",
"Type": "Array(String)"
},
{
"Name": "flat_province",
"Type": "String"
},
{
"Name": "ranks",
"Type": "Array(UInt32)"
},
{
"Name": "flat_rank",
"Type": "UInt32"
},
{
"Name": "mapped",
"Type": "Array(UInt64)"
},
{
"Name": "flat_mapped",
"Type": "UInt64"
}
],
"Expression": {
"Inputs": [
{
"Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Type": "Array(UInt64)"
},
{
"Name": "__table2.id",
"Type": "UInt8"
},
{
"Name": "__table2.country",
"Type": "String"
},
{
"Name": "__table2.provinces",
"Type": "Array(String)"
},
{
"Name": "__table2.ranks",
"Type": "Array(UInt32)"
},
{
"Name": "__table1.flat_province",
"Type": "String"
},
{
"Name": "__table1.flat_rank",
"Type": "UInt32"
},
{
"Name": "__table1.flat_mapped",
"Type": "UInt64"
}
],
"Actions": [
{
"Node Type": "INPUT",
"Result Type": "UInt8",
"Result Name": "__table2.id",
"Arguments": [1],
"Removed Arguments": [],
"Result": 0
},
{
"Node Type": "INPUT",
"Result Type": "String",
"Result Name": "__table2.country",
"Arguments": [2],
"Removed Arguments": [],
"Result": 1
},
{
"Node Type": "INPUT",
"Result Type": "Array(String)",
"Result Name": "__table2.provinces",
"Arguments": [3],
"Removed Arguments": [],
"Result": 2
},
{
"Node Type": "INPUT",
"Result Type": "Array(UInt32)",
"Result Name": "__table2.ranks",
"Arguments": [4],
"Removed Arguments": [],
"Result": 3
},
{
"Node Type": "INPUT",
"Result Type": "String",
"Result Name": "__table1.flat_province",
"Arguments": [5],
"Removed Arguments": [],
"Result": 4
},
{
"Node Type": "INPUT",
"Result Type": "UInt32",
"Result Name": "__table1.flat_rank",
"Arguments": [6],
"Removed Arguments": [],
"Result": 5
},
{
"Node Type": "INPUT",
"Result Type": "UInt64",
"Result Name": "__table1.flat_mapped",
"Arguments": [7],
"Removed Arguments": [],
"Result": 6
},
{
"Node Type": "INPUT",
"Result Type": "Array(UInt64)",
"Result Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Arguments": [0],
"Removed Arguments": [],
"Result": 7
},
{
"Node Type": "ALIAS",
"Result Type": "UInt8",
"Result Name": "id",
"Arguments": [0],
"Removed Arguments": [0],
"Result": 8
},
{
"Node Type": "ALIAS",
"Result Type": "String",
"Result Name": "country",
"Arguments": [1],
"Removed Arguments": [1],
"Result": 0
},
{
"Node Type": "ALIAS",
"Result Type": "Array(String)",
"Result Name": "provinces",
"Arguments": [2],
"Removed Arguments": [2],
"Result": 1
},
{
"Node Type": "ALIAS",
"Result Type": "Array(UInt32)",
"Result Name": "ranks",
"Arguments": [3],
"Removed Arguments": [3],
"Result": 2
},
{
"Node Type": "ALIAS",
"Result Type": "String",
"Result Name": "flat_province",
"Arguments": [4],
"Removed Arguments": [4],
"Result": 3
},
{
"Node Type": "ALIAS",
"Result Type": "UInt32",
"Result Name": "flat_rank",
"Arguments": [5],
"Removed Arguments": [5],
"Result": 4
},
{
"Node Type": "ALIAS",
"Result Type": "UInt64",
"Result Name": "flat_mapped",
"Arguments": [6],
"Removed Arguments": [6],
"Result": 5
},
{
"Node Type": "ALIAS",
"Result Type": "Array(UInt64)",
"Result Name": "mapped",
"Arguments": [7],
"Removed Arguments": [7],
"Result": 6
}
],
"Outputs": [
{
"Name": "id",
"Type": "UInt8"
},
{
"Name": "country",
"Type": "String"
},
{
"Name": "provinces",
"Type": "Array(String)"
},
{
"Name": "flat_province",
"Type": "String"
},
{
"Name": "ranks",
"Type": "Array(UInt32)"
},
{
"Name": "flat_rank",
"Type": "UInt32"
},
{
"Name": "mapped",
"Type": "Array(UInt64)"
},
{
"Name": "flat_mapped",
"Type": "UInt64"
}
],
"Positions": [8, 0, 1, 3, 2, 4, 6, 5],
"Project Input": true
},
"Plans": [
{
"Node Type": "ArrayJoin",
"Description": "ARRAY JOIN",
"Header": [
{
"Name": "__table2.id",
"Type": "UInt8"
},
{
"Name": "__table2.country",
"Type": "String"
},
{
"Name": "__table2.provinces",
"Type": "Array(String)"
},
{
"Name": "__table2.ranks",
"Type": "Array(UInt32)"
},
{
"Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Type": "Array(UInt64)"
},
{
"Name": "__table1.flat_province",
"Type": "String"
},
{
"Name": "__table1.flat_rank",
"Type": "UInt32"
},
{
"Name": "__table1.flat_mapped",
"Type": "UInt64"
}
],
"Left": true,
"Columns": ["__table1.flat_province", "__table1.flat_rank", "__table1.flat_mapped"],
"Plans": [
{
"Node Type": "Expression",
"Description": "(Projection + (DROP unused columns before ARRAY JOIN + (ARRAY JOIN actions + Change column names to column identifiers)))",
"Header": [
{
"Name": "__table2.id",
"Type": "UInt8"
},
{
"Name": "__table2.country",
"Type": "String"
},
{
"Name": "__table2.provinces",
"Type": "Array(String)"
},
{
"Name": "__table2.ranks",
"Type": "Array(UInt32)"
},
{
"Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Type": "Array(UInt64)"
},
{
"Name": "__table1.flat_province",
"Type": "Array(String)"
},
{
"Name": "__table1.flat_rank",
"Type": "Array(UInt32)"
},
{
"Name": "__table1.flat_mapped",
"Type": "Array(UInt64)"
}
],
"Expression": {
"Inputs": [
{
"Name": "id",
"Type": "UInt8"
},
{
"Name": "country",
"Type": "String"
},
{
"Name": "provinces",
"Type": "Array(String)"
},
{
"Name": "ranks",
"Type": "Array(UInt32)"
}
],
"Actions": [
{
"Node Type": "INPUT",
"Result Type": "UInt8",
"Result Name": "id",
"Arguments": [0],
"Removed Arguments": [],
"Result": 0
},
{
"Node Type": "INPUT",
"Result Type": "String",
"Result Name": "country",
"Arguments": [1],
"Removed Arguments": [],
"Result": 1
},
{
"Node Type": "INPUT",
"Result Type": "Array(String)",
"Result Name": "provinces",
"Arguments": [2],
"Removed Arguments": [],
"Result": 2
},
{
"Node Type": "INPUT",
"Result Type": "Array(UInt32)",
"Result Name": "ranks",
"Arguments": [3],
"Removed Arguments": [],
"Result": 3
},
{
"Node Type": "FUNCTION",
"Result Type": "Function(UInt32 -> UInt64)",
"Result Name": "x UInt32 -> multiply(x, x)",
"Function": "Capture[](UInt32) -> UInt64",
"Compiled": false,
"Arguments": [],
"Removed Arguments": [],
"Result": 4
},
{
"Node Type": "FUNCTION",
"Result Type": "Function(UInt32 -> UInt64)",
"Result Name": "x UInt32 -> multiply(x, x)",
"Function": "Capture[](UInt32) -> UInt64",
"Compiled": false,
"Arguments": [],
"Removed Arguments": [],
"Result": 5
},
{
"Node Type": "ALIAS",
"Result Type": "UInt8",
"Result Name": "__table2.id",
"Arguments": [0],
"Removed Arguments": [0],
"Result": 6
},
{
"Node Type": "ALIAS",
"Result Type": "String",
"Result Name": "__table2.country",
"Arguments": [1],
"Removed Arguments": [1],
"Result": 0
},
{
"Node Type": "ALIAS",
"Result Type": "Array(String)",
"Result Name": "__table2.provinces",
"Arguments": [2],
"Removed Arguments": [2],
"Result": 1
},
{
"Node Type": "ALIAS",
"Result Type": "Array(UInt32)",
"Result Name": "__table2.ranks",
"Arguments": [3],
"Removed Arguments": [3],
"Result": 2
},
{
"Node Type": "ALIAS",
"Result Type": "Array(String)",
"Result Name": "__table1.flat_province",
"Arguments": [1],
"Removed Arguments": [],
"Result": 3
},
{
"Node Type": "ALIAS",
"Result Type": "Array(UInt32)",
"Result Name": "__table1.flat_rank",
"Arguments": [2],
"Removed Arguments": [],
"Result": 7
},
{
"Node Type": "FUNCTION",
"Result Type": "Array(UInt64)",
"Result Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Function": "arrayMap",
"Compiled": false,
"Arguments": [4, 2],
"Removed Arguments": [4],
"Result": 8
},
{
"Node Type": "FUNCTION",
"Result Type": "Array(UInt64)",
"Result Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Function": "arrayMap",
"Compiled": false,
"Arguments": [5, 2],
"Removed Arguments": [5],
"Result": 4
},
{
"Node Type": "ALIAS",
"Result Type": "Array(UInt64)",
"Result Name": "__table1.flat_mapped",
"Arguments": [8],
"Removed Arguments": [8],
"Result": 5
}
],
"Outputs": [
{
"Name": "__table2.id",
"Type": "UInt8"
},
{
"Name": "__table2.country",
"Type": "String"
},
{
"Name": "__table2.provinces",
"Type": "Array(String)"
},
{
"Name": "__table2.ranks",
"Type": "Array(UInt32)"
},
{
"Name": "arrayMap(x UInt32 -> multiply(x, x), __table2.ranks)",
"Type": "Array(UInt64)"
},
{
"Name": "__table1.flat_province",
"Type": "Array(String)"
},
{
"Name": "__table1.flat_rank",
"Type": "Array(UInt32)"
},
{
"Name": "__table1.flat_mapped",
"Type": "Array(UInt64)"
}
],
"Positions": [6, 0, 1, 2, 4, 3, 7, 5],
"Project Input": true
},
"Plans": [
{
"Node Type": "ReadFromMergeTree",
"Description": "default.array_join_table",
"Header": [
{
"Name": "id",
"Type": "UInt8"
},
{
"Name": "country",
"Type": "String"
},
{
"Name": "provinces",
"Type": "Array(String)"
},
{
"Name": "ranks",
"Type": "Array(UInt32)"
}
],
"Read Type": "Default",
"Parts": 1,
"Granules": 1,
"Indexes": [
{
"Type": "PrimaryKey",
"Condition": "true",
"Initial Parts": 1,
"Selected Parts": 1,
"Initial Granules": 1,
"Selected Granules": 1
}
]
}
]
}
]
}
]
}
}
] │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
这里把全部的中间结果的详细信息都打印出来了,而且从行号就能看到,只用了一行的JSON格式来显示。
5. PIPELINE
PIPELINE
有三个不同的选项。
header
:打印每个输出端口的header
,默认值为0
(关闭)。graph
:打印用DOT图形描述语言描述的图形,默认值为0
(关闭)。compact
:如果启用了图形设置,则以紧凑模式打印图形,默认值为1
(打开)。
下面是具体的查询语句输出结果。
:) EXPLAIN PIPELINE header = 1, graph = 1, compact = 1 SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ digraph │
│ { │
│ rankdir="LR"; │
│ { node [shape = rect] │
│ subgraph cluster_0 { │
│ label ="Expression"; │
│ style=filled; │
│ color=lightgrey; │
│ node [style=filled,color=white]; │
│ { rank = same; │
│ n4 [label="ExpressionTransform"]; │
│ } │
│ } │
│ subgraph cluster_1 { │
│ label ="ReadFromMergeTree"; │
│ style=filled; │
│ color=lightgrey; │
│ node [style=filled,color=white]; │
│ { rank = same; │
│ n1 [label="MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder)"]; │
│ } │
│ } │
│ subgraph cluster_2 { │
│ label ="Expression"; │
│ style=filled; │
│ color=lightgrey; │
│ node [style=filled,color=white]; │
│ { rank = same; │
│ n2 [label="ExpressionTransform"]; │
│ } │
│ } │
│ subgraph cluster_3 { │
│ label ="ArrayJoin"; │
│ style=filled; │
│ color=lightgrey; │
│ node [style=filled,color=white]; │
│ { rank = same; │
│ n3 [label="ArrayJoinTransform"]; │
│ } │
│ } │
│ } │
│ n1 -> n2 [label=" │
│ id UInt8 UInt8(size = 0) │
│ country String String(size = 0) │
│ provinces Array(String) Array(size = 0, UInt64(size = 0), String(size = 0)) │
│ ranks Array(UInt32) Array(size = 0, UInt64(size = 0), UInt32(size = 0))"]; │
│ n2 -> n3 [label=" │
│ __table2.id UInt8 UInt8(size = 0) │
│ __table2.country String String(size = 0) │
│ __table2.provinces Array(String) Array(size = 0, UInt64(size = 0), String(size = 0)) │
│ __table2.ranks Array(UInt32) Array(size = 0, UInt64(size = 0), UInt32(size = 0)) │
│ arrayMap(x UInt32 -> multiply(x, x), __table2.ranks) Array(UInt64) Array(size = 0, UInt64(size = 0), UInt64(size = 0)) │
│ __table1.flat_province Array(String) Array(size = 0, UInt64(size = 0), String(size = 0)) │
│ __table1.flat_rank Array(UInt32) Array(size = 0, UInt64(size = 0), UInt32(size = 0)) │
│ __table1.flat_mapped Array(UInt64) Array(size = 0, UInt64(size = 0), UInt64(size = 0))"]; │
│ n3 -> n4 [label=" │
│ __table2.id UInt8 UInt8(size = 0) │
│ __table2.country String String(size = 0) │
│ __table2.provinces Array(String) Array(size = 0, UInt64(size = 0), String(size = 0)) │
│ __table2.ranks Array(UInt32) Array(size = 0, UInt64(size = 0), UInt32(size = 0)) │
│ arrayMap(x UInt32 -> multiply(x, x), __table2.ranks) Array(UInt64) Array(size = 0, UInt64(size = 0), UInt64(size = 0)) │
│ __table1.flat_province String String(size = 0) │
│ __table1.flat_rank UInt32 UInt32(size = 0) │
│ __table1.flat_mapped UInt64 UInt64(size = 0)"]; │
│ } │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
6. ESTIMATE
ESTIMATE
显示出的信息最为简单。
:) EXPLAIN ESTIMATE SELECT
id, country, provinces,
flat_province, ranks, flat_rank,
arrayMap(x -> x * x, ranks) AS mapped,
flat_mapped
FROM array_join_table
LEFT ARRAY JOIN
provinces AS flat_province,
ranks AS flat_rank,
mapped AS flat_mapped;
┌─database─┬─table────────────┬─parts─┬─rows─┬─marks─┐
│ default │ array_join_table │ 1 │ 5 │ 1 │
└──────────┴──────────────────┴───────┴──────┴───────┘
7. TABLE OVERRIDE
官方对它的介绍不多,应该主要是用于和其他数据库引擎协同上的,例如MySQL。
感谢支持
更多内容,请移步《超级个体》。