PostgreSQL 中的 DISTINCT 和 DISTINCT ON

在 PostgreSQL 中，DISTINCT 和 DISTINCT ON 是用来消除重复记录的重要工具。理解它们的区别和用法对于优化查询和处理数据是至关重要的。本文将详细解析这两种功能，并通过实例展示其应用。

NULL 在 DISTINCT 中的处理

在 PostgreSQL 中，NULL 值在 DISTINCT 中被特殊对待。具体来说，NULL 被视为与其他 NULL 相等的值。例如：

postgres=# create table t3 (id int);postgres=# insert into t3 values (1);postgres=# insert into t3 values (2);postgres=# insert into t3 values (null);postgres=# insert into t3 values (null);postgres=# insert into t3 values (null);postgres=# select distinct id from t3;

执行上述命令会发现，尽管插入了三个 NULL，select distinct id 的结果中只包含两个记录。这是因为 NULL 被视为相同的值，根据 DISTINCT 的规则，所有相同的记录只保留一条。

DISTINCT ON 的用法

DISTINCT ON 的功能比 DISTINCT 更强，它允许你根据指定的表达式对记录进行分组，并只保留每组的第一条记录。它的基本形式如下：

select distinct on (expression) column1, column2, ... from table;

工作原理

表达式的评估：对于每一条记录，expression 被计算。

分组：所有具有相同 expression 值的记录被分组。

保留第一条记录：每个分组中只保留第一条记录。

需要注意的是，如果没有使用 ORDER BY 子句，返回的第一条记录是不可预测的。为了确保结果的确定性，最好在 DISTINCT ON 中使用 ORDER BY 子句，并确保排序列在 expression 的左边。

DISTINCT ON 的实际应用示例

以下是一个典型的应用场景：从一个名单中提取每个人在各个科目上的最高成绩。

postgres=# CREATE TABLE score_ranking (id int, name text, subject text, score numeric);postgres=# INSERT INTO score_ranking VALUES     (1,'killerbee','数学',99.5),     (2,'killerbee','语文',89.5),    (3,'killerbee','英语',79.5),     (4,'killerbee','物理',99.5),     (5,'killerbee','化学',98.5),    (6,'刘德华','数学',89.5),     (7,'刘德华','语文',99.5),     (8,'刘德华','英语',79.5),    (9,'刘德华','物理',89.5),     (10,'刘德华','化学',69.5),    (11,'张学友','数学',89.5),     (12,'张学友','语文',91.5),     (13,'张学友','英语',92.5),     (14,'张学友','物理',93.5),     (15,'张学友','化学',94.5);postgres=# select distinct on (subject) id, name, subject, score from score_ranking order by subject, score desc;

执行上述查询后，你将获得每科的最高分记录：