[Answered] How do you use the PARTITION BY clause in PostgreSQL?

Answer

Partitioning in PostgreSQL is a technique used to divide large tables into smaller, more manageable pieces, called partitions. The PARTITION BY clause is an essential aspect of this process, especially when it comes to query optimization and data organization. Here's how you can use the PARTITION BY clause in PostgreSQL:

Creating Partitioned Tables

To create a partitioned table, you must define a parent table with no actual data, and then create one or more child tables that inherit from this parent table. Each child table is defined with a check constraint that determines which records are stored in it. In PostgreSQL 10 and later, declarative partitioning allows you to specify partitions directly in the parent table definition.

Example: Range Partitioning

CREATE TABLE measurement (
    city_id int not null,
    logdate date not null,
    peaktemp int,
    unitsales int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2020 PARTITION OF measurement
FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

CREATE TABLE measurement_y2021 PARTITION OF measurement
FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

In this example, the measurement table is partitioned by range on the logdate column. Two partitions are created for the years 2020 and 2021.

Selecting Data from Partitions

When querying a partitioned table, PostgreSQL automatically directs the query to the appropriate partition(s) based on the WHERE clause. This process is called partition pruning and helps improve query performance.

SELECT * FROM measurement WHERE logdate BETWEEN '2020-06-01' AND '2020-06-30';

In this case, only the measurement_y2020 partition will be scanned.

Using PARTITION BY in Window Functions

The PARTITION BY clause is also used in window functions to divide rows into partitions for the function to operate over. This is useful for performing calculations over subsets of the data.

Example: Calculating Running Totals

SELECT city_id,
       logdate,
       peaktemp,
       SUM(peaktemp) OVER (PARTITION BY city_id ORDER BY logdate) AS running_total
FROM measurement;

This query calculates the running total of peaktemp for each city_id, ordered by logdate.

Conclusion

Using PARTITION BY in PostgreSQL, whether for creating partitioned tables or within window functions, can significantly enhance your database's manageability and performance. It allows for more efficient data storage, faster queries due to partition pruning, and more powerful analytical capabilities through the use of window functions.