Question: What is PostgreSQL replication streaming?

Answer

PostgreSQL offers a feature known as streaming replication, allowing real-time replication of database changes from a primary server to one or more secondary servers. This mechanism ensures high availability, load balancing, and read scalability in PostgreSQL deployments.

How Does Streaming Replication Work?

Streaming replication works by sending WAL (Write-Ahead Logging) records from the primary server to standby servers as soon as they are generated. The standby servers then apply these WAL records, ensuring the data on standby servers is as up-to-date as possible.

Setting Up Streaming Replication

  1. Configure the Primary Server:

    • Modify postgresql.conf:
      wal_level = replica max_wal_senders = 5 wal_keep_segments = 32
    • Edit pg_hba.conf to allow connections from standby servers:
      host replication all standby_ip/32 trust
      
  2. Take a Base Backup: Use pg_basebackup to take a snapshot of the primary server's data directory and copy it to the standby server.

    pg_basebackup -h primary_ip -D /var/lib/postgresql/data -U replicator -P -v
  3. Configure the Standby Server:

    • Create a recovery file standby.signal in the data directory.
    • Configure postgresql.conf similar to the primary, adjusting settings as needed for the standby role.
    • Optionally, create a recovery.conf in the data directory to specify connection parameters to the primary server:
      primary_conninfo = 'host=primary_ip port=5432 user=replicator password=your_password'

Failover and Promotion

When the primary server fails, one of the standby servers can be promoted to become the new primary server. This can be done using PostgreSQL's pg_ctl promote command or by creating a trigger file specified in the standby's configuration.

Read Scaling and Load Balancing

Streaming replication allows read queries to be distributed among multiple standby servers, thus increasing the system's read throughput. However, write operations can only be performed on the primary server.

Considerations

  • Synchronous vs. Asynchronous: Streaming replication can be configured for synchronous or asynchronous replication. Synchronous replication guarantees that transactions are written to at least one standby server before being committed on the primary, ensuring no data loss. Asynchronous replication does not offer this guarantee but provides lower latency.

  • Network Latency: High network latency can impact the performance of synchronous replication setups.

  • Monitoring and Maintenance: Regular monitoring of replication lag and maintenance tasks like WAL archiving and vacuuming are essential for the health of a replicated PostgreSQL setup.

Streaming replication is a powerful feature of PostgreSQL, enabling high availability, disaster recovery, and horizontal read scaling. Properly configuring and managing this setup is crucial for achieving these benefits without significant performance trade-offs.

Was this content helpful?

White Paper

Free System Design on AWS E-Book

Download this early release of O'Reilly's latest cloud infrastructure e-book: System Design on AWS.

Free System Design on AWS E-Book
Start building today

Dragonfly is fully compatible with the Redis ecosystem and requires no code changes to implement.