Question: What is PostgreSQL replication streaming?
Answer
PostgreSQL offers a feature known as streaming replication, allowing real-time replication of database changes from a primary server to one or more secondary servers. This mechanism ensures high availability, load balancing, and read scalability in PostgreSQL deployments.
How Does Streaming Replication Work?
Streaming replication works by sending WAL (Write-Ahead Logging) records from the primary server to standby servers as soon as they are generated. The standby servers then apply these WAL records, ensuring the data on standby servers is as up-to-date as possible.
Setting Up Streaming Replication
-
Configure the Primary Server:
- Modify
postgresql.conf
:wal_level = replica max_wal_senders = 5 wal_keep_segments = 32
- Edit
pg_hba.conf
to allow connections from standby servers:host replication all standby_ip/32 trust
- Modify
-
Take a Base Backup: Use
pg_basebackup
to take a snapshot of the primary server's data directory and copy it to the standby server.pg_basebackup -h primary_ip -D /var/lib/postgresql/data -U replicator -P -v
-
Configure the Standby Server:
- Create a recovery file
standby.signal
in the data directory. - Configure
postgresql.conf
similar to the primary, adjusting settings as needed for the standby role. - Optionally, create a
recovery.conf
in the data directory to specify connection parameters to the primary server:primary_conninfo = 'host=primary_ip port=5432 user=replicator password=your_password'
- Create a recovery file
Failover and Promotion
When the primary server fails, one of the standby servers can be promoted to become the new primary server. This can be done using PostgreSQL's pg_ctl promote
command or by creating a trigger file specified in the standby's configuration.
Read Scaling and Load Balancing
Streaming replication allows read queries to be distributed among multiple standby servers, thus increasing the system's read throughput. However, write operations can only be performed on the primary server.
Considerations
-
Synchronous vs. Asynchronous: Streaming replication can be configured for synchronous or asynchronous replication. Synchronous replication guarantees that transactions are written to at least one standby server before being committed on the primary, ensuring no data loss. Asynchronous replication does not offer this guarantee but provides lower latency.
-
Network Latency: High network latency can impact the performance of synchronous replication setups.
-
Monitoring and Maintenance: Regular monitoring of replication lag and maintenance tasks like WAL archiving and vacuuming are essential for the health of a replicated PostgreSQL setup.
Streaming replication is a powerful feature of PostgreSQL, enabling high availability, disaster recovery, and horizontal read scaling. Properly configuring and managing this setup is crucial for achieving these benefits without significant performance trade-offs.
Was this content helpful?
Other Common PostgreSQL Questions (and Answers)
- How do you manage Postgres replication lag?
- How can I limit the number of rows updated in a PostgreSQL query?
- What is PostgreSQL replication and how does it work?
- How does sharding work in PostgreSQL?
- What is partitioning in PostgreSQL?
- How do you limit the number of rows deleted in PostgreSQL?
- How do you use the PARTITION OVER clause in PostgreSQL?
- How do you use the PARTITION BY clause in PostgreSQL?
- What are PostgreSQL replication slots and how do they work?
- How can you partition an existing table in PostgreSQL?
- How do you set up replication in PostgreSQL?
- What are PostgreSQLs size limitations?
Free System Design on AWS E-Book
Download this early release of O'Reilly's latest cloud infrastructure e-book: System Design on AWS.
Start building today
Dragonfly is fully compatible with the Redis ecosystem and requires no code changes to implement.