Question: Why is Postgres replication not working?

Answer

Several reasons can lead to PostgreSQL replication issues. Here's a comprehensive breakdown of common problems and their solutions:

  1. Connection Issues: Ensure that the replicating server can connect to the master. Check firewall settings, pg_hba.conf entries, and listen_addresses in postgresql.conf.

    -- Example pg_hba.conf entry for replication host replication all slave_ip/32 md5
  2. Configuration Errors: Both postgresql.conf and pg_hba.conf must be correctly configured on the master and replica(s). Important parameters include wal_level, max_wal_senders, archive_mode, and hot_standby.

    # postgresql.conf sample settings for the master wal_level = replica max_wal_senders = 5 archive_mode = on
  3. WAL Files Not Being Sent or Received Properly: This could be due to insufficient disk space, network issues, or incorrect archive_command settings. Use the pg_stat_replication view on the master to check the status of connected replicas.

    SELECT * FROM pg_stat_replication;
  4. Incorrect Recovery Configuration: On the standby server, ensure standby_mode is set to 'on' in the recovery.conf (for versions before PostgreSQL 12) or in the postgresql.conf file (from PostgreSQL 12 onwards).

  5. Version Mismatches: Replication between different major versions of PostgreSQL is not supported. Ensure both the master and replica(s) are running compatible versions.

  6. Corrupted Data: Corruption in the master's data can prevent replication. Running a consistency check with pg_checksums can help identify if this is the issue.

  7. Monitoring and Logs: Always check the PostgreSQL log files on both master and standby servers for errors. Often, the logs will provide specific error messages that can guide troubleshooting.

  8. Slot Issues: Ensure that replication slots (if used) are correctly configured. A missing or misconfigured replication slot can prevent replication from starting.

    SELECT * FROM pg_replication_slots;
  9. Network Latency or Instability: High latency or an unstable network can cause replication lag or disconnections. Monitoring tools can help identify these issues.

In most cases, addressing the above areas will resolve replication issues. However, each PostgreSQL setup can be unique, so consider other configurations or customizations that might impact replication.

Was this content helpful?

White Paper

Free System Design on AWS E-Book

Download this early release of O'Reilly's latest cloud infrastructure e-book: System Design on AWS.

Free System Design on AWS E-Book
Start building today

Dragonfly is fully compatible with the Redis ecosystem and requires no code changes to implement.