CORE27

View Original

PostgreSQL - Cancelling statement due to conflict with recovery

When running a CQRS based system or long running analytics workloads a replica is the fastest route. However, the default PostgreSQL configuration leads to a few issues.

Let’s start with the most common requirements. Generating a complex retention report using the replica. Now, if our system is fairly busy persisting hundreds of events every second, the data is consistently being replicated onto our replica instances. Running a query like this will take several seconds, if not minutes.

If the data underneath has changed it can lead to a replication conflict.

Does the data really change during a SELECT clause on the replica?

This depends on how we configure our replication to behave. On a read replica configured to stay in sync using WAL replication, the transactions on the primary are shipped to the standbys via the write ahead logs. In order to stay in sync with the primary the replica applies the logs in the same order causing the data underneath the query to change. This results in cancelling the query conflicting query being run on the replica.

However, the replica has the option to store the incoming WAL entries while allowing the ongoing SELECT query to progress. It delays the processing of the WALs and prioritizes the completion of the ongoing query.


Delaying operations on the replica

To ensure that the replica can process its queries without a replication conflict we configure two attributes in the postgresql.conf

  1. max_standby_archive_delay

  2. max_standby_streaming_delay

The `max_standby_streaming_delay` tells the replica how long to wait before cancelling an ongoing query that conflicts with the WAL entry. This defaults to 30 seconds. Setting it to -1 tells the replica to wait indefinitely till the current query is processed. However, this may not be idea if the standby is expected to be in sync with the primary as you’ll notice the replication lag increase.


The `max_standby_archive_delay` tells the replica if it should allow a really old, archived WAL entry to be applied on the standby. This defaults to 30 seconds. When allocated a duration, this makes it harder for subsequently delayed logs to be applied. Setting it to -1 tells the replica to archive indefinitely.


VACUUM and the hot_standby_feedback attribute

Vacuum is used to reclaim space from dead tuples. To view this in the context of our replica and the replication conflict error let's consider the following timeline.


In this example when a DELETE transaction completes, it’s not the COMMIT that tells PostgreSQL to delete the tuples. It’s the VACUUM operation that actually clears the deleted rows. This would intuitively cause a replication conflict when the standby is in the midst of its own query as the data underneath would’ve disappeared.

The hot_standby_feedback tells the primary about running transactions on the standby and waits for the standby to complete its operations before invoking a VACUUM call to delete the rows. This could, however, lead to bloating the primary database if there are way too many long running queries on the standby, that prevent the primary from running VACUUM periodically.