If your PostgreSQL instance is randomly going into recovery mode... Have you checked if your disk is full?
2022-04-26 13:22:24.962 UTC [382] LOG: database system was not properly shut down; automatic recovery in progress
In the past, if my PostgreSQL LXC container disk was full, PostgreSQL would throw I/O errors when trying to write new data to it, so when my instance was randomly going into recovery, I was stumped, what is causing this?
Until by accident I noticed that my disk was dangerously close to full, so I've decided to check if the disk being full could be culprit of why it was going into recovery, and, fortunately, it was!
My theory is that this was a change made in newer PostgreSQL versions. If the disk is full, fsync
fails, which causes PostgreSQL to consider that the data is now unreliable and starts restoring the database state from WAL files. Previous PostgreSQL versions were using fsync incorrectly, then changing how it works so, if fsync fails, PostgreSQL will automatically shut down and initiate recovery mode.
Or maybe it is because I migrated my LXC container from a EXT4 partition to ZFS, who knows!
But if your free disk space is suspiciously low after a recover, and you want to be sure that this is what triggered recovery: Create a large file that fits in your free disk space, wait until PostgreSQL tries doing any I/O, and check if PostgreSQL crashes and starts automatic recovery. If yes, then your disk being full is what could have triggered recovery mode!
However, you may be wondering...
If the issue was that there wasn't any free space, how PostgreSQL was able to successfully recover and start up again?
When PostgreSQL finishes its recovery process, a bunch of free space was cleared up, this may be because PostgreSQL was reading data from WAL files, applying them to the tables' data, and then deleting the WAL files, which ends up freeing disk space, and that allows PostgreSQL to continue working until the disk fills up again.
Anyhow, if your PostgreSQL is randomly going into recovery for no apparent reason, don't forget to check your available free disk space!