PostgreSQL JDBC driver forever blocked on getNotifications(0) if a network failure happens

Publicado às 18/08/2022 16:47 • Última Atualização às 18/08/2022 19:50 • #kotlin #postgresql #tutorial

I needed to reload my network interface on the machine where my PostgreSQL database is hosted and, after doing that, my thread got forever blocked on the getNotifications(...) call.

"Loritta PostgreSQL Notification Listener" #261 daemon prio=5 os_prio=0 cpu=48.89ms elapsed=62372.91s tid=0x00007f45f806a460 nid=0xf08b5 runnable  [0x00007f45d6dfc000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.Net.poll([email protected]/Native Method)
        at sun.nio.ch.NioSocketImpl.park([email protected]/NioSocketImpl.java:181)
        at sun.nio.ch.NioSocketImpl.timedRead([email protected]/NioSocketImpl.java:285)
        at sun.nio.ch.NioSocketImpl.implRead([email protected]/NioSocketImpl.java:309)
        at sun.nio.ch.NioSocketImpl.read([email protected]/NioSocketImpl.java:350)
        at sun.nio.ch.NioSocketImpl$1.read([email protected]/NioSocketImpl.java:803)
        at java.net.Socket$SocketInputStream.read([email protected]/Socket.java:966)
        at sun.security.ssl.SSLSocketInputRecord.read([email protected]/SSLSocketInputRecord.java:478)
        at sun.security.ssl.SSLSocketInputRecord.readHeader([email protected]/SSLSocketInputRecord.java:472)
        at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket([email protected]/SSLSocketInputRecord.java:70)
        at sun.security.ssl.SSLSocketImpl.readApplicationRecord([email protected]/SSLSocketImpl.java:1455)
        at sun.security.ssl.SSLSocketImpl$AppInputStream.read([email protected]/SSLSocketImpl.java:1059)
        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
        at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
        at org.postgresql.core.PGStream.receiveChar(PGStream.java:453)
        at org.postgresql.core.v3.QueryExecutorImpl.processNotifies(QueryExecutorImpl.java:789)
        - locked <0x0000000621293410> (a org.postgresql.core.v3.QueryExecutorImpl)
        at org.postgresql.jdbc.PgConnection.getNotifications(PgConnection.java:1107)
        at net.perfectdreams.loritta.cinnamon.pudding.utils.PostgreSQLNotificationListener.run(PostgreSQLNotificationListener.kt:29)
        at java.lang.Thread.run([email protected]/Thread.java:833)

So I went out and tried figuring out if this could be reproduced, and found out that someone had already reported this issue, but it was closed due to "lack of feedback". Anyhow, here's my investigations and explanations to anyone else wondering why their PostgreSQL getNotifications call is not receiving new notifications, even tho it is blocked on the getNotifications call!

To debug this, I ran a PostgreSQL container using Docker, so we can abuse the docker pause feature to simulate a network failure. (because blocking network connections on Windows is hard)

docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres

fun main() {
    // 172.24.48.1 = WSL2 IP
    val url = "jdbc:postgresql://172.24.61.21:55432/postgres?user=postgres&password=mysecretpassword"
    val connection = DriverManager.getConnection(url) as PgConnection

    while (true) {
        // I know that we aren't listening for any channels, this doesn't matter
        println("Blocked waiting for notifications")
        connection.getNotifications(0)
    }
}

If we shutdown PostgreSQL with docker stop some-postgres...

Blocked waiting for notifications
Exception in thread "main" org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
	at org.postgresql.core.v3.QueryExecutorImpl.processNotifies(QueryExecutorImpl.java:800)
	at org.postgresql.jdbc.PgConnection.getNotifications(PgConnection.java:1107)
	at net.perfectdreams.loritta.cinnamon.discord.webserver.TestKt.main(Test.kt:16)
	at net.perfectdreams.loritta.cinnamon.discord.webserver.TestKt.main(Test.kt)

Okay, so, what if we docker pause some-postgres?

Blocked waiting for notifications

Hmmm... so the thread keeps blocking on the getNotifications(0) call, interesting...

Let's unpause the container with docker unpause some-postgres and add a notification timeout to the getNotifications(...) call

fun main() {
    // 172.24.48.1 = WSL2 IP
    val url = "jdbc:postgresql://172.24.61.21:55432/postgres?user=postgres&password=mysecretpassword"
    val connection = DriverManager.getConnection(url) as PgConnection

    while (true) {
        // I know that we aren't listening for any channels, this doesn't matter
        println("Blocked waiting for notifications")
        connection.getNotifications(5_000)
    }
}

Let's run it, pause the container with docker pause some-postgres...

Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications
Blocked waiting for notifications

Hmmm, okay, so it keeps looping the while (true) check, even tho the PostgreSQL database is not active, so now what?

@ioannisioannou16 recommended using the isValid method to check if the connection is open AND adding a timeout to the getNotifications call, so let's do it!

fun main() {
    // 172.24.48.1 = WSL2 IP
    val url = "jdbc:postgresql://172.24.61.21:55432/postgres?user=postgres&password=mysecretpassword"
    val connection = DriverManager.getConnection(url) as PgConnection

    while (true) {
        println("Checking if the connection is valid...")
        val isValid = connection.isValid(5) // This is in seconds!!
        println("Connection is valid? $isValid")
        if (!isValid)
            break

        // I know that we aren't listening for any channels, this doesn't matter
        println("Blocked waiting for notifications")

        val notifications = connection.getNotifications(5_000)

        println("Notifications: $notifications")
    }
}

And now it works: If the container is paused while the getNotifications(...) call is active, it will unblock after a while and then run the loop again, where then the isValid call will wait 5s until the connection is considered "valid" and, if it isn't, the loop will exit.

Checking if the connection is valid...
Connection is valid? true
Blocked waiting for notifications
Notifications: [Lorg.postgresql.PGNotification;@77846d2c
Checking if the connection is valid...
Connection is valid? true
Blocked waiting for notifications
Notifications: [Lorg.postgresql.PGNotification;@548ad73b
Checking if the connection is valid...
Connection is valid? false

And now you may be wondering: Why getNotifications(...) doesn't fail if a network issue happens? If we are having a network issue, we don't receive a notification from PostgreSQL saying "hey i'm shutting down ok pls disconnect", in fact, the connection receives ✨ nothing at all ✨!

And notifications are received asynchronously, in fact, the JDBC driver doesn't send any queries to the PostgreSQL database indicating that "hey, pls send new notifications to me!!" when calling getNotifications(...), it just blocks until any data is received from the database.

So that's where the issue lies:

  • The driver waits for new data to be received, but because there is a network issue, we never receive new data from the database
  • The connection doesn't know that it is dead until we try sending data thru the connection, and getNotifications(...) doesn't send any data through the connection.

So a quick and dirty fix is doing what @ioannisioannou16 suggested: Add a timeout to getNotifications(...), call isValid(...) with a timeout too, if isValid(...) returns false, then it means that the connection is dead.

The disadvantage of this solution is that the thread needs to be unblocked every once in a while to check if the connection is still alive.

Maybe the jdbc docs should be updated to mention this issue? 😊 https://jdbc.postgresql.org/documentation/head/listennotify.html


Revisões da Postagem

6b192fcd 18/08/2022 19:50 Update CSS because the current one is ugly as fuc and has font sizes that are too large
509d1c9d 18/08/2022 17:03 Update en-us.md
dbf5477d 18/08/2022 16:47 Add PostgreSQL getNotifications issue post