Socket deadlocks if no HELLO is received
Summary
If the WebSocket connection to Discord dies and no OP 10 (HELLO) is ever received, the client will deadlock when reconnecting.
Details
As per @Quahu, the client will never reconnect if Cloudflare closes the WS connection with close code 1001 several times in rapid succession.
Before ConnectAsync is called, the IDENTIFY semaphore is entered. Every subsequent ConnectAsync call waits for that semaphore to be released. When the WS connection receives OP 10, the client sends IDENTIFY or RESUME (depending on session state), then starts a 5100 ms delay, after which the semaphore is released. If HELLO is never received, that delay never starts and the semaphore is never released.
The fix is to make sure the delay, and with it the semaphore release, is started even if HELLO is never received.
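To illustrate the idea, here is a minimal model of the connect path in Python's asyncio. It is a sketch, not the library's code: GatewayModel, connect, hello_arrives and the shortened IDENTIFY_DELAY are all hypothetical names made up for this example. The release timer is started right after the semaphore is entered instead of from the HELLO handler.

```python
import asyncio

IDENTIFY_DELAY = 0.05  # stand-in for the 5100 ms delay, shortened for the demo


class GatewayModel:
    """Hypothetical model of the connect path; not the library's actual code."""

    def __init__(self):
        self.identify_lock = asyncio.Semaphore(1)
        self._release_task = None  # keep a reference so the task is not collected

    async def _release_later(self):
        await asyncio.sleep(IDENTIFY_DELAY)
        self.identify_lock.release()

    async def connect(self, hello_arrives: bool) -> bool:
        # Entered before connecting, as described in the issue.
        await self.identify_lock.acquire()
        # Sketch of the fix: start the release timer here, unconditionally,
        # instead of only after HELLO has been handled.
        self._release_task = asyncio.create_task(self._release_later())
        if not hello_arrives:
            return False  # socket closed before OP 10; nothing was sent
        return True  # HELLO received; IDENTIFY or RESUME would be sent here


async def demo() -> list:
    gw = GatewayModel()
    results = []
    results.append(await gw.connect(hello_arrives=False))
    # Without the unconditional timer this second call would wait forever.
    results.append(
        await asyncio.wait_for(gw.connect(hello_arrives=True), timeout=1.0)
    )
    return results
```

Running asyncio.run(demo()) performs one failed attempt followed by one successful attempt. If the create_task line is moved so that it only runs when HELLO arrives, the second call hits the one-second timeout, which is the deadlock described above.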
Steps to reproduce
- Connect
- Cause CF to rapidly disconnect you with any code (make sure no HELLO is received)
- The client will no longer connect
Notes
The same deadlock might occur if the ConnectAsync call throws for any reason. This could be tested by cutting internet access at the moment the WS connection is attempted (but after all other initialization has completed).
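One possible guard for the throwing case, sketched in Python's asyncio: release the semaphore as soon as the connect attempt raises, since no HELLO can follow a failed connect. The names guarded_connect, open_socket and failing_socket are hypothetical, and this is an assumption about how such a guard could look rather than the library's implementation.

```python
import asyncio


async def guarded_connect(lock: asyncio.Semaphore, open_socket) -> None:
    """Hypothetical wrapper: release the lock if opening the socket raises."""
    await lock.acquire()
    try:
        await open_socket()
    except Exception:
        # No HELLO can follow a failed connect, so release immediately
        # rather than leaving the lock held for every later attempt.
        lock.release()
        raise


async def failing_socket() -> None:
    raise ConnectionError("network unreachable")  # simulated outage


async def demo() -> bool:
    lock = asyncio.Semaphore(1)
    try:
        await guarded_connect(lock, failing_socket)
    except ConnectionError:
        pass
    # True means the lock is still held (deadlock risk); False means released.
    return lock.locked()
```

Whether to release immediately or after the usual delay is a design choice; a real client would probably still want to respect the IDENTIFY rate limit before retrying.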
There's also apparently a bug where OP 9 (INVALID SESSION) might be sent with d indicating the session is resumable when it is, in fact, not (typically during outages). This can potentially lead to deadlocks (however, I have not yet run into this, and my bot has survived several outages). Foxbot implemented a "fix" in D.NET where OP 9 just starts a new session regardless of resumability.
This fix should be backported to 3.2.
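For reference, the OP 9 behaviour mentioned in the notes can be sketched as a small decision function in Python. This only illustrates the behaviour as described in this issue; next_action_on_invalid_session and trust_resumable_flag are hypothetical names, and the actual D.NET change may differ.

```python
OP_INVALID_SESSION = 9


def next_action_on_invalid_session(payload: dict, trust_resumable_flag: bool) -> str:
    """Decide how to recover after OP 9 (hypothetical helper).

    payload["d"] is the boolean the gateway sends with OP 9 to say whether
    the session may be resumed.
    """
    if payload.get("op") != OP_INVALID_SESSION:
        raise ValueError("not an INVALID SESSION payload")
    if trust_resumable_flag and payload.get("d") is True:
        return "resume"
    # Either the flag says "not resumable" or we choose not to trust it:
    # drop the session and send a fresh IDENTIFY.
    return "identify"


# Trusting d: a resumable flag leads to RESUME.
trusting = next_action_on_invalid_session({"op": 9, "d": True}, trust_resumable_flag=True)
# Ignoring d, as in the workaround described above: always start a new session.
ignoring = next_action_on_invalid_session({"op": 9, "d": True}, trust_resumable_flag=False)
```

Ignoring d costs a full re-identify on every OP 9, but it avoids repeatedly attempting a RESUME that cannot succeed.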