2023-12-07 15:26:13

by Benjamin Coddington

Subject: spurious backchannel -ETIMEDOUT and reset on v6.7-rc1

Hey Trond,

I've been noticing slower than usual xfstests, and in the logs:

"RPC: Could not send backchannel reply error: -110"

Since commit 59464b262ff5 ("SUNRPC: SOFTCONN tasks should time out when on
the sending list"), some backchannel reqs will immediately reset the
connection if they need to sleep on ->sending in xprt_reserve_xprt().

I don't think we set up rq_timeout and rq_majortimeo for backchannel reqs,
so they immediately fail with -ETIMEDOUT.

I'm hunting around for the best fix, but maybe you've got one I can test.

Ben



2023-12-07 15:48:34

by Benjamin Coddington

Subject: Re: spurious backchannel -ETIMEDOUT and reset on v6.7-rc1

On 7 Dec 2023, at 10:25, Benjamin Coddington wrote:

> Hey Trond,
>
> I've been noticing slower than usual xfstests, and in the logs:
>
> "RPC: Could not send backchannel reply error: -110"
>
> Since commit 59464b262ff5 ("SUNRPC: SOFTCONN tasks should time out when on
> the sending list"), some backchannel reqs will immediately reset the
> connection if they need to sleep on ->sending in xprt_reserve_xprt().
>
> I don't think we set up rq_timeout and rq_majortimeo for backchannel reqs,
> so they immediately fail with -ETIMEDOUT.
>
> I'm hunting around for the best fix, but maybe you've got one I can test.

Assuming we want backchannel reqs to actually check/timeout/reset, I think
it's looking like we need to do a version of xprt_init_majortimeo() for every
xprt_get_bc_request().

Ben