2024-04-22 03:32:29

by Lex Siegel

[permalink] [raw]
Subject: Re: [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket()

> Better still would be for kernel_connect() to return a more normal error
> code - not EPERM. If that cannot be achieved, then I think it would be
> best for the sunrpc code to map EPERM to something else at the place
> where kernel_connect() is called - catch it early.

The question is whether a permission error, EPERM, should cause a retry or
return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear,
the connect call will have to not return a permission error to halt the retry
attempts.

This is a default behavior because EPERM is not an explicit case of the switch
statement. Because bpf appropriately uses EPERM to show that the kernel_connect
was not permitted, it highlights the return handling for this case is missing.
It is unlikely that retry was ever the intended result.

Upstream, the bpf that caused this is at:
https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336

This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
through to the default case of retrying. Here, cilium expects both of these
statuses to indicate the connect failed. A retry is not the intended result.

Handling this case without a retry aligns this code with the udp behavior. This
precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef.

I will amend my patch to include an explicit case for ENXIO as well, as this is
also in cilium's bpf and will cause the same bug to occur.


On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <[email protected]> wrote:
>
> On Sat, 20 Apr 2024, Lex Siegel wrote:
> > When using a bpf on kernel_connect(), the call can return -EPERM.
> > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > syslog and causing the kernel to freeze up.
> >
> > Signed-off-by: Lex Siegel <[email protected]>
> > ---
> > net/sunrpc/xprtsock.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index bb9b747d58a1..47b254806a08 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> > /* Happens, for instance, if the user specified a link
> > * local IPv6 address without a scope-id.
> > */
> > + case -EPERM:
> > + /* Happens, for instance, if a bpf is preventing the connect */
>
> This will propagate -EPERM up into other layers which might not be ready
> to handle it.
> It might be safer to map EPERM to an error we would be more likely to
> expect from the network system - such as ECONNREFUSED or ENETDOWN.
>
> Better still would be for kernel_connect() to return a more normal error
> code - not EPERM. If that cannot be achieved, then I think it would be
> best for the sunrpc code to map EPERM to something else at the place
> where kernel_connect() is called - catch it early.
>
> NeilBrown
>
>
> > case -ECONNREFUSED:
> > case -ECONNRESET:
> > case -ENETDOWN:
> > --
> > 2.39.3
> >
> >
>