2022-05-24 13:19:19

by Muhammad Usama Anjum

[permalink] [raw]
Subject: [RFC] EADDRINUSE from bind() on application restart after killing

Hello,

We have a set of processes which talk to each other through a local
TCP socket. If the process(es) are killed (with SIGKILL) and restarted
at once, bind() fails with EADDRINUSE. The error only appears if the
application is restarted immediately, without waiting 60 seconds or
more. It seems there is a 60-second timeout during which the previous
TCP connection remains alive, waiting to be closed completely. If we
try to bind again within that window, we get the error.

We are able to avoid this error by adding the SO_REUSEADDR option to
the socket as a hack. But this hack cannot be added to the application
process as we don't own it.
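
For reference, the hack is just one extra setsockopt() call before
bind(). A minimal sketch (the port and address are illustrative, not
from our setup):

#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	/* The hack: let bind() succeed even while connections from a
	 * killed instance linger in FIN_WAIT_2 or TIME_WAIT.
	 */
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(5555);	/* example port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind");	/* EADDRINUSE without the option */
		return 1;
	}
	listen(fd, 16);
	/* ... accept() loop would go here ... */
	close(fd);
	return 0;
}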

I've looked at the TCP connection states after killing the processes
in different ways. The TCP connection ends up in one of two states,
each with a timeout:

(1) A timeout associated with the FIN_WAIT_2 state, which is set
through `tcp_fin_timeout` in procfs (60 seconds by default).

(2) A timeout associated with the TIME_WAIT state, which cannot be
changed. This timeout seems motivated by the hazards described in
RFC 1337.

The timeout in (1) can be changed; the timeout in (2) cannot. It also
doesn't seem feasible to change the TIME_WAIT timeout in general, as
the RFC mentions several hazards. But we are talking about a local TCP
connection, where maybe those hazards don't apply directly? Is it
possible to change the TIME_WAIT timeout for local connections only,
without any hazards?

We have tested a hack where, for local connections, we replace the
TIME_WAIT timeout with a value taken from procfs. This solves our
problem, and the application works without any modifications.
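
For illustration, the hack is conceptually similar to the sketch
below. This is not the actual patch we tested: the helper name and the
sysctl field (sysctl_tcp_local_tw_timeout) are hypothetical, and the
real decision is made around tcp_time_wait() in
net/ipv4/tcp_minisocks.c.

/* Sketch only, NOT the tested patch: use a configurable, shorter
 * TIME_WAIT timeout when both endpoints are on loopback.
 */
static int tcp_tw_timeout(const struct sock *sk)
{
	const struct inet_sock *inet = inet_sk(sk);

	/* Loopback segments never cross a real network, so old
	 * duplicates cannot linger "in the Internet" the way the
	 * RFC 1337 hazards assume.
	 */
	if (ipv4_is_loopback(inet->inet_saddr) &&
	    ipv4_is_loopback(inet->inet_daddr))
		return READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_local_tw_timeout);

	return TCP_TIMEWAIT_LEN;	/* 60 * HZ by default */
}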

The question is: what would be the best possible solution here? Any
thoughts will be very helpful.

Regards,

--
Muhammad Usama Anjum


2022-05-25 12:36:34

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Tue, May 24, 2022 at 1:19 AM Muhammad Usama Anjum
<[email protected]> wrote:
>
> [...]

One solution would be to extend TCP diag to support killing TIME_WAIT sockets.
(This has been raised recently anyway.)

Then you could zap all sockets before restarting your program.

ss -K -ta src :listen_port

Untested patch:

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9984d23a7f3e1353d2e1fc9053d98c77268c577e..1b7bde889096aa800b2994c64a3a68edf3b62434 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4519,6 +4519,15 @@ int tcp_abort(struct sock *sk, int err)
 		local_bh_enable();
 		return 0;
 	}
+	if (sk->sk_state == TCP_TIME_WAIT) {
+		struct inet_timewait_sock *tw = inet_twsk(sk);
+
+		refcount_inc(&tw->tw_refcnt);
+		local_bh_disable();
+		inet_twsk_deschedule_put(tw);
+		local_bh_enable();
+		return 0;
+	}
 	return -EOPNOTSUPP;
 }

2022-05-30 13:48:52

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

Hi,

Thank you for your reply.

On 5/25/22 3:13 AM, Eric Dumazet wrote:
> On Tue, May 24, 2022 at 1:19 AM Muhammad Usama Anjum
> <[email protected]> wrote:
>>
>> [...]
>
> One solution would be to extend TCP diag to support killing TIME_WAIT sockets.
> (This has been raised recently anyway)
I think this has been raised here:
https://lore.kernel.org/netdev/[email protected]/

>
> Then you could zap all sockets before restarting your program.
>
> ss -K -ta src :listen_port
>
> Untested patch:
The following command and patch work for my use case. The sockets in
FIN_WAIT_2 or TIME_WAIT state are closed when zapped.

Can you please upstream this patch?

>
> [...]

--
Muhammad Usama Anjum

2022-06-01 18:29:29

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Mon, May 30, 2022 at 6:15 AM Muhammad Usama Anjum
<[email protected]> wrote:
>
> [...]
>
> The following command and patch work for my use case. The sockets in
> FIN_WAIT_2 or TIME_WAIT state are closed when zapped.
>
> Can you please upstream this patch?

Yes, I will when net-next reopens, thanks for testing it.


2022-06-27 11:00:40

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

Hi Eric,

On 5/30/22 8:28 PM, Eric Dumazet wrote:
>> The following command and patch work for my use case. The sockets in
>> FIN_WAIT_2 or TIME_WAIT state are closed when zapped.
>>
>> Can you please upstream this patch?
> Yes, I will when net-next reopens, thanks for testing it.
Have you tried upstreaming it?

Tested-by: Muhammad Usama Anjum <[email protected]>


--
Muhammad Usama Anjum

2022-06-27 12:25:04

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Mon, Jun 27, 2022 at 12:20 PM Muhammad Usama Anjum
<[email protected]> wrote:
>
> [...]
>
> > Yes, I will when net-next reopens, thanks for testing it.
> Have you tried upstreaming it?
>
> Tested-by: Muhammad Usama Anjum <[email protected]>
>

I will do this today, thanks for the heads up.



2022-09-30 14:03:28

by Muhammad Usama Anjum

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

Hi Eric,

RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP. Because
of these hazards we have a 60-second timeout in the TIME_WAIT state if
a connection isn't closed properly. From RFC 1337:
> The TIME-WAIT delay allows all old duplicate segments time
> enough to die in the Internet before the connection is reopened.

On localhost there is virtually no delay, so arguably the TIME-WAIT
delay could be zero for localhost connections. I'm no expert here, but
why should we wait 60 seconds to mitigate a hazard which isn't there?

Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does remove them, but
the zap requires a privileged (CAP_NET_ADMIN) process. We are having a
hard time finding a privileged process to do this.

Thanks,
Usama


On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
> [...]

--
Muhammad Usama Anjum

2022-09-30 15:21:51

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Fri, Sep 30, 2022 at 6:24 AM Muhammad Usama Anjum
<[email protected]> wrote:
>
> Hi Eric,
>
> RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP. Because
> of these hazards we have a 60-second timeout in the TIME_WAIT state if
> a connection isn't closed properly. From RFC 1337:
> > The TIME-WAIT delay allows all old duplicate segments time
> > enough to die in the Internet before the connection is reopened.
>
> On localhost there is virtually no delay, so arguably the TIME-WAIT
> delay could be zero for localhost connections. I'm no expert here, but
> why should we wait 60 seconds to mitigate a hazard which isn't there?

Because we do not specialize the TCP stack for loopback.

It is easy to force delays even for loopback (tc qdisc add dev lo root
netem ...)

You can avoid TCP complexity (CPU costs) over loopback using AF_UNIX instead.

TIME_WAIT sockets are optional.
If you do not like them, simply set /proc/sys/net/ipv4/tcp_max_tw_buckets to 0?
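
For what it's worth, a minimal sketch of flipping that knob from a
privileged helper, equivalent to `sysctl -w
net.ipv4.tcp_max_tw_buckets=0` (note this is system-wide, not limited
to loopback):

#include <stdio.h>

int main(void)
{
	/* 0 means no TIME_WAIT buckets at all: closing sockets are
	 * destroyed instead of lingering in TIME_WAIT. Needs root.
	 */
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_max_tw_buckets", "w");

	if (!f || fputs("0", f) == EOF) {
		perror("tcp_max_tw_buckets");
		return 1;
	}
	fclose(f);
	return 0;
}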

>
> Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does remove them, but
> the zap requires a privileged (CAP_NET_ADMIN) process. We are having a
> hard time finding a privileged process to do this.

Really, we are not going to add kludges to the TCP stack for this reason.


2022-10-14 16:59:04

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Fri, Oct 14, 2022 at 8:52 AM Paul Gofman <[email protected]> wrote:
>
> Hello Eric,
>
> our problem is actually not with the accepted socket / port to which
> those timeouts apply; we don't care about that temporary port number.
> The problem is that the listen port (the one apps bind to explicitly)
> is also busy until the accepted socket waits through all the necessary
> timeouts and is fully closed. From my reading of the TCP specs I don't
> understand why it should be this way. The TCP hazards stipulating those
> timeouts seem to apply to the accepted (connection) socket / port only.
> Shouldn't the listen socket's port (the only one we care about) be
> available for bind immediately after the app stops listening on it
> (either because the listen socket is closed or the process is
> force-killed), or maybe have some other timeouts not related to the
> connected socket's hazards? Or am I missing something about why it is
> done the way it is now?
>


To quote your initial message:

<quote>
We are able to avoid this error by adding the SO_REUSEADDR option to
the socket as a hack. But this hack cannot be added to the application
process as we don't own it.
</quote>

Essentially you are complaining that the Linux kernel is unable to
run a buggy application.

We are not going to change the Linux kernel because you cannot
fix/recompile an application.

Note that you could use LD_PRELOAD, or maybe eBPF, to automatically
turn on SO_REUSEADDR before bind().
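
A minimal sketch of the LD_PRELOAD idea (the shim file name is mine;
it blindly forces the option on every socket the app binds):

/* reuseaddr_shim.c
 * build: gcc -shared -fPIC -o reuseaddr_shim.so reuseaddr_shim.c -ldl
 * use:   LD_PRELOAD=./reuseaddr_shim.so ./the_app
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

int bind(int fd, const struct sockaddr *addr, socklen_t len)
{
	static int (*real_bind)(int, const struct sockaddr *, socklen_t);
	int one = 1;

	if (!real_bind)
		real_bind = (int (*)(int, const struct sockaddr *,
				     socklen_t))dlsym(RTLD_NEXT, "bind");

	/* Force SO_REUSEADDR before the real bind(); ignore failures
	 * (e.g. on non-INET sockets) and fall through regardless.
	 */
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

	return real_bind(fd, addr, len);
}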



2022-10-14 17:04:57

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Fri, Oct 14, 2022 at 9:39 AM Paul Gofman <[email protected]> wrote:
>
> Sorry if I was unclear; to reformulate my question: is blocking the
> listening port (not the accepted one) this way an IETF requirement? I
> am asking because I could not find where such a requirement stems
> from. Sorry if I am missing the obvious.

I think it is documented.

man 7 socket

SO_REUSEADDR
       Indicates that the rules used in validating addresses supplied
       in a bind(2) call should allow reuse of local addresses. For
       AF_INET sockets this means that a socket may bind, except when
       there is an active listening socket bound to the address. When
       the listening socket is bound to INADDR_ANY with a specific
       port then it is not possible to bind to this port for any
       local address. Argument is an integer boolean flag.

You seem to need another way, so you will have to ask this question at the IETF.


2022-10-14 17:23:25

by Paul Gofman

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

Hello Eric,

    that message was not mine.

    Speaking from the Wine side, we cannot work around that with
SO_REUSEADDR. First of all, it is under app control and we can't
voluntarily tweak the app's socket settings. Then, the app might
intentionally not use SO_REUSEADDR to prevent port reuse, which of
course may be harmful (more harmful than failing to restart for
another minute). What is broken about an application which doesn't
want to use SO_REUSEADDR and wants to disallow reuse of the port it
binds to, when such reuse would surely break it?

    But my present question, about the listening socket not being
reusable once closed because of a linked accepted socket, was not
related to Wine at all. I am not sure how one can fix that in the
application if they don't really want other applications, or another
copy of the same one, to be able to reuse the port they currently bind
to. I believe the issue of the listen socket being unavailable happens
rather often for native services, and they all have to work around it.
While not related here, I have also encountered out-of-tree hacks that
tweak the TIME_WAIT timeout to tackle this very problem in some custom
cloud kernels.

    My question is whether blocking the listen socket's port, while
the accepted port (which, as I understand, no longer has any direct
relation to the listen port from a TCP standpoint) is still in
TIME_WAIT or another wait state, is stipulated by TCP requirements
which I am missing? Or, if not, maybe that can be changed?

Thanks,
    Paul.


On 10/14/22 11:20, Eric Dumazet wrote:
> On Fri, Oct 14, 2022 at 8:52 AM Paul Gofman <[email protected]> wrote:
>> [...]
>
> To quote your initial message:
>
> <quote>
> We are able to avoid this error by adding the SO_REUSEADDR option to
> the socket as a hack. But this hack cannot be added to the application
> process as we don't own it.
> </quote>
>
> Essentially you are complaining that the Linux kernel is unable to
> run a buggy application.
>
> We are not going to change the Linux kernel because you cannot
> fix/recompile an application.
>
> Note that you could use LD_PRELOAD, or maybe eBPF, to automatically
> turn on SO_REUSEADDR before bind().

2022-10-14 17:25:50

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On Fri, Oct 14, 2022 at 9:31 AM Paul Gofman <[email protected]> wrote:
>
> [...]
>
> My question is whether blocking the listen socket's port, while
> the accepted port (which, as I understand, no longer has any direct
> relation to the listen port from a TCP standpoint) is still in
> TIME_WAIT or another wait state, is stipulated by TCP requirements
> which I am missing? Or, if not, maybe that can be changed?
>

Please raise these questions at the IETF; this is where major TCP
changes need to be approved.

There are multiple ways to avoid TIME_WAIT, if you really need to.



2022-10-14 17:26:09

by Paul Gofman

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

Sorry if I was unclear; to reformulate my question: is blocking the
listening port (not the accepted one) this way an IETF requirement? I
am asking because I could not find where such a requirement stems
from. Sorry if I am missing the obvious.

On 10/14/22 11:34, Eric Dumazet wrote:
>> My question is whether blocking the listen socket's port, while
>> the accepted port (which, as I understand, no longer has any direct
>> relation to the listen port from a TCP standpoint) is still in
>> TIME_WAIT or another wait state, is stipulated by TCP requirements
>> which I am missing? Or, if not, maybe that can be changed?
>>
> Please raise these questions at the IETF; this is where major TCP
> changes need to be approved.
>
> There are multiple ways to avoid TIME_WAIT, if you really need to.


2022-10-14 17:27:46

by Paul Gofman

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

Hello Eric,

our problem is actually not with the accepted socket / port to which
those timeouts apply; we don't care about that temporary port number.
The problem is that the listen port (the one apps bind to explicitly)
is also busy until the accepted socket waits through all the necessary
timeouts and is fully closed. From my reading of the TCP specs I don't
understand why it should be this way. The TCP hazards stipulating those
timeouts seem to apply to the accepted (connection) socket / port only.
Shouldn't the listen socket's port (the only one we care about) be
available for bind immediately after the app stops listening on it
(either because the listen socket is closed or the process is
force-killed), or maybe have some other timeouts not related to the
connected socket's hazards? Or am I missing something about why it is
done the way it is now?

Thanks,
    Paul.




2022-10-14 17:42:04

by Paul Gofman

[permalink] [raw]
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing

On 10/14/22 11:45, Eric Dumazet wrote:
> On Fri, Oct 14, 2022 at 9:39 AM Paul Gofman <[email protected]> wrote:
>
> I think it is documented.
>
> man 7 socket
>
> SO_REUSEADDR
>        Indicates that the rules used in validating addresses supplied
>        in a bind(2) call should allow reuse of local addresses. For
>        AF_INET sockets this means that a socket may bind, except when
>        there is an active listening socket bound to the address. When
>        the listening socket is bound to INADDR_ANY with a specific
>        port then it is not possible to bind to this port for any
>        local address. Argument is an integer boolean flag.
>
> You seem to need another way, so you will have to ask this question at the IETF.
Thanks a lot, I think that answers my question; I am afraid I was
reading this a bit wrong.