In-Reply-To: <5099dc39-c6d9-115a-855b-6aa98d17eb4b@collabora.com>
From: Eric Dumazet
Date: Tue, 24 May 2022 15:13:29 -0700
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing
To: Muhammad Usama Anjum
Cc: "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski, Paolo Abeni, "open list:NETWORKING [TCP]", Gabriel Krisman Bertazi, open list
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 24, 2022 at 1:19 AM Muhammad Usama Anjum wrote:
>
> Hello,
>
> We have a set of processes which talk with each other through a local
> TCP socket.
> If the process(es) are killed (through SIGKILL) and
> restarted at once, bind() fails with an EADDRINUSE error. This error
> only appears if the application is restarted at once, without waiting
> 60 seconds or more. It seems that there is a 60-second timeout during
> which the previous TCP connection remains alive, waiting to be closed
> completely. If we try to connect again during that window, we get the
> error.
>
> We are able to avoid this error by setting the SO_REUSEADDR option on
> the socket as a hack. But this hack cannot be added to the application,
> as we don't own it.
>
> I've looked at the TCP connection states after killing the processes in
> different ways. The TCP connection ends up in 2 different states, each
> with a timeout:
>
> (1) The timeout associated with the FIN_WAIT_2 state, which is set
> through `tcp_fin_timeout` in procfs (60 seconds by default)
>
> (2) The timeout associated with the TIME_WAIT state, which cannot be
> changed. It seems like this timeout comes from RFC 1337.
>
> The timeout in (1) can be changed; the timeout in (2) cannot. It also
> doesn't seem feasible to change the TIME_WAIT timeout, as the RFC
> mentions several hazards. But we are talking about a local TCP
> connection, where maybe those hazards don't apply directly? Is it
> possible to change the TIME_WAIT timeout for local connections only,
> without any hazards?
>
> We have tested a hack where we replace the TIME_WAIT timeout with a
> value from procfs for local connections. This solves our problem, and
> the application works without any modifications.
>
> The question is: what would be the best possible solution here? Any
> thoughts will be very helpful.
>

One solution would be to extend TCP diag to support killing TIME_WAIT sockets.
(This has been raised recently anyway.)

Then you could zap all sockets before restarting your program:

ss -K -ta src :listen_port

Untested patch:

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9984d23a7f3e1353d2e1fc9053d98c77268c577e..1b7bde889096aa800b2994c64a3a68edf3b62434 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4519,6 +4519,15 @@ int tcp_abort(struct sock *sk, int err)
 			local_bh_enable();
 			return 0;
 		}
+		if (sk->sk_state == TCP_TIME_WAIT) {
+			struct inet_timewait_sock *tw = inet_twsk(sk);
+
+			refcount_inc(&tw->tw_refcnt);
+			local_bh_disable();
+			inet_twsk_deschedule_put(tw);
+			local_bh_enable();
+			return 0;
+		}
 		return -EOPNOTSUPP;
 	}
 
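
The failure in the original report can be reproduced without killing anything. Below is a minimal user-space sketch, not taken from the thread, assuming Linux and IPv4 loopback; the helpers open_listener() and leave_time_wait() are hypothetical names. leave_time_wait() closes the server side of an accepted connection first, the way the kernel closes the sockets of a SIGKILLed process, so that connection lingers in FIN_WAIT_2/TIME_WAIT on the listening port. open_listener() then tries to bind that port again, optionally with SO_REUSEADDR.

```c
/* Minimal sketch (assumes Linux, IPv4 loopback): reproduce EADDRINUSE on
 * restart. open_listener() and leave_time_wait() are hypothetical helpers,
 * not part of any kernel or libc API. */
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1:port (port 0 lets the kernel pick one).
 * If reuse is nonzero, SO_REUSEADDR is set before bind().
 * Returns the listening fd, or -errno on failure. */
static int open_listener(uint16_t port, int reuse)
{
    struct sockaddr_in addr;
    int one = 1;
    int err;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;
    if (reuse &&
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0)
        goto fail;
    return fd;

fail:
    err = errno;   /* save errno before close() can overwrite it */
    close(fd);
    return -err;
}

/* Mimic a killed server: accept one connection, close the accepted socket
 * first (active close), then close the client and the listener. The
 * server-side connection lingers in FIN_WAIT_2/TIME_WAIT on the port.
 * The listener sets SO_REUSEADDR, so the accepted socket inherits it.
 * Returns that port, or 0 on failure. */
static uint16_t leave_time_wait(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int cfd = -1, afd = -1;
    uint16_t port = 0;
    int lfd = open_listener(0, 1);

    if (lfd < 0)
        return 0;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, len) < 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    close(afd);            /* server side sends its FIN first */
    afd = -1;
    port = ntohs(addr.sin_port);
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);        /* on success, the client's FIN moves the server side to TIME_WAIT */
    close(lfd);
    return port;
}
```

Calling open_listener(port, 0) on the port returned by leave_time_wait() is expected to fail with -EADDRINUSE until the lingering connection expires (about 60 seconds, as in the report), while open_listener(port, 1) is expected to succeed. The sketch sets SO_REUSEADDR on both the old and the new listener, mirroring the poster's workaround of adding it to the application itself; that is not an option when the application cannot be modified, which is what the ss -K approach is meant to address.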