Message-ID: <342a762d-22f5-b979-411f-aab0474feda2@codeweavers.com>
Date: Fri, 14 Oct 2022 11:39:29 -0500
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing
To: Eric Dumazet
Cc: Muhammad Usama Anjum, "open list:NETWORKING [TCP]", LKML,
 "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Paolo Abeni,
 Jakub Kicinski
References: <5099dc39-c6d9-115a-855b-6aa98d17eb4b@collabora.com>
 <8dff3e46-6dac-af6a-1a3b-e6a8b93fdc60@collabora.com>
 <5db967de-ea7e-9f35-cd74-d4cca2fcb9ee@codeweavers.com>
 <81b0e6c9-6c13-aecd-1e0e-6417eb89285f@codeweavers.com>
From: Paul Gofman
X-Mailing-List: linux-kernel@vger.kernel.org

Sorry if I was unclear. To reformulate my question: is blocking the
listening port (not the accepted one) this way an IETF requirement? I am
asking because I could not find where such a requirement comes from in
the IETF specifications. Sorry if I am missing the obvious.

On 10/14/22 11:34, Eric Dumazet wrote:
>> My question is if the behaviour of blocking listen socket port
>> while the accepted port (which, as I understand, does not have any
>> direct relation to listen port anymore from TCP standpoint) is still in
>> TIME_ or other wait is stipulated by TCP requirements which I am
>> missing? Or, if not, maybe that can be changed?
>>
> Please raise these questions at IETF, this is where major TCP changes
> need to be approved.
>
> There are multiple ways to avoid TIME_WAIT, if you really need to.
>
>
>> Thanks,
>> Paul.
>>
>>
>> On 10/14/22 11:20, Eric Dumazet wrote:
>>> On Fri, Oct 14, 2022 at 8:52 AM Paul Gofman wrote:
>>>> Hello Eric,
>>>>
>>>> our problem is actually not with the accept socket / port for which
>>>> those timeouts apply, we don't care for that temporary port number. The
>>>> problem is that the listen port (to which apps bind explicitly) is also
>>>> busy until the accept socket waits through all the necessary timeouts
>>>> and is fully closed. From my reading of TCP specs I don't understand why
>>>> it should be this way. The TCP hazards stipulating those timeouts seem
>>>> to apply to accept (connection) socket / port only. Shouldn't listen
>>>> socket's port (the only one we care about) be available for bind
>>>> immediately after the app stops listening on it (either due to closing
>>>> the listen socket or process force kill), or maybe have some other
>>>> timeouts not related to connected accept socket / port hazards? Or am I
>>>> missing something why it should be the way it is done now?
>>>>
>>> To quote your initial message :
>>>
>>>
>>> We are able to avoid this error by adding SO_REUSEADDR attribute to the
>>> socket in a hack. But this hack cannot be added to the application
>>> process as we don't own it.
>>>
>>>
>>> Essentially you are complaining of the linux kernel being unable to
>>> run a buggy application.
>>>
>>> We are not going to change the linux kernel because you can not
>>> fix/recompile an application.
>>>
>>> Note that you could use LD_PRELOAD, or maybe eBPF to automatically
>>> turn SO_REUSEADDR before bind()
>>>
>>>
>>>> Thanks,
>>>> Paul.
>>>>
>>>>
>>>> On 9/30/22 10:16, Eric Dumazet wrote:
>>>>> On Fri, Sep 30, 2022 at 6:24 AM Muhammad Usama Anjum
>>>>> wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP. Because
>>>>>> of this hazard we have 60 seconds timeout in TIME_WAIT state if
>>>>>> connection isn't closed properly. From RFC 1337:
>>>>>>> The TIME-WAIT delay allows all old duplicate segments time
>>>>>> enough to die in the Internet before the connection is reopened.
>>>>>>
>>>>>> As on localhost there is virtually no delay. I think the TIME-WAIT delay
>>>>>> must be zero for localhost connections. I'm no expert here. On localhost
>>>>>> there is no delay. So why should we wait for 60 seconds to mitigate a
>>>>>> hazard which isn't there?
>>>>> Because we do not specialize TCP stack for loopback.
>>>>>
>>>>> It is easy to force delays even for loopback (tc qdisc add dev lo root
>>>>> netem ...)
>>>>>
>>>>> You can avoid TCP complexity (cpu costs) over loopback using AF_UNIX instead.
>>>>>
>>>>> TIME_WAIT sockets are optional.
>>>>> If you do not like them, simply set /proc/sys/net/ipv4/tcp_max_tw_buckets to 0 ?
>>>>>
>>>>>> Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does removes them. But
>>>>>> zap is required from privileged (CAP_NET_ADMIN) process. We are having
>>>>>> hard time finding a privileged process to do this.
>>>>> Really, we are not going to add kludges in TCP stacks because of this reason.
>>>>>
>>>>>> Thanks,
>>>>>> Usama
>>>>>>
>>>>>>
>>>>>> On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a set of processes which talk with each other through a local
>>>>>>> TCP socket. If the process(es) are killed (through SIGKILL) and
>>>>>>> restarted at once, the bind() fails with EADDRINUSE error. This error
>>>>>>> only appears if application is restarted at once without waiting for 60
>>>>>>> seconds or more.
>>>>>>> It seems that there is some timeout of 60 seconds for
>>>>>>> which the previous TCP connection remains alive waiting to get closed
>>>>>>> completely. In that duration if we try to connect again, we get the error.
>>>>>>>
>>>>>>> We are able to avoid this error by adding SO_REUSEADDR attribute to the
>>>>>>> socket in a hack. But this hack cannot be added to the application
>>>>>>> process as we don't own it.
>>>>>>>
>>>>>>> I've looked at the TCP connection states after killing processes in
>>>>>>> different ways. The TCP connection ends up in 2 different states with
>>>>>>> timeouts:
>>>>>>>
>>>>>>> (1) Timeout associated with FIN_WAIT_1 state which is set through
>>>>>>> `tcp_fin_timeout` in procfs (60 seconds by default)
>>>>>>>
>>>>>>> (2) Timeout associated with TIME_WAIT state which cannot be changed. It
>>>>>>> seems like this timeout has come from RFC 1337.
>>>>>>>
>>>>>>> The timeout in (1) can be changed. Timeout in (2) cannot be changed. It
>>>>>>> also doesn't seem feasible to change the timeout of TIME_WAIT state as
>>>>>>> the RFC mentions several hazards. But we are talking about a local TCP
>>>>>>> connection where maybe those hazards aren't applicable directly? Is it
>>>>>>> possible to change timeout for TIME_WAIT state for only local
>>>>>>> connections without any hazards?
>>>>>>>
>>>>>>> We have tested a hack where we replace timeout of TIME_WAIT state from a
>>>>>>> value in procfs for local connections. This solves our problem and
>>>>>>> application starts to work without any modifications to it.
>>>>>>>
>>>>>>> The question is that what can be the best possible solution here? Any
>>>>>>> thoughts will be very helpful.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>> --
>>>>>> Muhammad Usama Anjum
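
For reference, the SO_REUSEADDR workaround mentioned in the thread amounts
to setting the option on the listening socket before bind(). Below is a
minimal sketch of that ordering, not the code of the application under
discussion. The helper name bind_listener and the loopback default are
assumptions made for illustration. As far as I understand the Linux
behaviour, the option only helps if the sockets left over from the previous
run also had it set and nothing is still listening on the port.

```python
import socket


def bind_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Return a listening TCP socket with SO_REUSEADDR set before bind().

    Hypothetical helper for illustration; the name and defaults are not
    taken from the thread.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set before bind(): the address-in-use
        # check happens at bind() time.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        # Do not leak the descriptor if bind() or listen() fails.
        sock.close()
        raise
    return sock
```

An application that cannot be rebuilt would need the same setsockopt()
call injected some other way, for example through the LD_PRELOAD approach
suggested above; that would require a native shim and is not shown here.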