From: Eric Dumazet
Date: Fri, 14 Oct 2022 09:34:47 -0700
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing
To: Paul Gofman
Cc: Muhammad Usama Anjum, "open list:NETWORKING [TCP]", LKML,
 "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Paolo Abeni,
 Jakub Kicinski
In-Reply-To: <81b0e6c9-6c13-aecd-1e0e-6417eb89285f@codeweavers.com>
References: <5099dc39-c6d9-115a-855b-6aa98d17eb4b@collabora.com>
 <8dff3e46-6dac-af6a-1a3b-e6a8b93fdc60@collabora.com>
 <5db967de-ea7e-9f35-cd74-d4cca2fcb9ee@codeweavers.com>
 <81b0e6c9-6c13-aecd-1e0e-6417eb89285f@codeweavers.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Miller" , Hideaki YOSHIFUJI , David Ahern , Paolo Abeni , Jakub Kicinski Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 14, 2022 at 9:31 AM Paul Gofman wrote: > > Hello Eric, > > that message was not mine. > > Speaking from the Wine side, we cannot workaround that with > SO_REUSEADDR. First of all, it is under app control and we can't > voluntary tweak app's socket settings. Then, app might be intentionally > not using SO_REUSEADDR to prevent port reuse which of course may be > harmful (more harmful than failure to restart for another minute). What > is broken with the application which doesn't want to use SO_REUSEADDR > and wants to disallow port reuse while it binds to it which reuse will > surely break it? > > But my present question about the listening socket being not > reusable while closed due to linked accepeted socket was not related to > Wine at all. I am not sure how one can fix that in the application if > they don't really want other applications or another copy of the same > one to be able to reuse the port they currently bind to? I believe the > issue with listen socket been not available happens rather often for > native services and they all have to workaround that. While not related > here, I also encountered some out-of-tree hacks to tweak the TIME_WAIT > timeout to tackle this very problem for some cloud custom kernels. > > My question is if the behaviour of blocking listen socket port > while the accepted port (which, as I understand, does not have any > direct relation to listen port anymore from TCP standpoint) is still in > TIME_ or other wait is stipulated by TCP requirements which I am > missing? Or, if not, maybe that can be changed? > Please raise these questions at IETF, this is where major TCP changes need to be approved. There are multiple ways to avoid TIME_WAIT, if you really need to. > Thanks, > Paul. > > > On 10/14/22 11:20, Eric Dumazet wrote: > > On Fri, Oct 14, 2022 at 8:52 AM Paul Gofman wrote: > >> Hello Eric, > >> > >> our problem is actually not with the accept socket / port for which > >> those timeouts apply, we don't care for that temporary port number. The > >> problem is that the listen port (to which apps bind explicitly) is also > >> busy until the accept socket waits through all the necessary timeouts > >> and is fully closed. From my reading of TCP specs I don't understand why > >> it should be this way. The TCP hazards stipulating those timeouts seem > >> to apply to accept (connection) socket / port only. Shouldn't listen > >> socket's port (the only one we care about) be available for bind > >> immediately after the app stops listening on it (either due to closing > >> the listen socket or process force kill), or maybe have some other > >> timeouts not related to connected accept socket / port hazards? Or am I > >> missing something why it should be the way it is done now? > >> > > > > To quote your initial message : > > > > > > We are able to avoid this error by adding SO_REUSEADDR attribute to the > > socket in a hack. But this hack cannot be added to the application > > process as we don't own it. 
> >> Thanks,
> >> Paul.
> >>
> >>
> >> On 9/30/22 10:16, Eric Dumazet wrote:
> >>> On Fri, Sep 30, 2022 at 6:24 AM Muhammad Usama Anjum wrote:
> >>>> Hi Eric,
> >>>>
> >>>> RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP.
> >>>> Because of this hazard we have a 60-second timeout in the
> >>>> TIME_WAIT state if a connection isn't closed properly. From RFC
> >>>> 1337:
> >>>>> The TIME-WAIT delay allows all old duplicate segments time
> >>>>> enough to die in the Internet before the connection is
> >>>>> reopened.
> >>>>
> >>>> On localhost there is virtually no delay, so I think the
> >>>> TIME-WAIT delay should be zero for localhost connections. I'm no
> >>>> expert here, but since there is no delay on localhost, why
> >>>> should we wait 60 seconds to mitigate a hazard which isn't
> >>>> there?
> >>> Because we do not specialize the TCP stack for loopback.
> >>>
> >>> It is easy to force delays even for loopback (tc qdisc add dev lo
> >>> root netem ...).
> >>>
> >>> You can avoid TCP complexity (cpu costs) over loopback by using
> >>> AF_UNIX instead.
> >>>
> >>> TIME_WAIT sockets are optional. If you do not like them, simply
> >>> set /proc/sys/net/ipv4/tcp_max_tw_buckets to 0 ?
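
[A minimal sketch of the AF_UNIX alternative mentioned above. AF_UNIX
sockets have no TIME_WAIT state; the only restart cleanup is removing
the stale filesystem path before bind(). The socket path below is a
hypothetical example:]

    /* af_unix_listen.c -- sketch of a local server using AF_UNIX. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "/tmp/demo.sock";
            struct sockaddr_un addr = { .sun_family = AF_UNIX };
            int fd = socket(AF_UNIX, SOCK_STREAM, 0);

            strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
            unlink(path);   /* remove leftover from a prior run */

            if (fd < 0 ||
                bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
                listen(fd, 16) < 0) {
                    perror("socket/bind/listen");
                    return 1;
            }
            /* accept() loop would go here */
            close(fd);
            return 0;
    }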
> >>>>
> >>>> Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does remove
> >>>> them, but the zap must come from a privileged (CAP_NET_ADMIN)
> >>>> process, and we are having a hard time finding a privileged
> >>>> process to do this.
> >>> Really, we are not going to add kludges to the TCP stack for this
> >>> reason.
> >>>
> >>>> Thanks,
> >>>> Usama
> >>>>
> >>>>
> >>>> On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
> >>>>> Hello,
> >>>>>
> >>>>> We have a set of processes which talk to each other through a
> >>>>> local TCP socket. If the process(es) are killed (through
> >>>>> SIGKILL) and restarted at once, bind() fails with the
> >>>>> EADDRINUSE error. The error only appears if the application is
> >>>>> restarted at once, without waiting 60 seconds or more. It seems
> >>>>> there is a roughly 60-second window during which the previous
> >>>>> TCP connection remains alive, waiting to be closed completely;
> >>>>> if we try to connect again within that window, we get the
> >>>>> error.
> >>>>>
> >>>>> We are able to avoid this error by adding the SO_REUSEADDR
> >>>>> attribute to the socket in a hack. But this hack cannot be
> >>>>> added to the application process as we don't own it.
> >>>>>
> >>>>> I've looked at the TCP connection states after killing the
> >>>>> processes in different ways. The TCP connection ends up in two
> >>>>> different states with timeouts:
> >>>>>
> >>>>> (1) a timeout associated with the FIN_WAIT_1 state, which is
> >>>>> set through `tcp_fin_timeout` in procfs (60 seconds by
> >>>>> default); and
> >>>>>
> >>>>> (2) a timeout associated with the TIME_WAIT state, which cannot
> >>>>> be changed and seems to come from RFC 1337.
> >>>>>
> >>>>> The timeout in (1) can be changed; the timeout in (2) cannot.
> >>>>> It also doesn't seem feasible to change the timeout of the
> >>>>> TIME_WAIT state, as the RFC mentions several hazards. But we
> >>>>> are talking about a local TCP connection, where those hazards
> >>>>> may not apply directly. Is it possible to change the timeout of
> >>>>> the TIME_WAIT state for local connections only, without any
> >>>>> hazards?
> >>>>>
> >>>>> We have tested a hack where, for local connections, we replace
> >>>>> the timeout of the TIME_WAIT state with a value taken from
> >>>>> procfs. This solves our problem, and the application works
> >>>>> without any modifications to it.
> >>>>>
> >>>>> The question is: what would be the best possible solution here?
> >>>>> Any thoughts will be very helpful.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>> --
> >>>> Muhammad Usama Anjum
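
[One of the "multiple ways to avoid TIME_WAIT" alluded to earlier in
this thread, for code one does control, is to abort the connection on
close via SO_LINGER with a zero timeout: the kernel then sends an RST
instead of a FIN, and the socket skips TIME_WAIT entirely. A minimal
sketch; note that this discards unsent/unacknowledged data and gives
up the RFC 1337 protections:]

    /* close_with_rst.c -- sketch: abort on close to skip TIME_WAIT. */
    #include <sys/socket.h>
    #include <unistd.h>

    static int close_with_rst(int fd)
    {
            struct linger lg = {
                    .l_onoff  = 1,  /* linger enabled */
                    .l_linger = 0,  /* zero timeout => RST on close() */
            };

            setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
            return close(fd);
    }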