From: Muhammad Usama Anjum
To: Eric Dumazet, "open list:NETWORKING [TCP]"
Cc: usama.anjum@collabora.com, LKML, "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Paolo Abeni, Jakub Kicinski, Paul Gofman
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing
Date: Fri, 30 Sep 2022 18:24:00 +0500
Message-ID: <8dff3e46-6dac-af6a-1a3b-e6a8b93fdc60@collabora.com>
In-Reply-To: <5099dc39-c6d9-115a-855b-6aa98d17eb4b@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Eric,

RFC 1337 describes the TIME-WAIT Assassination hazards in TCP. Because
of these hazards, we have a 60-second timeout in the TIME_WAIT state if
the connection isn't closed properly. From RFC 1337:

> The TIME-WAIT delay allows all old duplicate segments time enough to
> die in the Internet before the connection is reopened.

On localhost there is virtually no delay, so I think the TIME-WAIT
delay should be zero for localhost connections. I'm no expert here, but
why should we wait for 60 seconds to mitigate a hazard which isn't
there?

Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does remove them, but
zapping requires a privileged (CAP_NET_ADMIN) process. We are having a
hard time finding a privileged process to do this.
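For illustration, the sockets lingering in these two states can at least
be listed without any privilege by parsing /proc/net/tcp. The following
is only a minimal sketch under stated assumptions: IPv4 only, a
little-endian host such as x86, and the usual column layout of
/proc/net/tcp (state 05 is FIN_WAIT2, 06 is TIME_WAIT). The function
names are hypothetical and chosen for this example.

```python
import socket
import struct

# State codes from the "st" column of /proc/net/tcp that matter here.
LINGERING_STATES = {0x05: "FIN_WAIT2", 0x06: "TIME_WAIT"}


def parse_addr(hex_addr):
    """Decode 'AABBCCDD:PPPP' from /proc/net/tcp into (ip, port).

    Assumption: little-endian host, where 127.0.0.1 is printed as 0100007F.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def lingering_loopback_sockets(table_text):
    """Return (state, local, remote) tuples for loopback sockets that are
    in TIME_WAIT or FIN_WAIT2, given the text of /proc/net/tcp."""
    found = []
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        state = LINGERING_STATES.get(int(fields[3], 16))
        if state is None:
            continue
        local = parse_addr(fields[1])
        remote = parse_addr(fields[2])
        if local[0].startswith("127.") and remote[0].startswith("127."):
            found.append((state, local, remote))
    return found
```

Passing the contents of /proc/net/tcp to lingering_loopback_sockets()
right after killing the processes should show which local ports are
still held. This only observes the sockets; it does not remove them.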
Thanks,
Usama

On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
> Hello,
>
> We have a set of processes which talk with each other through a local
> TCP socket. If the process(es) are killed (through SIGKILL) and
> restarted at once, bind() fails with an EADDRINUSE error. This error
> only appears if the application is restarted at once without waiting
> for 60 seconds or more. It seems that there is a timeout of 60 seconds
> during which the previous TCP connection remains alive, waiting to get
> closed completely. If we try to connect again within that time, we get
> the error.
>
> We are able to avoid this error by adding the SO_REUSEADDR option to
> the socket in a hack. But this hack cannot be added to the application
> process as we don't own it.
>
> I've looked at the TCP connection states after killing processes in
> different ways. The TCP connection ends up in 2 different states with
> timeouts:
>
> (1) Timeout associated with the FIN_WAIT_2 state, which is set through
> `tcp_fin_timeout` in procfs (60 seconds by default)
>
> (2) Timeout associated with the TIME_WAIT state, which cannot be
> changed. It seems like this timeout has come from RFC 1337.
>
> The timeout in (1) can be changed. The timeout in (2) cannot be
> changed. It also doesn't seem feasible to change the timeout of the
> TIME_WAIT state as the RFC mentions several hazards. But we are talking
> about a local TCP connection where maybe those hazards aren't directly
> applicable? Is it possible to change the timeout of the TIME_WAIT state
> for local connections only, without any hazards?
>
> We have tested a hack where we replace the timeout of the TIME_WAIT
> state with a value from procfs for local connections. This solves our
> problem and the application starts to work without any modifications
> to it.
>
> The question is: what would be the best possible solution here? Any
> thoughts will be very helpful.
>
> Regards,
> --
> Muhammad Usama Anjum
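
For readers unfamiliar with the SO_REUSEADDR workaround mentioned in the
quoted message, here is a minimal sketch of what it looks like when the
server source can be modified (which is not the case for the application
discussed in this thread). The helper name make_listener and its default
arguments are assumptions made for this example. The key point is that
the option has to be set before bind() is called.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=16):
    """Create a listening TCP socket with SO_REUSEADDR set before bind().

    With the option set, bind() is not refused merely because earlier
    connections on the same local port are still in TIME_WAIT.
    port=0 lets the kernel pick a free port (handy for experiments).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Setting the option after bind() would not help this bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

A restarted server would call make_listener(port=<its fixed port>)
instead of a plain socket()/bind()/listen() sequence. SO_REUSEADDR still
refuses the bind while another socket is actively listening on the same
address and port.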