Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing
From: Paul Gofman
Date: Fri, 14 Oct 2022 10:52:44 -0500
To: Eric Dumazet, Muhammad Usama Anjum
Cc: "open list:NETWORKING [TCP]", LKML, "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Paolo Abeni, Jakub Kicinski
Message-ID: <5db967de-ea7e-9f35-cd74-d4cca2fcb9ee@codeweavers.com>
References: <5099dc39-c6d9-115a-855b-6aa98d17eb4b@collabora.com> <8dff3e46-6dac-af6a-1a3b-e6a8b93fdc60@collabora.com>

Hello Eric,

our problem is actually not with the accept socket / port for which
those timeouts apply, we don't care for that temporary port number. The
problem is that the listen port (to which apps bind explicitly) is also
busy until the accept socket waits through all the necessary timeouts
and is fully closed. From my reading of TCP specs I don't understand why
it should be this way. The TCP hazards stipulating those timeouts seem
to apply to accept (connection) socket / port only.
Shouldn't listen socket's port (the only one we care about) be available
for bind immediately after the app stops listening on it (either due to
closing the listen socket or process force kill), or maybe have some
other timeouts not related to connected accept socket / port hazards? Or
am I missing something why it should be the way it is done now?

Thanks,
    Paul.

On 9/30/22 10:16, Eric Dumazet wrote:
> On Fri, Sep 30, 2022 at 6:24 AM Muhammad Usama Anjum wrote:
>> Hi Eric,
>>
>> RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP. Because
>> of this hazard we have 60 seconds timeout in TIME_WAIT state if
>> connection isn't closed properly. From RFC 1337:
>>> The TIME-WAIT delay allows all old duplicate segments time
>>> enough to die in the Internet before the connection is reopened.
>>
>> As on localhost there is virtually no delay. I think the TIME-WAIT delay
>> must be zero for localhost connections. I'm no expert here. On localhost
>> there is no delay. So why should we wait for 60 seconds to mitigate a
>> hazard which isn't there?
> Because we do not specialize TCP stack for loopback.
>
> It is easy to force delays even for loopback (tc qdisc add dev lo root
> netem ...)
>
> You can avoid TCP complexity (cpu costs) over loopback using AF_UNIX instead.
>
> TIME_WAIT sockets are optional.
> If you do not like them, simply set /proc/sys/net/ipv4/tcp_max_tw_buckets to 0 ?
>
>> Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does removes them. But
>> zap is required from privileged (CAP_NET_ADMIN) process. We are having
>> hard time finding a privileged process to do this.
> Really, we are not going to add kludges in TCP stacks because of this reason.
>
>> Thanks,
>> Usama
>>
>>
>> On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
>>> Hello,
>>>
>>> We have a set of processes which talk with each other through a local
>>> TCP socket. If the process(es) are killed (through SIGKILL) and
>>> restarted at once, the bind() fails with EADDRINUSE error.
>>> This error only appears if application is restarted at once without
>>> waiting for 60 seconds or more. It seems that there is some timeout of
>>> 60 seconds for which the previous TCP connection remains alive waiting
>>> to get closed completely. In that duration if we try to connect again,
>>> we get the error.
>>>
>>> We are able to avoid this error by adding SO_REUSEADDR attribute to the
>>> socket in a hack. But this hack cannot be added to the application
>>> process as we don't own it.
>>>
>>> I've looked at the TCP connection states after killing processes in
>>> different ways. The TCP connection ends up in 2 different states with
>>> timeouts:
>>>
>>> (1) Timeout associated with FIN_WAIT_1 state which is set through
>>> `tcp_fin_timeout` in procfs (60 seconds by default)
>>>
>>> (2) Timeout associated with TIME_WAIT state which cannot be changed. It
>>> seems like this timeout has come from RFC 1337.
>>>
>>> The timeout in (1) can be changed. Timeout in (2) cannot be changed. It
>>> also doesn't seem feasible to change the timeout of TIME_WAIT state as
>>> the RFC mentions several hazards. But we are talking about a local TCP
>>> connection where maybe those hazards aren't applicable directly? Is it
>>> possible to change timeout for TIME_WAIT state for only local
>>> connections without any hazards?
>>>
>>> We have tested a hack where we replace timeout of TIME_WAIT state from a
>>> value in procfs for local connections. This solves our problem and
>>> application starts to work without any modifications to it.
>>>
>>> The question is that what can be the best possible solution here? Any
>>> thoughts will be very helpful.
>>>
>>> Regards,
>>>
>> --
>> Muhammad Usama Anjum
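
For readers who do control the application source, the SO_REUSEADDR
workaround mentioned in the original report can be sketched from user
space. This is only an illustration, not code from the thread: the helper
names open_listener() and restart_listener_after_active_close() are
hypothetical, and the loopback address and backlog value are arbitrary
assumptions. The second helper imitates the restart scenario: the server
side closes an accepted connection first (so its end is the one left
lingering on the listen port), closes the listener, and immediately binds
a new listener to the same port.

```python
import socket


def open_listener(port: int, reuse_addr: bool = True) -> socket.socket:
    """Create a TCP listener on 127.0.0.1 (hypothetical helper).

    With reuse_addr=True, SO_REUSEADDR is set before bind(), which is the
    application-side workaround described in the original report.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))
        sock.listen(16)  # backlog value is an arbitrary assumption
    except OSError:
        sock.close()
        raise
    return sock


def restart_listener_after_active_close() -> bool:
    """Imitate a quick restart and report whether the rebind worked."""
    first = open_listener(0)  # port 0: let the kernel pick a free port
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()

    # The server side closes first, so its end of the connection is the
    # one that lingers on the listen port after the close handshake.
    conn.close()
    client.close()
    first.close()

    # "Restart": bind a fresh listener to the same port right away.
    second = open_listener(port)
    rebound = second.getsockname()[1] == port
    second.close()
    return rebound


if __name__ == "__main__":
    print("rebind succeeded:", restart_listener_after_active_close())
```

Note that the sketch sets SO_REUSEADDR on both the first and the second
listener. As far as I know, Linux only lets the new bind() through when
the lingering socket was created with the option as well, which would
explain why the workaround is of little use when the option cannot be
added to the application that owns the socket.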