Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp72267pxb; Fri, 19 Feb 2021 18:29:38 -0800 (PST) X-Google-Smtp-Source: ABdhPJx3KA5A54jifXKO2coMTZSmkpLGxAtn2sAARmMO1NvUmIxkeLmmNFS/iIAZGxotJsNfA1OG X-Received: by 2002:a17:907:3e26:: with SMTP id hp38mr11459394ejc.459.1613788178699; Fri, 19 Feb 2021 18:29:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613788178; cv=none; d=google.com; s=arc-20160816; b=wlLGfl1CcFWqjaXXDAfJHPnQNR1gavqURQFsZD/3Sk3NGQQQHcaG1Qf+j9agAtWUCK IjV9ZSPtxbABJkeBGCkcGBS75I/tHnLaIRkNl9We+rJC7ee/eWs3De4WAraLdRswbRf3 pYwAR/RFrtt3vauyH0bebd8B2i/ZN7EHzZqNy8dIl5xioprLp/GAhGJR154ta29lcznf PzsuQ4l/0cGp3oKbdeTlCM6zRErGME6gbw3VZU3FqgeBMg7nkN7Ffced8M4EOKKhjQaP UynHl5mHmaWNALWfa/3Lolx3h1A5thGnRCvMWl6AUsgGHmYEs5tuj6r32kjccTNijs2Z FqsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=UoU4tpvF69vvvt1X/oNX9U45MXD4nIA/xhwKBjIoiM4=; b=vKxMBd40zs221cMPscFOa8JCXBn0JLPV3MdFeqQ0GS0fuaLqSMTUNRVniyG4Ko/HYB EwLt1351JervG2BmjrMGM5kcnxrD4+FkQ8Whd9qs8zlWtviHEDmof39oKMpTRJwKB8Hd schsWskWnp7ugmKYzOuSggC1o+zpw5UKTAxyis4g3Yjn58Ne2N7tdhJpAfOqUUXQGg1G yrs2l+BGLTJjERxJMhWCqsCfMh61yTZMdXm/GodWK2U6/UlMUGUJ/A641rcHsgMfpAHZ Xg3BR0e/6gdTbQD358CKo+oMgMbbMpKtBSxisrYr5njnBMYGTx9np2igatZra+nBZbbN vh6A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=i5SvSNPx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y26si6412691edw.372.2021.02.19.18.29.13; Fri, 19 Feb 2021 18:29:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=i5SvSNPx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229959AbhBTCZY (ORCPT + 99 others); Fri, 19 Feb 2021 21:25:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229765AbhBTCZU (ORCPT ); Fri, 19 Feb 2021 21:25:20 -0500 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F840C061786 for ; Fri, 19 Feb 2021 18:24:40 -0800 (PST) Received: by mail-wr1-x432.google.com with SMTP id u14so11354919wri.3 for ; Fri, 19 Feb 2021 18:24:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=UoU4tpvF69vvvt1X/oNX9U45MXD4nIA/xhwKBjIoiM4=; b=i5SvSNPxpFV0SP2RexUvRcZ/V7GJjPefhgRTAdSd7hi7rrqDZo3+Jj3PWDpN4BTFVb SjPzPSlFjWYV7c6OqI1CKT9+z7W8/GwQaPnPIek1oftL6HSoNkAmesC4x/uKWbJttSpc lXfuIMSZ5mV0VW1TXXv6n63ksw+XB60a29pxX8z6j3uLz07hQhw5BAsCbHRlV2Vml2Ym LXjbYM1bWgOcQjEMyFmXKPM5Yr96AAsfx1ZLKHl/kAmzE/zplATkvYeuZAQiMXIUPnVH pwtkbEhF+t8PKSXE3GxRA0H9Ff8K4WV5AwzPcO93FfwuCw4Vh0+AFM46zruBv+K6BVCA RFcA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=UoU4tpvF69vvvt1X/oNX9U45MXD4nIA/xhwKBjIoiM4=; b=Vz7W7ZZZVJg6L7v1LvEAze+rVduevT2PtXFA/2y7lBRgReOPmwrZBtqZAUIrL0gYhg Sk5z9qpTXnZ6ttdO62U0CaUUikwjsmtU+MuZUmDitizCOEYYg5h+gAawIaHc0yRYWDI3 XmKt1cdr7JekLXh2r25Erfrf93CMBTvPjALwWZT8x+JbExpTGIQtB+3vRgLnJAmETx/K P7ziCC5ET8fG2n/fxHp5Zcam31XRudzOSrwCNWhUtDo67SYDvtwNCdIX3RYo2sYD9X2K dqMrI7GfVF/a4OXSXUqX+QAzc82yatgIaYSBh2rNeJqh2nE6D51kj0vJjHvnjzzOlO7I MP5A== X-Gm-Message-State: AOAM532MeHXrUSLCaBWVErnhWNs7Ygz1jKQWBbdVhmRUsfW7ebhVVdyV uXBUYEL532NmnQ0WqH+Wr5iUcMZ2UeKcJ9ZWTbuKtw== X-Received: by 2002:adf:8084:: with SMTP id 4mr11637816wrl.49.1613787878514; Fri, 19 Feb 2021 18:24:38 -0800 (PST) MIME-Version: 1.0 References: <20210220005447.GA93678@localhost.localdomain> In-Reply-To: <20210220005447.GA93678@localhost.localdomain> From: Yuchung Cheng Date: Fri, 19 Feb 2021 18:24:00 -0800 Message-ID: Subject: Re: [PATCH net] tcp: fix keepalive when data remain undelivered To: Enke Chen Cc: Eric Dumazet , "David S. Miller" , Hideaki YOSHIFUJI , David Ahern , Jakub Kicinski , netdev , LKML , Neal Cardwell , enkechen2020@gmai.com Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 19, 2021 at 4:54 PM Enke Chen wrote: > > From: Enke Chen > > TCP keepalive does not timeout under the condition that network connection > is lost and data remain undelivered (incl. retransmit). A very simple > scenarios of the failure is to write data to a tcp socket after the network > connection is lost. AFAIK current Linux TCP KA implementation does not violate the RFC793bis (Section 3.7.4 Keep-Alives) https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-20#section-3.7.4 We have even raised that in IETF tcpm list to get more clarity https://mailarchive.ietf.org/arch/msg/tcpm/KxcEsLtDuDNhcP8UjbyPkJqpzsE/ I believe you interpret the keep-alive differently -- so this is arguably a subjective "bug-fix". As Neal and I have expressed in our private communications, current Linux KA has been implemented for more than a decade. Changing this behavior may break many existing applications even if it may benefit certain. > > Under the specified condition the keepalive timeout is not evaluated in > the keepalive timer. That is the primary cause of the failure. In addition, > the keepalive probe is not sent out in the keepalive timer. Although packet > retransmit or 0-window probe can serve a similar purpose, they have their > own timers and backoffs that are generally not aligned with the keepalive > parameters for probes and timeout. > > As the timing and conditions of the events involved are random, the tcp > keepalive can fail randomly. Given the randomness of the failures, fixing > the issue would not cause any backward compatibility issues. As was well > said, "Determinism is a special case of randomness". > > The fix in this patch consists of the following: > > a. Always evaluate the keepalive timeout in the keepalive timer. > > b. Always send out the keepalive probe in the keepalive timer (post the > keepalive idle time). Given that the keepalive intervals are usually > in the range of 30 - 60 seconds, there is no need for an optimization > to further reduce the number of keepalive probes in the presence of > packet retransmit. > > c. Use the elapsed time (instead of the 0-window probe counter) in > evaluating tcp keepalive timeout. > > Thanks to Eric Dumazet, Neal Cardwell, and Yuchung Cheng for helpful > discussions about the issue and options for fixing it. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2 Initial git repository build") > Signed-off-by: Enke Chen > --- > net/ipv4/tcp_timer.c | 20 ++++++-------------- > 1 file changed, 6 insertions(+), 14 deletions(-) > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 4ef08079ccfa..16a044da20db 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t) > ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT))) > goto out; > > - elapsed = keepalive_time_when(tp); > - > - /* It is alive without keepalive 8) */ > - if (tp->packets_out || !tcp_write_queue_empty(sk)) > - goto resched; > - > elapsed = keepalive_time_elapsed(tp); > > if (elapsed >= keepalive_time_when(tp)) { > /* If the TCP_USER_TIMEOUT option is enabled, use that > * to determine when to timeout instead. > */ > - if ((icsk->icsk_user_timeout != 0 && > - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) && > - icsk->icsk_probes_out > 0) || > - (icsk->icsk_user_timeout == 0 && > - icsk->icsk_probes_out >= keepalive_probes(tp))) { > + u32 timeout = icsk->icsk_user_timeout ? > + msecs_to_jiffies(icsk->icsk_user_timeout) : > + keepalive_intvl_when(tp) * keepalive_probes(tp) + > + keepalive_time_when(tp); > + > + if (elapsed >= timeout) { > tcp_send_active_reset(sk, GFP_ATOMIC); > tcp_write_err(sk); > goto out; > } > if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) { > - icsk->icsk_probes_out++; > elapsed = keepalive_intvl_when(tp); > } else { > /* If keepalive was lost due to local congestion, > @@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t) > } > > sk_mem_reclaim(sk); > - > -resched: > inet_csk_reset_keepalive_timer (sk, elapsed); > goto out; > > -- > 2.29.2 >