From: Menglong Dong
Date: Mon, 31 Jul 2023 16:24:16 +0800
Subject: Re: [PATCH net-next 3/3] net: tcp: check timeout by icsk->icsk_timeout in tcp_retransmit_timer()
To: Neal Cardwell
Cc: Eric Dumazet, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Menglong Dong, Yuchung Cheng, Soheil Hassas Yeganeh
References: <20230727125125.1194376-1-imagedong@tencent.com> <20230727125125.1194376-4-imagedong@tencent.com>

On Fri, Jul 28, 2023 at 10:25 PM Neal Cardwell wrote:
>
> On Fri, Jul 28, 2023 at 1:50 AM Eric Dumazet wrote:
[...]
>
> In that packetdrill case AFAICT that is the ZWP timer firing, and the
> sender sends a ZWP.
>
> I think maybe Menglong is looking more at something like the following
> scenario, where at the time the RTO timer fires the data sender finds
> the tp->snd_wnd is zero, so it sends a retransmit of the
> lowest-sequence data packet.
>
> Here is a packetdrill case and the tcpdump trace on an upstream
> net-next kernel... I have not worked out all the details at the end,
> but perhaps it can help move the discussion forward:
>
>
> ~/packetdrill/gtests/net/tcp/receiver_window# cat rwin-rto-zero-window.pkt
> // Test how sender reacts to unexpected arrival rwin of 0.
>
> `../common/defaults.sh`
>
> // Create a socket.
> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> // Establish a connection.
> +.1 < S 0:0(0) win 65535
> +0 > S. 0:0(0) ack 1 win 65535
> +.1 < . 1:1(0) ack 1 win 457
> +0 accept(3, ..., ...) = 4
>
> +0 write(4, ..., 20000) = 20000
> +0 > P. 1:10001(10000) ack 1
>
> // TLP
> +.2 > . 10001:11001(1000) ack 1
> // Receiver has retracted rwin to 0
> // (perhaps from the 2023 proposed OOM code?).
> +.1 < . 1:1(0) ack 1 win 0
>
> // RTO, and in tcp_retransmit_timer() we see the receiver window is zero,
> // so we take the special if (!tp->snd_wnd...) code path.
> +.2 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +.5 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +1.2 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +2.4 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +4.8 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +9.6 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +19.2 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +38.4 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +76.8 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +120 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +120 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1001 win 1000
>
> // Received non-zero window update. Send more data.
> +0 > P. 1001:3001(2000) ack 1
> +.1 < . 1:1(0) ack 3001 win 1000
>
> ----------
> When I run that script on a net-next kernel I see the rounding up of
> the RTO to 122 secs rather than 120 secs, but for whatever reason the
> script does not cause the socket to die early...
>

I think I know the reason now. Without the second patch that I sent in
this series, the ACK can't update the rwin to 0, as it is ignored by
tcp_may_update_window(). However, an ACK that acknowledges new data can
update the rwin to 0. I modified your script accordingly, and the socket
dies as we expected:

// Test how sender reacts to unexpected arrival rwin of 0.

`../common/defaults.sh`

// Create a socket.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+.1 < S 0:0(0) win 65535
+0 > S. 0:0(0) ack 1 win 65535
+.1 < . 1:1(0) ack 1 win 457
+0 accept(3, ..., ...) = 4

+0 write(4, ..., 20000) = 20000
+0 > P. 1:10001(10000) ack 1

// Update the window to 0. "ack 0 win 0" won't update the window, as it
// will be ignored by tcp_may_update_window()
+.1 < . 1:1(0) ack 1001 win 0

// RTO, and in tcp_retransmit_timer() we see the receiver window is zero,
// so we take the special if (!tp->snd_wnd...) code path.
+.2 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+.5 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+1.2 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+2.4 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+4.8 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+9.6 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+19.2 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+38.4 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+76.8 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

// The socket will die in tcp_retransmit_timer() in the
// "tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX" code path.
// The following retransmit won't happen.
+120 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0
------------------------------------------------------------------------------

I don't know how to check for the socket's death with packetdrill, so I
checked it with:

ss -nitme | grep 8080 | grep on

and I can see the socket die once the 120-second timer expires.

$ packetdrill ./rwin-rto-zero-window.pkt
./rwin-rto-zero-window.pkt:55: error handling packet: Timed out waiting for packet

> The tcpdump trace:
>
> tcpdump -ttt -n -i any port 8080 &
>
> ->
>
> ~/packetdrill/gtests/net/tcp/receiver_window#
> ../../packetdrill/packetdrill rwin-rto-zero-window.pkt
> 00:01:01.370344 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [S], seq 0, win 65535, options [mss
> 1000,nop,nop,sackOK,nop,wscale 6], length 0
> 00:00:00.000096 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [S.], seq 3847169154, ack 1, win 65535, options [mss
> 1460,nop,nop,sackOK,nop,wscale 14], length 0
> 00:00:00.100277 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 457, length 0
> 00:00:00.000090 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [P.], seq 1:2001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000006 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [P.], seq 2001:4001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000003 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [P.], seq 4001:6001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000002 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [P.], seq 6001:8001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000001 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [P.], seq 8001:10001, ack 1, win 4, length 2000: HTTP
> 00:00:00.209131 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 10001:11001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100190 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:00.203824 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100175 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:00.507835 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100192 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:01.115858 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100182 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:02.331747 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100198 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:04.955980 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100197 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:09.627985 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100179 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:19.355725 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100203 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:00:42.395633 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100202 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:01:17.724059 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100201 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:02:02.779516 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100229 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1, win 0, length 0
> 00:02:02.779828 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100230 ? In IP 192.0.2.1.51231 > 192.168.56.132.8080:
> Flags [.], ack 1001, win 1000, length 0
> 00:00:00.000034 ? Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 11001:12001, ack 1, win 4, length 1000: HTTP
> 00:00:00.000005 ? Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
> Flags [.], seq 12001:13001, ack 1, win 4, length 1000: HTTP
>
> rwin-rto-zero-window.pkt:62: error handling packet: live packet field
> tcp_psh: expected: 1 (0x1) vs actual: 0 (0x0)
> script packet: 405.390244 P. 1001:3001(2000) ack 1
> actual packet: 405.390237 . 11001:13001(2000) ack 1 win 4
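
In the tcpdump trace above, the gap between successive retransmissions roughly doubles (about 0.3 s, 0.6 s, 1.2 s, 2.4 s, ...) and then levels off near 122.9 s. A minimal sketch of idealized exponential RTO backoff capped at 120 s (the value of TCP_RTO_MAX) follows. The 300 ms starting RTO, the millisecond units, and the helper name rto_backoff_schedule() are assumptions for illustration, and the sketch deliberately ignores the timer rounding that stretches the real gaps (for example to about 122.9 s instead of 120 s).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: times in milliseconds; 120 s mirrors TCP_RTO_MAX. */
#define RTO_MAX_MS 120000u

/* Hypothetical helper: fill out[0..n-1] with an idealized RTO schedule
 * that starts at base_ms, doubles after every timeout, and is clamped
 * at RTO_MAX_MS. Timer rounding is intentionally not modeled. */
static void rto_backoff_schedule(unsigned int base_ms, unsigned int *out,
                                 size_t n)
{
    assert(out != NULL || n == 0);

    unsigned int rto = base_ms < RTO_MAX_MS ? base_ms : RTO_MAX_MS;
    for (size_t i = 0; i < n; i++) {
        out[i] = rto;
        /* Double, but never exceed the cap (this also avoids overflow). */
        rto = rto > RTO_MAX_MS / 2 ? RTO_MAX_MS : rto * 2;
    }
}
```

With base_ms = 300 this yields 300, 600, 1200, ..., 38400, 76800, and then 120000 for every later timeout, which matches the shape of the trace but not its exact rounded values.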
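
The abort condition quoted in the modified script, "tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX", compares the time since the last received ACK against 120 s. The sketch below models it with millisecond timestamps instead of jiffies (an assumption; zero_window_timed_out() is a hypothetical name). It suggests one reading of why the rounded-up timer matters: if the last ACK arrived 0.1 s after the final retransmission, a timer firing exactly 120 s after that retransmission would see 119.9 s elapsed and retransmit again, whereas one firing after about 122.9 s sees more than 120 s and aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: milliseconds instead of jiffies; 120 s mirrors TCP_RTO_MAX. */
#define RTO_MAX_MS 120000u

/* Simplified model of the zero-window abort check: give up when more than
 * RTO_MAX_MS has elapsed since the last ACK. Unsigned subtraction keeps the
 * elapsed time correct across 32-bit timestamp wraparound. */
static bool zero_window_timed_out(uint32_t now_ms, uint32_t rcv_tstamp_ms)
{
    return now_ms - rcv_tstamp_ms > RTO_MAX_MS;
}
```

Taking the final retransmission as t = 0 and the last ACK at t = 100 ms, zero_window_timed_out(120000, 100) is false, while zero_window_timed_out(122880, 100) is true.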