Received: by 2002:a05:6358:111d:b0:dc:6189:e246 with SMTP id f29csp4220602rwi; Wed, 2 Nov 2022 08:10:36 -0700 (PDT) X-Google-Smtp-Source: AMsMyM57LiHEvwyqYOENZKdvoSz8To9EqS93ZVJgee/v0Rsb+V7iRVI1yAwUv3/SvgmjUAI3puai X-Received: by 2002:a17:90b:3a8f:b0:214:2211:8143 with SMTP id om15-20020a17090b3a8f00b0021422118143mr6381056pjb.219.1667401836037; Wed, 02 Nov 2022 08:10:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667401836; cv=none; d=google.com; s=arc-20160816; b=JRnemMaSlwd9o0+69GLy337kjvzi+SfdyPv2Re9oeirI8JphEEyRLjJ0sLsEvCvc07 LMCbXPXPh39u8wWG5Fo54ExYmkSDvBmG+KfGS6RP/aYEuiW9P8WlPC/mSZ6A586vIq+E iYMybKdW7VFN3zkuHJl8Nv9uMOi+0YeYek9U2zgiiiDQ94wqTv0NRQf+d1G0pVIS0Zo7 zFu96eCJ4ipUflL57fvXfDHTcZnp1zbWxOUXxwzdkrKdUbsCHTwIu/SnlibSkZgDktSp er44HpF/kbxnq2eAe1WBs4dh9n4kngiPXnOKkqXWsq9IWIi1/6zuB8rstKFj1oOaDMca JqYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=VshapdH3qET+2A6w1tOn4K606r3mUYcBf7u3/yyiKmk=; b=bHwcqYt7DhGi8UWy0MIMxE8Z194L1feu7eyhVFHhsaDbPvE8sG297yojqz+LbqNjTf z96oF4sQOWoATtL89Szl6uzvSX0UbO9l6j5vxp8SCI10doIVEUclfcdwMUWCfIvkBCOk nL15JpR1PZswc9NhBD7jrQALzg/VftM7OONdJB/uViAhlw1jbMPk4rWDc8TfJ5RUF4hb 6OO2Njg2q60xSl/Xqhpc0xTaFnIZwMOHkoGSYtFVTqP83XhOkho4jFLfUB9IMEt4rufr w2Gt+OwhR5AyAzsFLtFQ/HyiplFGIwZ4tSl42uVy8lprHqzsG/gexLPdNiSfh9WvrNQV SvHQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=X1ZE7dBE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n22-20020a63f816000000b0042b30f95f99si16595096pgh.807.2022.11.02.08.09.57; Wed, 02 Nov 2022 08:10:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=X1ZE7dBE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229534AbiKBOrP (ORCPT + 99 others); Wed, 2 Nov 2022 10:47:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229770AbiKBOrJ (ORCPT ); Wed, 2 Nov 2022 10:47:09 -0400 Received: from mail-oa1-x2d.google.com (mail-oa1-x2d.google.com [IPv6:2001:4860:4864:20::2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 41EB12A71C for ; Wed, 2 Nov 2022 07:47:04 -0700 (PDT) Received: by mail-oa1-x2d.google.com with SMTP id 586e51a60fabf-13b103a3e5dso20557984fac.2 for ; Wed, 02 Nov 2022 07:47:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=VshapdH3qET+2A6w1tOn4K606r3mUYcBf7u3/yyiKmk=; b=X1ZE7dBE8BaXuX49QNjyAiYn/xRrnl37HJQP+g4AAanP5xN2owJH/6UXFJIyxXomaA U9vHvAqFOSeZxeXKizH32CmTQKnDecpwP6keQSD2cVWUa2xRtOsTEiPjpg/fr+5CQ4dH 8UvcafbvgUU7l7nJ+ihSc8hKhRMs38ogZaoM+eLCcBPdjYeOxo9qCFQrOPs5YBzRhiEZ T7nUuIFGkBCxWDcdSxNH7XyE68DAiVl5s7SRysSDGVuoVokC0j0SoxQtuU547mDFFO5H DXW4P4QEehA03BUO/uJpRdW8e2g7Vai6zERny4iD35McHq63cSQVclhMwd7QX4repfk3 PvVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=VshapdH3qET+2A6w1tOn4K606r3mUYcBf7u3/yyiKmk=; b=5utaOGA9fnNxll3t0Aa/0w51CR9NwP2WDcmnnAp3BzWkeZ7YElGPJYOgN1XLKhec6t KkrWpCIrJspY3ZxC3ETePHqr0CqdtWTblEZKgCu85uCc/WnVsSQOlk70V1h5q6gvcOCG 0WGF5+b0urmNj3R1vA7gYFoJcAzvxgMqFJiwP43LFys+qqAADp71yzP7Ek7hjQxWZOfo 7DkwZK5rbM4W6EPDifCYdbnjHMJ9BdqlRuO7JpW9bth7wSavqwN8E5paX46w/5m7yg0Z GCvsTFTwipp0wUTDJWRYPZyjD9zzjsGN6HpLt7Zf3xlLlRpAUSDei3T2eWwxOhaH0+ja RMTQ== X-Gm-Message-State: ACrzQf3y6lDwJ6uT01BqewXgs6GQ0hJ1lNfk4Cp3vQwpI72QmXJ1F49s VjlhEw2ojxjJQXgsl4B2yGfFxVKHC329+rNfybxJHw== X-Received: by 2002:a05:6870:9a05:b0:132:ebf:dc61 with SMTP id fo5-20020a0568709a0500b001320ebfdc61mr14579207oab.76.1667400423352; Wed, 02 Nov 2022 07:47:03 -0700 (PDT) MIME-Version: 1.0 References: <20221102132811.70858-1-luwei32@huawei.com> In-Reply-To: <20221102132811.70858-1-luwei32@huawei.com> From: Neal Cardwell Date: Wed, 2 Nov 2022 10:46:36 -0400 Message-ID: Subject: Re: [patch net v3] tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent To: Lu Wei Cc: edumazet@google.com, davem@davemloft.net, yoshfuji@linux-ipv6.org, dsahern@kernel.org, kuba@kernel.org, pabeni@redhat.com, xemul@parallels.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 2, 2022 at 8:23 AM Lu Wei wrote: > > If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code > of TCPOPT_SACK_PERM is called to enable sack after data is sent > and before data is acked, ... This "before data is acked" phrase does not quite seem to match the sequence below, AFAICT? How about something like: If setsockopt TCP_REPAIR_OPTIONS with opt_code TCPOPT_SACK_PERM is called to enable SACK after data is sent and the data sender receives a dupack, ... > ... it will trigger a warning in function > tcp_verify_left_out() as follows: > > ============================================ > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 > tcp_timeout_mark_lost+0x154/0x160 > tcp_enter_loss+0x2b/0x290 > tcp_retransmit_timer+0x50b/0x640 > tcp_write_timer_handler+0x1c8/0x340 > tcp_write_timer+0xe5/0x140 > call_timer_fn+0x3a/0x1b0 > __run_timers.part.0+0x1bf/0x2d0 > run_timer_softirq+0x43/0xb0 > __do_softirq+0xfd/0x373 > __irq_exit_rcu+0xf6/0x140 > > The warning is caused in the following steps: > 1. a socket named socketA is created > 2. socketA enters repair mode without build a connection > 3. socketA calls connect() and its state is changed to TCP_ESTABLISHED > directly > 4. socketA leaves repair mode > 5. socketA calls sendmsg() to send data, packets_out and sack_outs(dup > ack receives) increase > 6. socketA enters repair mode again > 7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack > 8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out > increases > 9. sack_outs + lost_out > packets_out triggers since lost_out and > sack_outs increase repeatly > > In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if > Step7 not happen and the warning will not be triggered. As suggested by > Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was > already sent. So this patch checks tp->segs_out, only TCP_REPAIR_OPTIONS > can be set only if tp->segs_out is 0. > > socket-tcp tests in CRIU has been tested as follows: > $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ > --ignore-taint > > socket-tcp* represent all socket-tcp tests in test/zdtm/static/. > > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") > Signed-off-by: Lu Wei > --- > net/ipv4/tcp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index ef14efa1fb70..1f5cc32cf0cc 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, > case TCP_REPAIR_OPTIONS: > if (!tp->repair) > err = -EINVAL; > - else if (sk->sk_state == TCP_ESTABLISHED) > + else if (sk->sk_state == TCP_ESTABLISHED && !tp->segs_out) The tp->segs_out field is only 32 bits wide. By my math, at 200 Gbit/sec with 1500 byte MTU it can wrap roughly every 260 secs. So a caller could get unlucky or carefully sequence its call to TCP_REPAIR_OPTIONS (based on packets sent so far) to mess up the accounting and trigger the kernel warning. How about using some other method to determine if this is safe? Perhaps using tp->bytes_sent, which is a 64-bit field, which by my math would take 23 years to wrap at 200 Gbit/sec? If we're more paranoid about wrapping we could also check tp->packets_out, and refuse to allow TCP_REPAIR_OPTIONS if either tp->bytes_sent or tp->packets_out are non-zero. (Or if we're even more paranoid I suppose we could have a special new bit to track whether we've ever sent something, but that probably seems like overkill?) neal