Date: Mon, 29 Aug 2022 09:47:00 -0700
From: Cong Wang
To: Eric Dumazet
Cc: Peilin Ye, "David S. Miller", Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim, Jiri Pirko, Peilin Ye, netdev, "open list:DOCUMENTATION", LKML, Cong Wang, Stephen Hemminger, Dave Taht
Subject: Re: [PATCH RFC v2 net-next 0/5] net: Qdisc backpressure infrastructure

On Mon, Aug 22, 2022 at 09:22:39AM -0700, Eric Dumazet wrote:
> On Mon, Aug 22, 2022 at 2:10 AM Peilin Ye wrote:
> >
> > From: Peilin Ye
> >
> > Hi all,
> >
> > Currently, sockets (especially UDP ones) can drop a lot of packets at TC
> > egress when rate limited by shaper Qdiscs like HTB. This patch series
> > tries to solve this by introducing a Qdisc backpressure mechanism.
> >
> > RFC v1 [1] used a throttle & unthrottle approach, which introduced several
> > issues, including a thundering herd problem and a socket reference count
> > issue [2]. This RFC v2 uses a different approach to avoid those issues:
> >
> > 1. When a shaper Qdisc drops a packet that belongs to a local socket due
> >    to TC egress congestion, we make part of the socket's sndbuf
> >    temporarily unavailable, so it sends slower.
> >
> > 2. Later, when TC egress becomes idle again, we gradually recover the
> >    socket's sndbuf back to normal. Patch 2 implements this step using a
> >    timer for UDP sockets.
> >
> > The thundering herd problem is avoided, since we no longer wake up all
> > throttled sockets at the same time in qdisc_watchdog().
> > The socket
> > reference count issue is also avoided, since we no longer maintain a
> > socket list on the Qdisc.
> >
> > Performance is better than RFC v1. There is one concern about fairness
> > between flows for the TBF Qdisc, which could be solved by using an SFQ
> > inner Qdisc.
> >
> > Please see the individual patches for details and numbers. Any comments
> > or suggestions would be much appreciated. Thanks!
> >
> > [1] https://lore.kernel.org/netdev/cover.1651800598.git.peilin.ye@bytedance.com/
> > [2] https://lore.kernel.org/netdev/20220506133111.1d4bebf3@hermes.local/
> >
> > Peilin Ye (5):
> >   net: Introduce Qdisc backpressure infrastructure
> >   net/udp: Implement Qdisc backpressure algorithm
> >   net/sched: sch_tbf: Use Qdisc backpressure infrastructure
> >   net/sched: sch_htb: Use Qdisc backpressure infrastructure
> >   net/sched: sch_cbq: Use Qdisc backpressure infrastructure
> >
>
> I think the whole idea is wrong.

Could you be more specific?

> Packet schedulers can be remote (offloaded, or on another box)

This is not the case we are dealing with (yet).

> The idea of going back to socket level from a packet scheduler should
> really be a last resort.

I think it should be the first resort: we should apply backpressure at the
source, rather than anywhere in the middle.

> Issue of having UDP sockets being able to flood a network is tough, I
> am not sure the core networking stack
> should pretend it can solve the issue.

I think there is a misunderstanding here: we are not dealing with UDP on the
network, just on an end host. The backpressure we are dealing with is from
Qdisc to socket, on the _TX side_, and on one single host.

> Note that FQ based packet schedulers can also help already.

It only helps TCP pacing.

Thanks.
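For intuition, the two-step mechanism from the cover letter (shrink part of the socket's sndbuf on a Qdisc drop, then gradually recover it while TC egress is idle) can be simulated in user space. This is a minimal sketch only: the class name, the halving step, and the 1/8 floor are illustrative assumptions, not the patchset's actual kernel code or tuning.

```python
class SocketBackpressure:
    """Toy model of the RFC's idea: on a TC-egress drop, part of the
    socket's send buffer is made unavailable so the socket sends
    slower; while the Qdisc stays idle, a timer gradually restores it."""

    def __init__(self, sndbuf, min_fraction=0.125):
        self.sndbuf = sndbuf              # configured SO_SNDBUF size
        self.available = sndbuf           # currently usable portion
        self.min_available = int(sndbuf * min_fraction)

    def on_qdisc_drop(self):
        # Step 1: a shaper Qdisc dropped this socket's packet due to
        # congestion; halve the usable sndbuf, but keep a floor so the
        # socket can still make progress.
        self.available = max(self.available // 2, self.min_available)

    def on_idle_tick(self):
        # Step 2: the recovery timer fires while TC egress is idle;
        # grow the usable sndbuf back toward the configured value.
        self.available = min(self.available * 2, self.sndbuf)
```

With a 64 KiB sndbuf, repeated drops shrink the usable portion toward 8 KiB, and idle ticks restore it to 64 KiB, which is the gradual-recovery behavior Patch 2 implements with a timer for UDP sockets.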