From: Yafang Shao
Date: Tue, 30 Aug 2022 10:28:01 +0800
Subject: Re: [PATCH RFC v2 net-next 0/5] net: Qdisc backpressure infrastructure
To: Eric Dumazet
Cc: Peilin Ye, "David S. Miller", Jakub Kicinski, Paolo Abeni,
    Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Jamal Hadi Salim,
    Cong Wang, Jiri Pirko, Peilin Ye, netdev, "open list:DOCUMENTATION",
    LKML, Cong Wang, Stephen Hemminger, Dave Taht

On Tue, Aug 23, 2022 at 1:02 AM Eric Dumazet wrote:
>
> On Mon, Aug 22, 2022 at 2:10 AM Peilin Ye wrote:
> >
> > From: Peilin Ye
> >
> > Hi all,
> >
> > Currently sockets (especially UDP ones) can drop a lot of packets at TC
> > egress when rate limited by shaper Qdiscs like HTB. This patch series
> > tries to solve this by introducing a Qdisc backpressure mechanism.
> >
> > RFC v1 [1] used a throttle & unthrottle approach, which introduced
> > several issues, including a thundering herd problem and a socket
> > reference count issue [2]. This RFC v2 uses a different approach to
> > avoid those issues:
> >
> >   1. When a shaper Qdisc drops a packet that belongs to a local socket
> >      due to TC egress congestion, we make part of the socket's sndbuf
> >      temporarily unavailable, so it sends slower.
> >
> >   2. Later, when TC egress becomes idle again, we gradually recover the
> >      socket's sndbuf back to normal. Patch 2 implements this step using
> >      a timer for UDP sockets.
> >
> > The thundering herd problem is avoided, since we no longer wake up all
> > throttled sockets at the same time in qdisc_watchdog(). The socket
> > reference count issue is also avoided, since we no longer maintain a
> > socket list on the Qdisc.
> >
> > Performance is better than RFC v1. There is one concern about fairness
> > between flows for the TBF Qdisc, which could be addressed by using an
> > SFQ inner Qdisc.
> >
> > Please see the individual patches for details and numbers. Any comments
> > or suggestions would be much appreciated. Thanks!
> >
> > [1] https://lore.kernel.org/netdev/cover.1651800598.git.peilin.ye@bytedance.com/
> > [2] https://lore.kernel.org/netdev/20220506133111.1d4bebf3@hermes.local/
> >
> > Peilin Ye (5):
> >   net: Introduce Qdisc backpressure infrastructure
> >   net/udp: Implement Qdisc backpressure algorithm
> >   net/sched: sch_tbf: Use Qdisc backpressure infrastructure
> >   net/sched: sch_htb: Use Qdisc backpressure infrastructure
> >   net/sched: sch_cbq: Use Qdisc backpressure infrastructure
> >
>
> I think the whole idea is wrong.
>
> Packet schedulers can be remote (offloaded, or on another box).
>
> The idea of going back to the socket level from a packet scheduler
> should really be a last resort.
>
> The issue of UDP sockets being able to flood a network is tough; I am
> not sure the core networking stack should pretend it can solve it.
>
> Note that FQ-based packet schedulers can also help already.

We encounter a similar issue when using fq + edt-bpf to rate-limit UDP
packets, because of the qdisc buffer limit. If the qdisc buffer limit is
too small, UDP packets are dropped in the qdisc layer. But the sender
doesn't know that the packets have been dropped, so it keeps sending, and
more and more packets are dropped there. IOW, the qdisc becomes a
bottleneck before the bandwidth limit is reached.
We work around this issue by enlarging the qdisc's buffer limit and
flow_limit (the proper values can be calculated from net.ipv4.udp_mem and
net.core.wmem_default). Obviously this is not a perfect solution, because
net.ipv4.udp_mem or net.core.wmem_default may be changed dynamically. We
have also thought about a solution that ties UDP memory accounting to the
qdisc limit, but we are not sure it is a good idea either.

-- 
Regards
Yafang
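[Appendix] A rough illustration of the kind of calculation mentioned above.
The packet size, socket count, and the formula itself are assumptions for
the example only; net.ipv4.udp_mem would be factored in along the same
lines, and the real values depend on the workload.

/* Hypothetical helper: derive fq "flow_limit"/"limit" values large enough
 * to buffer what the sending sockets can have in flight, based on
 * net.core.wmem_default. The 1500-byte packet size and the socket count
 * below are assumptions.
 */
#include <stdio.h>

static long read_sysctl(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	long wmem_default = read_sysctl("/proc/sys/net/core/wmem_default");
	long pkt_size = 1500;	/* assumed typical UDP datagram size */
	long nsockets = 64;	/* assumed number of concurrent UDP senders */
	long flow_limit, limit;

	if (wmem_default < 0)
		return 1;

	/* Each socket may have ~wmem_default bytes queued below it; size the
	 * qdisc so it can hold all of that instead of dropping. */
	flow_limit = wmem_default / pkt_size + 1;
	limit = flow_limit * nsockets;

	printf("suggested: tc qdisc change dev <dev> root fq limit %ld flow_limit %ld\n",
	       limit, flow_limit);
	return 0;
}

The obvious weakness, as said above, is that the result goes stale whenever
net.core.wmem_default (or net.ipv4.udp_mem) is changed at runtime.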